While WK and Lapis have nothing in common (Lapis doesn’t solve the Kanji problem), Lapis and Bunpro do overlap (grammar reference and SRS). So, out of respect for Bunpro, I’m not comfortable advertising it there.
Also, for now we chose to talk about it only in the WK community to limit the number of users, as it’s still in beta. Once we’re more confident in it, we’ll widen that circle.
I had a hard time with the available open-source deconjugation libraries, so I had to make my own. But yours provides more detailed information. The grammar details for “に対して” are impressive, but for other example sentences, such as “アメリカに留学していただけのことはある。”, both the grammar lookup and the parser return incomplete results.
Since there is no way to know every word, I wish I could look up words and segment text with J-J definitions without interrupting my workflow. Without that, it’s a deal breaker for me before I use this webpage.
I cannot import books on the site, but with the SRS tags, that’s OK for now. Well done.
I haven’t tried it out myself yet, but I do have a question. Since each sentence is segmented into words/grammar, does Lapis track which words and grammar you know? If so, can it select or suggest “i+1” sentences from a database (user-created or not) for study?
If not, such a feature would, in my opinion, take Lapis far beyond any other software I’m currently aware of.
There’s an upcoming major update to the engine that will fix most of those issues. (Some sentences are harder because we detect all kinds of complex grammar patterns, which adds a lot of complexity to the engine layer.)
Because we want rich grammar detection rather than just word segments, fixing your example sentence is a matter of registering the だけのことはある grammar point; once it’s registered, the engine will detect it correctly. We plan to focus on registering new grammar right after we finish the segmentation engine update.
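To make “registering a grammar point” concrete, here is a minimal Python sketch that models a grammar point as a fixed sequence of surface forms and scans a segmented sentence for it. The GrammarPoint and find_grammar names are hypothetical, and the token list is an assumed segmentation, not actual Lapis output; a real engine would also have to handle conjugation and optional elements, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarPoint:
    """A registered grammar point, modeled as consecutive surface forms (hypothetical model)."""
    name: str
    pattern: tuple[str, ...]


def find_grammar(tokens: list[str], registry: list[GrammarPoint]) -> list[tuple[str, int, int]]:
    """Return (name, start, end) token spans for every registered pattern found in tokens."""
    matches = []
    for point in registry:
        width = len(point.pattern)
        # Slide a window of the pattern's length over the token list.
        for start in range(len(tokens) - width + 1):
            if tuple(tokens[start:start + width]) == point.pattern:
                matches.append((point.name, start, start + width))
    return matches


# Assumed segmentation of the example sentence; real engine output may differ.
tokens = ["アメリカ", "に", "留学", "し", "て", "い", "た", "だけ", "の", "こと", "は", "ある", "。"]
registry = [GrammarPoint("だけのことはある", ("だけ", "の", "こと", "は", "ある"))]

print(find_grammar(tokens, registry))  # [('だけのことはある', 7, 12)]
```

Until the pattern is in the registry, the sentence still segments into words, but nothing ties だけ/の/こと/は/ある together as one grammar point.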
You can, if you select the text and right-click.
It’ll show you the result without leaving your current page. Does this cover what you have in mind?
We’ve given this some thought (i+1 that takes into account the words and grammar the user knows). It’s definitely something we want to explore if the project lives on.
For when that discussion comes up, the use case I personally envision for such a feature is something like this:
I provide Lapis with a transcript of an episode I just watched. Using all the words and grammar from sentences already in the SRS above a certain level (and/or manually marked as “learned”), Lapis adds any sentences with new words or grammar to a “To learn” queue. The queue can be sorted chronologically, by which +1 item “unlocks” the most other i+1 sentences, and so on.
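One way to read the queue described above, sketched in Python under assumptions: each sentence is reduced to the set of words and grammar items the parser found, “known” is the set of items above the chosen SRS level or marked as learned, and an i+1 sentence is one with exactly one unknown item. The “unlocks” ordering counts, for each queued sentence’s new item, how many other sentences would become i+1 once that item is learned. The build_queue name and the hand-written item sets are hypothetical; nothing here reflects how Lapis stores its data.

```python
from collections import Counter


def build_queue(sentences: list[set[str]], known: set[str], order: str = "chronological") -> list[int]:
    """Return indices of i+1 sentences: those with exactly one unknown word or grammar item."""
    unknowns = [items - known for items in sentences]
    queue = [i for i, missing in enumerate(unknowns) if len(missing) == 1]
    if order == "unlocks":
        # Learning an item turns each sentence with exactly two unknowns, one of them that item, into i+1.
        unlocks = Counter(item for missing in unknowns if len(missing) == 2 for item in missing)
        # sort() is stable, so ties keep their chronological order.
        queue.sort(key=lambda i: -unlocks[next(iter(unknowns[i]))])
    elif order != "chronological":
        raise ValueError(f"unknown order: {order}")
    return queue


known = {"私", "は", "学生", "です", "猫", "が", "好き"}
sentences = [
    {"私", "は", "学生", "です"},          # everything known: not queued
    {"猫", "が", "好き", "だ"},            # i+1, new item: だ
    {"私", "は", "先生", "です"},          # i+1, new item: 先生
    {"先生", "は", "猫", "が", "嫌い"},    # unknown: 先生, 嫌い
    {"先生", "が", "来る"},                # unknown: 先生, 来る
]
print(build_queue(sentences, known))                   # [1, 2]
print(build_queue(sentences, known, order="unlocks"))  # [2, 1]
```

Because Python’s sort is stable, sentences that unlock the same number of others stay in chronological order, so the two sort options compose naturally.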
Hopefully that makes sense, and thank you for your hard work!
The segmentation overhaul is now live; you’ll find that accuracy has improved significantly. We’re very confident in the current state of the engine, but we’re planning further improvements and a few new things. We run accuracy comparisons against mecab (the best open-source morphological analyzer), and we already get richer and more accurate results in many cases.
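The post doesn’t say how the mecab comparisons are scored. A common approach for word segmentation is token-level F1 against a reference segmentation, where a token counts as correct only if both its start and end offsets match. Here is a minimal Python sketch with hypothetical function names and a hand-made reference; note that it measures segmentation only, not the grammar detection that makes results “richer”.

```python
def spans(tokens: list[str]) -> set[tuple[int, int]]:
    """Character (start, end) offsets of each token in the joined text."""
    result, pos = set(), 0
    for token in tokens:
        result.add((pos, pos + len(token)))
        pos += len(token)
    return result


def segmentation_f1(predicted: list[str], reference: list[str]) -> float:
    """Token-level F1: a predicted token counts only if its start and end both match the reference."""
    if "".join(predicted) != "".join(reference):
        raise ValueError("both segmentations must cover the same text")
    pred, ref = spans(predicted), spans(reference)
    # Equivalent to 2PR / (P + R) with P = hits / |pred| and R = hits / |ref|.
    return 2 * len(pred & ref) / (len(pred) + len(ref))


reference = ["だけ", "の", "こと", "は", "ある"]
print(segmentation_f1(["だけ", "の", "ことはある"], reference))  # 0.5
print(segmentation_f1(reference, reference))                     # 1.0
```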
We also added some basic name detection, which we’ll improve upon later.
You can also open this from anywhere in the app without leaving the current page if you hit Ctrl+G.
Not all grammar docs are written yet; we’ll fill them in over time.
This is a major step, but we still have more to do. The next step is registering more and more grammar (which will implicitly improve accuracy) and filling in the grammar documentation.
Have you compared mecab with Juman++? I’ve found better results with the latter, but it also comes with a large database (over 2 GB), since it uses things like Wikipedia page titles. I know Juman++ can also be “trained”. I don’t know whether mecab can, or whether that feature would be useful in your project’s backend.
I’ve been waiting for this overhaul to go live before I dive into trying it out further. Looks like I get to start playing with it soon.
Yes, later there will be a way to submit those. For now, feel free to comment on this thread with any kind of feedback.
We’ll also open some Discord channels for the different modules soon (link in the OP), but for now this thread is the preferred place.
Yeah, basically all of these use models trained on a particular dictionary. The default installation will usually get you better results with juman++ than with mecab, since mecab uses the outdated (but most popular) ipadic by default. The unidic dictionary is the most up-to-date, but it’s also huge. I haven’t played with juman++ much, but I suspect its results are comparable to mecab+unidic (except maybe for Wikipedia/name detection, which I’m not very interested in).
Actually, the Lapis engine uses a trained model (mecab-like, with ipadic) in a very small step before the engine does its actual work. That step produces word splits, which guide the engine toward the most probable boundaries. We wouldn’t gain much from a better alternative, though, since that’s not where the actual work happens.
Hm. This happens because I match names by their kana. That won’t do; I’ll probably have to limit it to Katakana names only. Thanks, I’ll push a fix soon.
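A minimal Python sketch of the proposed fix, assuming names are stored in Katakana and a token is only matched when its surface form is itself Katakana, so a Hiragana word that merely shares a name’s reading is no longer tagged. The function names and name list are hypothetical. The check uses the Unicode Katakana block (U+30A1 to U+30FF), which includes ・ and ー so names like ジョン・スミス pass; halfwidth Katakana is not covered.

```python
def is_katakana(text: str) -> bool:
    """True if text is non-empty and every character is in the Katakana block (U+30A1 to U+30FF)."""
    return bool(text) and all("\u30a1" <= ch <= "\u30ff" for ch in text)


def match_names(tokens: list[str], katakana_names: set[str]) -> list[str]:
    """Keep tokens that are written in Katakana and appear in the (hypothetical) name list."""
    return [token for token in tokens if is_katakana(token) and token in katakana_names]


names = {"マイク", "ケン"}
# けん shares the reading of ケン but is Hiragana, so it is not tagged as a name.
print(match_names(["マイク", "と", "けん", "ケン"], names))  # ['マイク', 'ケン']
```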
Every time I log in, I get a message telling me my time zone is incorrect. Lapis offers me a list of UTC+11 time zones to choose from, none of which is mine. I’m normally UTC+10 (I selected Sydney from the list in settings), but we’re currently on summer time.
Thanks for the app! I’m still finding my way around it, but I like what I see.
This is because your browser reports a UTC+11 offset, which usually means your computer is set to it. Can you confirm whether the time zone on your computer is set correctly to UTC+10? We keep prompting because the time zone on your computer should match the one in Lapis.
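For reference, Australia/Sydney observes daylight saving time, so its offset is UTC+11 in summer and UTC+10 in winter; a browser reporting UTC+11 in January is consistent with the Sydney zone. Here is a minimal Python sketch of a DST-aware check using the standard zoneinfo module (it needs the IANA time zone database, either from the system or the tzdata package). The function names are hypothetical, and the browser offset is assumed to be in minutes east of UTC; JavaScript’s Date.prototype.getTimezoneOffset() uses the opposite sign, returning -660 for UTC+11.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def utc_offset(tz_name: str, when: datetime) -> timedelta:
    """UTC offset that the IANA zone tz_name actually uses at the aware instant `when`."""
    return when.astimezone(ZoneInfo(tz_name)).utcoffset()


def matches_browser(tz_name: str, browser_minutes_east: int, when: datetime) -> bool:
    """Compare a zone's DST-aware offset with a browser-reported offset (minutes east of UTC)."""
    return utc_offset(tz_name, when) == timedelta(minutes=browser_minutes_east)


summer = datetime(2024, 1, 15, tzinfo=timezone.utc)
winter = datetime(2024, 7, 15, tzinfo=timezone.utc)
print(utc_offset("Australia/Sydney", summer))            # 11:00:00
print(utc_offset("Australia/Sydney", winter))            # 10:00:00
print(matches_browser("Australia/Sydney", 660, summer))  # True
```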
This might also be a problem with whatever provides us the current time zones and daylight saving times; I’ll double-check. We require this because a mismatch would probably produce wrong daily schedules, since the schedule for the day has to know when your day starts and ends. I’ll confirm once I’ve checked.
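To illustrate why the daily schedule depends on the time zone, here is a Python sketch (the day_bounds_utc name is hypothetical) that computes the UTC instants at which a local day starts and ends. With a fixed UTC+10 offset, a Sydney user’s day in January would be placed one hour late, and a fixed offset also can’t represent the 23- or 25-hour days around DST transitions.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def day_bounds_utc(day: date, tz: tzinfo) -> tuple[datetime, datetime]:
    """UTC instants at which `day` starts and ends for a user in time zone `tz`."""
    start = datetime.combine(day, time(0), tzinfo=tz)
    end = datetime.combine(day + timedelta(days=1), time(0), tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


sydney = ZoneInfo("Australia/Sydney")
fixed_plus_10 = timezone(timedelta(hours=10))

print(day_bounds_utc(date(2024, 1, 15), sydney)[0])         # 2024-01-14 13:00:00+00:00
print(day_bounds_utc(date(2024, 1, 15), fixed_plus_10)[0])  # 2024-01-14 14:00:00+00:00
start, end = day_bounds_utc(date(2024, 4, 7), sydney)
print(end - start)  # 1 day, 1:00:00
```

On 2024-04-07, when Sydney’s daylight saving time ended, the local day is 25 hours long, which is why the sketch computes both bounds from local midnights rather than adding 24 hours to the start.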