Lapis app: a segmentation engine that understands grammar, the best dictionary lookups, SRS designed for language learning

Two workarounds for this (from the parser side of things) are 1) to utilize a names dictionary, and 2) utilize Wikipedia page names. The latter works well for uncommon fictional character names that have their own pages, as well =D

Here’s an example of output from Juman++ which includes Wikipedia as a source:

echo "でむかえた枝元さん" | jumanpp
でむかえた でむかえた でむかえる 動詞 2 * 0 母音動詞 1 タ形 10 "代表表記:出迎える/でむかえる"
枝元 枝元 枝元 名詞 6 人名 5 * 0 * 0 "自動獲得:Wikipedia 読み不明 Wikipedia人名 Wikipedia姓"
さん さん さん 接尾辞 14 名詞性名詞接尾辞 2 * 0 * 0 "代表表記:さん/さん"
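If you wanted to use that output programmatically, here is a minimal Python sketch that splits each Juman++ line into fields and flags person names that came from Wikipedia. The names `Morpheme`, `parse_output` and `is_wikipedia_name` are my own, and the field layout is only inferred from the sample above (surface, reading, base form, part of speech, ..., then a quoted info string), so treat it as an illustration rather than a full parser.

```python
from typing import List, NamedTuple

# Sample copied from the post above. EOS is the end-of-sentence marker
# that jumanpp prints after each sentence (left out of the paste above).
SAMPLE = '''\
でむかえた でむかえた でむかえる 動詞 2 * 0 母音動詞 1 タ形 10 "代表表記:出迎える/でむかえる"
枝元 枝元 枝元 名詞 6 人名 5 * 0 * 0 "自動獲得:Wikipedia 読み不明 Wikipedia人名 Wikipedia姓"
さん さん さん 接尾辞 14 名詞性名詞接尾辞 2 * 0 * 0 "代表表記:さん/さん"
EOS
'''


class Morpheme(NamedTuple):
    surface: str
    reading: str
    lemma: str
    pos: str
    subpos: str
    info: str


def parse_output(text: str) -> List[Morpheme]:
    """Parse jumanpp output lines into Morpheme records."""
    morphemes = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the EOS marker, and "@" lines (alternative analyses).
        if not line or line == "EOS" or line.startswith("@ "):
            continue
        # The first 11 fields are space-separated; the 12th is the info
        # string, which may itself contain spaces.
        parts = line.split(" ", 11)
        if len(parts) < 12:
            raise ValueError(f"unexpected line: {line!r}")
        info = parts[11].strip('"“”')
        morphemes.append(
            Morpheme(parts[0], parts[1], parts[2], parts[3], parts[5], info)
        )
    return morphemes


def is_wikipedia_name(m: Morpheme) -> bool:
    """True for person-name nouns whose info string mentions Wikipedia."""
    return m.pos == "名詞" and m.subpos == "人名" and "Wikipedia" in m.info


if __name__ == "__main__":
    for m in parse_output(SAMPLE):
        print(m.surface, "(Wikipedia name)" if is_wikipedia_name(m) else "")
```

A segmenter could use a check like `is_wikipedia_name` to avoid breaking a name such as 枝元 into smaller dictionary words.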

5 Likes

I’ll mirror what everybody else says - nice work. I especially like the way parsing an example sentence opens a new thingy, but it’s all intuitive and smooth. I used it for a couple of hours today to help with reading, and it’s great to have the whole sentence there with all the bits laid out to help reconstruct it in your head. I haven’t used the sentence mining / srs yet, I will when I’ve got a clearer idea of what I’ll put in it.

With my testing hat on, I noticed a couple of things:

Some of the word definitions seem odd. For example, I’ve only ever come across とき referring to time, never to ibises, but the definition box showed “ibis” rather than “time”, although clicking ‘lookup’ brings up what I’d expect.

有料になる前の去年3月に調べたときの2倍以上に増えました


It would be nice if, when you drill down to the verbs, there was a bit more detail - OK, it’s the て form, but… what does that mean?

In the example sentences, the Japanese section is highlighted, but not the English, so you have to try and work it out.


and sometimes I’m not sure what is selected at all.


3 Likes

Wanted to mention another issue I ran into. Does Lapis sometimes limit the possible meanings of a given word without showing you other possible meanings?

I have this example:

親子丼の具をよそう

Here Lapis believes よそう comes from よい, but it actually comes from 装う (to serve; usually written in かな). Given that I feed Lapis the full sentence, I hoped it’d pick up that this was actually a verb.

Still, there is also another very common word read as よそう, but Lapis is not listing it as an option (予想 - expectation).

2 Likes

This looks really cool! I haven’t tried it out yet, but I’ve been using ichi.moe and Jisho simultaneously for sentence parsing because they’re both stronger in different areas, but if this turns out better, that would be really great.

Edit: Just made an account; at first glance, the interface looks really nice.
Edit 2: I’ve been trying to make an SRS card, but after doing the selection process, for some reason, the save button is faded out. I also can’t add the cards I’ve already created and saved in decks to SRS.

2 Likes

Here’s another issue I found:

その後輩 is being split as その後 + 輩. While I guess that’s a valid way of splitting, 輩 (はい, meaning group / gang) is less common than the noun 後輩, yet it’s being given precedence.

2 Likes

I agree. But in such a case I would just try to find another simple sentence that contains the word, or just not add it this time and wait for a so-called i+1 sentence. I understand that isn’t always viable if you’re reading something above your level, though. So an option to split the entries into different cards with the context available is still a good idea, as you said.

Thanks! Noted. In dark mode it’s harder to see the checkboxes the first time. I should use a simpler sentence in the tour, agreed. The current one showcases lots of different things I wanted to show, but yeah, it’s not an ideal first example :sweat_smile:. The tour is also a bit all over the place. I tried to make it cover a lot of things and that might backfire for new users. I should work on it more. A “Previous” button would be great but it’s a little bit hard to do because there are hooks on steps. I’ll look into it.

Will do that soon, but for now I just wanted to use a single thread as a start so that people can track one place (far fewer people tend to join Discord). Will add the channel to Discord later for sure.

Thanks for all the feedback!

I see what you mean, makes sense. But the problem is this will be a bit harder to manage and evaluate than just using a mature solution like Patreon. But I’m not sure either way, I should think about it more. Thanks.

1 Like

Which platform and browser? If it’s Safari, can you try Chrome?

Can you please try Chrome if you can? I’m aware of a problem on Safari, but it’s a bit hard to resolve since it’s related to what we use as the auth server.

I forgot to talk about this. Right now we have kana matching disabled for words that aren’t marked usually-kana. This is because it’s extremely hard to resolve a good match in this case (the same kana string can correspond to many different words), and by design we don’t return more than one segmentation result. There’s an exception for words that aren’t usually-kana but are commonly written in kana (we’ll be recognizing more of these words with time). So it’s a bit hard to make this work for children’s books, but I do have something planned in my backlog that might make this possible, so I’ll be revisiting this later. In summary, even for beginners, right now the target material is still native level when it comes to kanji.

Actually I have this in my backlog, it’ll be relatively easy to solve for the names that don’t collide with vocabs. It’s just a matter of getting to it.

That’s a good idea.

This is one example of a word that isn’t usually-kana but commonly gets written in kana. Thanks, I’ll be adding that. (Segmentation avoids kana words that aren’t marked usually-kana unless I explicitly allow it for certain words.)

We haven’t gotten to writing short grammar docs for each construct, but we’ll be doing that, and maybe reference popular online docs.

It’s kinda impossible to do that :sweat_smile: There’s just no way we’ll be able to tell, especially since most times there’s just no literal equivalent. The translation is still there to help you understand the overall meaning.

That’s because for some constructs they become hidden. In your example here, the construct you’re seeing examples of is the i-adj stem, which really doesn’t have a textual representation in the text, so we show where it actually is by highlighting a small space in its place. We’re looking into whether we can make this more clear.

It’s a matter of determining the correct precedence from context. It’s hard to get right in some cases. Here it resolved to a grammar construct instead and short-circuited (which is why you don’t see the other options), which is wrong. I have a big list in my backlog of similar cases I want to look into. I’ll add this sentence.
As for 予想, segmentation won’t pick this up in kana form for the reason given above about children’s books.

Thanks, added to the list. Worth noting that it’s a bit hard to prefer the more common word in such cases because I don’t generate multiple results and compare overall scores (like some other engines do). That avoids some structural complexity in segmenting (because we resolve grammar too), but I have a solution for this in mind that I want to work on soon.
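For anyone curious what “multiple results and compare overall scores” can look like, here is a toy Python sketch. It is not how Lapis works (the post above says it deliberately avoids this); the word costs in `COSTS` are invented for illustration, and the greedy longest-match function is only a stand-in baseline that happens to reproduce the その後 + 輩 split.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical costs: lower means "more common". Values are made up.
COSTS: Dict[str, float] = {
    "その": 1.0,
    "その後": 2.0,
    "後": 2.0,
    "後輩": 1.0,
    "輩": 4.0,
}


def greedy_longest(text: str, lexicon: Dict[str, float]) -> List[str]:
    """Baseline: always take the longest dictionary word at each position."""
    out: List[str] = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                out.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no dictionary word starts at index {i}")
    return out


def best_by_total_cost(text: str, lexicon: Dict[str, float]) -> List[str]:
    """Consider every segmentation and return the one with the lowest total cost."""

    @lru_cache(maxsize=None)
    def solve(i: int) -> Optional[Tuple[float, Tuple[str, ...]]]:
        if i == len(text):
            return (0.0, ())
        best = None
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word not in lexicon:
                continue
            rest = solve(j)
            if rest is None:
                continue  # the remainder cannot be segmented
            candidate = (lexicon[word] + rest[0], (word,) + rest[1])
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    result = solve(0)
    if result is None:
        raise ValueError("text cannot be segmented with this lexicon")
    return list(result[1])


if __name__ == "__main__":
    print(greedy_longest("その後輩", COSTS))
    print(best_by_total_cost("その後輩", COSTS))
```

With these made-up costs the greedy baseline yields その後 + 輩, while the whole-sentence comparison prefers その + 後輩. The trade-off mentioned above still applies: scoring every candidate gets more complicated once grammar constructs have to be resolved in the same pass.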

Are you sure you selected the deck you want these saved to in the modal? If you haven’t, the Save button will be faded out.

2 Likes

I had a look at it yesterday, and it looks interesting. I noticed some minor issues with scaling (I think), where a picture is distorted or words overlap (for example when changing JP-EN to JP-JP in the box). I am using Safari 14.0.2 on Mac.

Question: is it possible to see a list of grammar points and go from there?

1 Like

I was using Chrome; however, it does work now. Not sure if there was just a delay in the activation or something? It’s hard to tell because there wasn’t any message on the login screen. It just refreshed and acted like I’d never even typed anything into the boxes. :man_shrugging:

Safari is uncomfortably behind in supporting some modern CSS features (which pretty much every other modern browser supports), so we couldn’t test things on it. Would it be possible to try it on Chrome and see whether the same issues show up there?

Yes, planned soon :slightly_smiling_face:

Hmm, strange. I’ll have to check this.

Hi, on my second day using this wonderful new beta app:

Is there a way to override the segmentation where it got it obviously wrong?

Also, is it possible to replace a selected choice for a segment? I was surprised that the choice for まわす was 輪姦す (quite an unpleasant connotation) rather than the much more common (luckily) 回す.

Here is the example sentence I used.

夜勤が足りないせいでこのところ店長は夜勤にまわっており、昼の間は私と同世代のパートの女性の泉さんが社員のようになって、店をまわしている。

I know, this is a long sentence copied straight from a book, but I have so far found the segmentation super useful for breaking down these long sentences. For me that works extremely well.

2 Likes

I checked on FF and it is fine. :slight_smile:

1 Like

I just registered an account, and have some suggestions for improvements at the login stage

  1. Notify the user about the confirmation email once they register. I did not realize one had been sent after I created my account. It immediately took me to the login screen, so I put in my new information and couldn’t log in until I clicked the link in my email.
  2. Show a “wrong password/username” or “check your email to activate your account” message when a login attempt fails.

Overall, I’m excited for this and am trying the onboarding. I have tried bunpro in the past but didn’t like it much; I found the example sentences stiff and more focused on leveling/completion than on retention.

3 Likes

Sorry about that. In this particular case 回す wasn’t picked up for the reasons I talked about before concerning kana and usually-kana (if you want to know the details). Sometimes I will make exceptions and allow segmentation to pick such words up. I’ll add 回す. (This can’t be done arbitrarily for any word because it would harm accuracy in the long run.)
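To make the kana rule concrete, here is a rough Python sketch of the policy as described in this thread, not Lapis’s actual code. The `Entry` type, the `kana_matches` function and the `usually_kana` flags on the sample entries are all assumptions for illustration; a real engine would read such flags from its dictionary data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entry:
    written: str        # usual kanji spelling
    reading: str        # kana reading
    usually_kana: bool  # assumed dictionary flag: "usually written in kana"


# Illustrative entries only; the flags are assumptions, not checked data.
LEXICON = [
    Entry("出来る", "できる", True),
    Entry("予想", "よそう", False),
    Entry("回す", "まわす", False),
]


def kana_matches(kana: str, lexicon: List[Entry], allowlist: FrozenSet[str]) -> List[str]:
    """Return words that a kana string is allowed to match.

    A word matches in kana form only if it is flagged usually-kana,
    or if it has been hand-added to the allow-list of exceptions.
    """
    return [
        e.written
        for e in lexicon
        if e.reading == kana and (e.usually_kana or e.written in allowlist)
    ]


if __name__ == "__main__":
    print(kana_matches("まわす", LEXICON, frozenset()))         # no exception yet
    print(kana_matches("まわす", LEXICON, frozenset({"回す"})))  # after adding 回す
```

Under this policy 予想 still would not match よそう in kana form, which is consistent with the explanation given earlier in the thread.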

No, on the contrary, the aim is for it to work well even with such sentences. So just let me know if you find an unexpected result.

Right now, as we’re in beta, you can expect to run into lots of those. It will improve with time.

I was sure we already do this. I’ll recheck. I should put some time into smoothing out the rough edges around login and signup.

I have an idea about making example sentences available from parsing real novels or even anime. Do you think this would be more useful than stock Tatoeba example sentences?

I tried last night:

Chrome and Brave (Chromium) on Android
Safari and Brave (WebKit) on iOS
Brave (Chromium) on Windows 10

Same issue presents. I’ll try regular Chrome and Firefox when I get a moment.

Tried Brave on Windows 10 and it worked fine. Weird. So you’re trying to log in and it’s just reloading the login page?

Exactly that. Password is entered, click Login and page refreshes.

That was then. Now however it seems entirely borked at either the password entry or password reset stage. I’ve reset the password many times now and every time I get an incorrect user/password warning and a login failure, so I’m really not sure what the issue is here.

I’ve tried reset and login in most of the browsers mentioned above; Brave, Chrome, Edge, Firefox

Not to pull the ‘I know what I’m doing’ card - but I work in computing with a particular focus on cyber security… so I’m 90% sure I’m not making any stupid reset mistakes, hehe

1 Like

That does help hehe

I’m checking, and it seems you don’t have your email confirmed. I know the message wasn’t clear on signup so you might have missed the original “Confirm your email” email. Can you confirm this? Should add a way to resend the email too maybe. Or maybe resetting the password should confirm the email too.
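As a sketch of the two ideas above (a clearer login message, and letting a completed password reset confirm the email), something like the following Python could work. Everything here is hypothetical: the `User` record, the in-memory `users` dict and the function names are mine, and the real app delegates auth to a separate server whose API I don’t know.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class LoginResult(Enum):
    OK = "Logged in."
    BAD_CREDENTIALS = "Wrong username or password."
    EMAIL_UNCONFIRMED = "Check your email to activate your account."


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class User:
    email: str
    salt: bytes
    password_hash: bytes
    email_confirmed: bool = False


def register(users: Dict[str, User], email: str, password: str) -> User:
    salt = os.urandom(16)
    user = User(email, salt, hash_password(password, salt))
    users[email] = user
    return user


def login(users: Dict[str, User], email: str, password: str) -> LoginResult:
    user = users.get(email)
    if user is None or not hmac.compare_digest(
        user.password_hash, hash_password(password, user.salt)
    ):
        return LoginResult.BAD_CREDENTIALS
    # Only reached with a correct password, so the hint is safe to show.
    if not user.email_confirmed:
        return LoginResult.EMAIL_UNCONFIRMED
    return LoginResult.OK


def complete_password_reset(users: Dict[str, User], email: str, new_password: str) -> None:
    """Call only after the user has followed the reset link sent to `email`."""
    user = users[email]
    user.salt = os.urandom(16)
    user.password_hash = hash_password(new_password, user.salt)
    # Following the emailed link shows the user controls this inbox.
    user.email_confirmed = True
```

The design choice is that the “check your email” message only appears after the password has been verified, so it does not reveal whether an address is registered to someone guessing passwords.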

I’m just too far into Anki to switch at this point; however, I just wanted to commend you for the incredible amount of good work that’s obviously been put into this.

Will most definitely use it for the sentence deconstruction feature. Thank you for sharing, and I look forward to seeing it develop.

1 Like