How to check if an answer is correct or not

I was thinking of implementing my own WaniKani client (no real reason, I just wanted to see what a console one would look like :slight_smile: ). The API seems to be reasonably clean and well structured; however, I don’t see any endpoint to validate whether a provided answer counts as correct or not. When submitting a review, you are expected to set incorrect_meaning_answers yourself, but that’s it.

Am I right to assume that the logic to check a string for correctness is expected to be implemented client side? If so, is the algorithm the WaniKani website uses documented anywhere? For readings it looks like an exact match, but meanings seem to use something like Levenshtein distance? Are the details available anywhere?

1 Like

I believe the intention is to check for correctness on the client side, yeah. And “Levenshtein distance” rings a bell, so it might be that exact algorithm. Even if it’s a different algorithm, using a similar “closeness” measure would be fine anyway. If you search “levenshtein” on the forums you might find confirmation that it’s the right algorithm, but I don’t think there’s any official documentation saying that they use it.

They use a modified version of Levenshtein. I just had a review, so I checked the review code quickly.

The basic routine (which they call levenshteinDistance) looks a lot like Optimal String Alignment. I had to check it against a couple of the more commonly used distance measures to make sure, and this one seems the closest. It’s a simplified version of Damerau-Levenshtein that runs a bit more efficiently (I never thought my data mining professor would be right with his “one day you’re going to use these terms”, and I certainly never thought it would be on a forum for learning Japanese).

Damerau-Levenshtein is essentially the Levenshtein distance, but with the added bonus that you may swap two adjacent letters in one step. So say you have the word CRAB and you want to go to CARB: with normal Levenshtein you would have to go CRAB->CAB->CARB (there are a couple of alternatives, like using a substitution instead, but the distance remains the same), which is distance 2. In Damerau-Levenshtein you can just swap the R and A in one step, thus going CRAB->CARB with distance 1. It’s a bit more lenient on typos, since most typos come from hitting keys too fast, so letters tend to get swapped a lot more often.

The downside of Damerau is that it’s a lot more computationally intensive than Levenshtein (taking swaps into account does make it more complex). Optimal String Alignment is a modification that makes it more efficient by placing a restriction: no substring may be modified more than once.

Basically, say you wanted to go from CRAB to ROCAB (I couldn’t think of a real word here, oh well). In Damerau you could simply go CRAB->RCAB->ROCAB, which yields distance 2. In Optimal String Alignment, you can’t insert the O between R and C after already having swapped them, since this would violate the single-modification rule. So instead you have to go CRAB->ORAB->RORAB->ROCAB, which is distance 3. But for typo detection it works well enough, and it’s a lot faster.
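For anyone who wants to replicate this client side, here’s a minimal textbook sketch of the Optimal String Alignment distance described above. It’s in TypeScript and is not WaniKani’s actual levenshteinDistance routine, just a standard implementation of the same measure:

```typescript
// Optimal String Alignment distance: Levenshtein (insert/delete/substitute)
// plus transposition of two adjacent characters, with the restriction that
// no substring is edited more than once. Textbook sketch, not WaniKani's code.
function osaDistance(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );

  for (let i = 0; i <= a.length; i++) d[i][0] = i; // deletions only
  for (let j = 0; j <= b.length; j++) d[0][j] = j; // insertions only

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
      // Transposition of two adjacent characters; drop this branch and you
      // are left with plain Levenshtein.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

console.log(osaDistance("CRAB", "CARB"));  // 1 (single swap)
console.log(osaDistance("CRAB", "ROCAB")); // 3 (the swapped pair can't be edited again)
```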

And at this point I’ve realized I forgot to look up what the exact cutoff value is for them to mark an answer wrong, and I have no reviews coming up in the next few hours. Oh well, I assume it’s based on the length of the original strings in some way. I’ll have to check tomorrow to find out. I’ve had my share of looking at minified JavaScript for today.

9 Likes

It is based on the string length, though I don’t know the exact criteria. For example, I’m pretty sure no typos are allowed at all for 3-character words.

1 Like

I think we could probably just search the code for the call to levenshteinDistance and look at whatever conditional it is used in to get the exact values. I imagine the actual check won’t be too far from the call. My next review is at 5 AM though, and I’m not going to stay awake that long, so that’s a problem for another day. :grin:

Like tomorrow perhaps. :slight_smile:

1 Like

I ended up looking it up. The entire process for verifying an answer is pretty much what you would expect:

For meaning answers, it first checks the answer against the blacklist, marking it wrong if it matches. Then it checks your answer against the correct answers and the user synonym list using the distance function (with an exception for digit-based answers: those seem to require an exact match, which is logical, since otherwise any digit would let the answer pass). The result of the distance function is compared against a function that determines the maximum allowed distance for an answer of length L, defined as:

\begin{align} \begin{cases} 0 & \textrm{if } L \le 3 \\ 1 & \textrm{if } 4 \le L \le 5 \\ 2 & \textrm{if } 6 \le L \le 7 \\ 2 + \left\lfloor L / 7 \right\rfloor & \textrm{if } L \ge 8 \end{cases} \end{align}

So basically, the longer the string, the more mistakes you’re allowed to make.
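Putting the pieces together, a client-side check could look roughly like the sketch below. This is not WaniKani’s actual code: maxAllowedDistance just encodes the cutoffs above, isMeaningCorrect is a hypothetical helper, and the accepted/blacklisted lists stand in for the subject’s meanings, the user’s synonyms, and the blacklisted meanings. It reuses the osaDistance sketch from earlier in the thread.

```typescript
// Maximum distance allowed for an accepted answer of length L, using the
// cutoffs quoted above.
function maxAllowedDistance(length: number): number {
  if (length <= 3) return 0;
  if (length <= 5) return 1;
  if (length <= 7) return 2;
  return 2 + Math.floor(length / 7);
}

// Rough sketch of a client-side meaning check. `accepted` would hold the
// subject's meanings plus the user's synonyms, `blacklisted` the meanings
// that should always be marked wrong; both are hypothetical inputs here.
// Assumes the tolerance is based on the accepted answer's length.
function isMeaningCorrect(
  answer: string,
  accepted: string[],
  blacklisted: string[]
): boolean {
  const normalized = answer.trim().toLowerCase();

  // Blacklisted answers are always wrong.
  if (blacklisted.some((m) => m.toLowerCase() === normalized)) return false;

  return accepted.some((meaning) => {
    const target = meaning.toLowerCase();
    // Digit-based answers require an exact match.
    if (/\d/.test(target)) return target === normalized;
    // Otherwise allow a few typos, scaled by the answer's length.
    return osaDistance(normalized, target) <= maxAllowedDistance(target.length);
  });
}
```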

7 Likes

This is awesome, thank you!

1 Like

2 years later but this saved me a lot of research, thanks!

3 Likes