[Userscript] ConfusionGuesser

Sinyaven · November 18, 2020, 1:03pm

Sorry that it took a while. I have now added a setting When answer incorrect (in the tab Interface). If you change it to Minimized ‒ show arrow, only an arrow will appear which you can click to show the list of guesses. And if you set it to Show nothing, not even the arrow is displayed (but you can still use the hotkey to open the list of guesses).

Version 1.12 patch-notes:

Guess list now also available for correct answers

Until now, ConfusionGuesser only reacted to incorrect answers. I have now slightly adapted the algorithm to also be able to show “guesses” after a correct answer. I’m not sure if this feature will be all that helpful, but it might be used to compare other items similar to the current one.

For correct meaning answers, I filter out all guesses where the characters are contained in the current item. Example: After answering 崇拝 with “worship”, the guess list will neither contain [崇(すう) Worship] nor [拝(はい) Worship]. The intention is to prevent spoiling the reading.
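A minimal sketch of how such a spoiler filter could look. The function name, the shape of a guess object and the sample data are assumptions for illustration, not the script's actual code. It reads "contained in the current item" as "every character of the guess occurs in the item".

```javascript
// Hypothetical guess shape: { characters, reading, meaning }.
// Drops every guess whose characters all occur in the reviewed item,
// so the guess list cannot reveal part of the item's reading.
function filterSpoilerGuesses(guesses, currentItem) {
	const itemChars = new Set(currentItem); // iterates the string by code point
	return guesses.filter(g => ![...g.characters].every(c => itemChars.has(c)));
}

const guesses = [
	{ characters: "崇", reading: "すう", meaning: "Worship" },
	{ characters: "拝", reading: "はい", meaning: "Worship" },
	// Illustrative entry, not taken from WaniKani data:
	{ characters: "礼拝", reading: "れいはい", meaning: "Worship" },
];

// Only the guess containing a character outside 崇拝 survives.
console.log(filterSpoilerGuesses(guesses, "崇拝").map(g => g.characters)); // ["礼拝"]
```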

Settings for behavior after correct/incorrect answer

The tab Interface in the settings dialog now contains two new entries: When answer correct and When answer incorrect. The three available options are: Show nothing, Minimized ‒ show arrow and Show list. When Show nothing or Minimized ‒ show arrow is active, no guesses are computed automatically; ConfusionGuesser only starts the search once you click the arrow (or use the hotkey). So it might sometimes happen that you click the arrow and nothing appears, because the search that started after the click found no guesses.

Please let me know if something doesn’t work.
Link to previous script version for downgrading in case version 1.12 doesn’t work for you

wojtek · November 18, 2020, 5:45pm

@Sinyaven seems to be working OK. thank you a lot!

Talos · November 29, 2020, 2:54pm

This script is fantastic - thank you so much @Sinyaven!

What would you think of a setting for the “When Answer Incorrect” dropdown, that amounts to “Show if similarity > X, otherwise minimize”, with X being a configurable value?

For example, right now, I seem to be triggering the script quite often based on things that have a really low similarity rating:

I’m concerned that too many false positives will eventually desensitize me to the “real” cases where I was confused. If I could say, for example, “Only pop up when similarity is >= 0.3” - I think it would help the overall effectiveness of the script as a learning aid!

Sinyaven · November 29, 2020, 7:31pm

Thanks, I’m glad it’s useful to you!

Regarding the request: I’m not sure – isn’t it still useful to know whether the reading you used actually exists for any kanji (in WK)?

Talos · November 30, 2020, 5:15pm

That’s an interesting point.

I think that, since Japanese is so full of homonyms, odds are pretty high that the reason my answer matches another kanji isn’t that I got the two kanji confused visually, but sheer coincidence. Getting alerted in those cases amounts to being given a random fact: “Hey, did you know that the reading you typed also happens to match another kanji that looks nothing like this kanji?”

That’s potentially interesting information, but it doesn’t exactly relate to the core reason why I love the 丸⬄九 feature of the script, which is helping to identify points of confusion between visually similar Kanji.

I admit, I can also imagine cases where I completely mixed up two kanji that look nothing alike - perhaps I entered こい for 愛. In that case there really was confusion, but I’d argue that’s: (1) probably the minority of cases, and (2) actually a different type of confusion than 丸⬄九.

I’d propose that a minimum level of visual similarity be required for alerting on 丸⬄九 and, if it seems worth the effort, that a different, optional alert type be added for something akin to 恋⬄愛.

But all that said, I still massively appreciate the effort you’ve put into this script, and will happily use it as-is. I can always just configure the setting to minimize the popup by default, and treat it more as a “pull” than a “push” system!

Sinyaven · December 1, 2020, 11:43am

Yes, I originally intended to add more confusion types such as the confusion based on similar meaning that you mention, but when testing it with just the visual similarity rating, it already seemed sufficient for a useful sorting, so I kept it at that.

It never bothered me having to check if the top guess applies to my wrong answer or if it’s a “false positive”, but that’s probably because I never used my script while having to go through hundreds of reviews (I created the script one year after reaching level 60). I can see that it might become annoying to get proposed some highly unlikely guesses during long review sessions.

I will look into adding the setting you proposed. And maybe add a guess type for similar meaning, I’m not sure yet. I just wonder what default color this guess type should get – I’m running out of options from the WK color palette. Maybe the green from the review forecast?

Sinyaven · December 6, 2020, 10:17pm

Version 1.13 adds your proposed option “Minimized if all ratings < X” to the dropdown. I also did some experiments with a new guess type for confusions based on similar meaning instead of visual similarity, but then I realized that most kanji on WK only have one meaning defined. To take your example with 恋 and 愛, the first only has the meaning “romance” and the second only the meaning “love”. So I would have to take a detour to the vocabulary and deduce additional kanji meanings from there. For now, I decided that it is not worth the effort.

Version 1.13 patch-notes:

added option “Minimized if all ratings < X”

In the settings dialog under the tab Interface, the dropdown menus for When answer correct and When answer incorrect now contain a new entry Minimized if all ratings < X. When selecting it, a slider appears below that allows you to define a value for X. If all guesses have a rating below X, the guess list will stay minimized. If at least one guess has a rating ≥ X, the whole guess list will be displayed.

The intended purpose of this feature is to only draw the user’s attention if a guess has a high likelihood of showing the reason for confusion.

Because the displayed guess ratings are rounded, it might happen that guesses with a displayed rating of e.g. 0.30 are actually still below a threshold of 0.30.
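The threshold rule and the rounding caveat can be sketched as follows. The function name and the sample ratings are made up for illustration; this is not the script's actual code.

```javascript
// Hypothetical helper: expand the guess list only if at least one
// guess has a rating >= X, otherwise keep it minimized.
function shouldShowList(ratings, threshold) {
	return ratings.some(r => r >= threshold);
}

const ratings = [0.296, 0.12];

// Displayed with two decimals, the best guess looks like it reaches 0.30 ...
console.log(ratings.map(r => r.toFixed(2))); // ["0.30", "0.12"]

// ... but the unrounded value is below the threshold, so the list stays minimized.
console.log(shouldShowList(ratings, 0.3)); // false
```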

Please let me know if something doesn’t work.
Link to previous script version for downgrading in case version 1.13 doesn’t work for you

Talos · December 8, 2020, 6:19pm

Woohoo - thank you so much!

I waited a day or two to make my first mistake that triggered a suggestion before replying, and lo and behold, the first one I hit was a perfect example of two kanji I get confused by all the time. No more false positives!

Again, thanks so much, this is an amazing tool all around. I’m not sure what the forum rules are on the topic, but if you’ve got a Patreon or a “buy me a beer” link and are allowed to share, I’d definitely contribute.

Sinyaven · December 8, 2020, 9:25pm

Thanks, I don’t have anything like that, but I’m always happy to hear that my script is helpful to someone out there!

est_fills_cando · February 3, 2021, 10:25pm

This is a really useful script! One thing I noticed is that if you answer a question with a really short answer like “a” or “the”, the script can add 200+ ms of lag to displaying whether you got the question wrong or right. I was thinking a good way of fixing it for length-1 guesses might be to have rateSimilarity return 0 if one of its arguments has length 1 and the longer string does not contain that character, and then to rate the strings that do contain it according to how close their length is to 1.
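A rough sketch of that idea, for English meaning strings only. The helper name is hypothetical, and using 1 / length as the "closeness to length 1" score is just one possible choice, not something taken from the script.

```javascript
// Hypothetical shortcut for the case where one string is a single character.
// Returns null when the shortcut does not apply, so the caller can fall back
// to the regular (more expensive) similarity rating.
function rateSingleCharSimilarity(a, b) {
	const [short, long] = a.length <= b.length ? [a, b] : [b, a];
	if (short.length !== 1) return null;   // neither string has length 1
	if (!long.includes(short)) return 0;   // character does not occur at all
	return 1 / long.length;                // assumed score: 1 for equal strings, lower for longer ones
}

console.log(rateSingleCharSimilarity("a", "apple")); // 0.2
console.log(rateSingleCharSimilarity("a", "tree"));  // 0
console.log(rateSingleCharSimilarity("ab", "abc"));  // null
```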

Sinyaven · February 3, 2021, 11:52pm

I agree that my implementation is certainly not the most efficient, but in this case I think the problem lies somewhere else. Do you by any chance have “Use fuzzy search” disabled? I just checked and found out that with fuzzy search enabled, searchByMeaning("a") returns 0 results, while without fuzzy search, it returns ~800 items. And then the script has to rate the similarity of all the meanings of the 800 items to “a”. Admittedly, with a better optimized rateSimilarity() function this would also not be a problem, but I think it’s even better to prevent the case of having 800 useless guesses in the first place.

I probably shouldn’t just use startsWith() as an alternative to fuzzy search. At least now I know that ~800 meanings on WK start with “a”.

est_fills_cando · February 4, 2021, 12:18am

I do have fuzzy search disabled. It was a while ago, but I think the reason I disabled it was that having looser matching criteria for meaning questions made the script more likely to offer suggestions in cases where it hadn’t before, but those additional suggestions often weren’t super relevant which caused me to tune out the suggestions that were relevant.

When I profiled the script, most of the execution time was being spent in the Levenshtein distance function, which is why I was suggesting potentially skipping it for cases where one of the strings is only one letter. (When I benchmarked always skipping it, the runtime of the script became much more reasonable.) Of course, reducing the number of times you call the function by reducing the number of search matches like you are saying is also a perfectly reasonable fix.

Sinyaven · February 4, 2021, 12:40am

In version 1.13 I have added the option “Minimized if all ratings < X” so that the guess list does not pop up if the guesses are probably not that relevant. Maybe that would help?

The problem is that I use the rateSimilarity() function for both English vocabulary and Japanese vocabulary, and in the case of Japanese vocabulary, having the longer string not contain the other single character should not immediately result in 0 similarity (because in Japanese, it should also consider the visual similarity of the kanji). It would definitely be more efficient to split up the function into rateEnglishSimilarity() and rateJapaneseSimilarity() and optimize the English version, but I think I would prefer to leave the function as is and instead prevent the 800 guesses from happening.

Sinyaven · February 8, 2021, 10:08pm

Version 1.14 patch-notes:

Performance improvement for one-letter answers with fuzzy search disabled

Until now, with fuzzy search disabled, I simply searched for all meanings that start with the given string. This led to a huge number of results for very short strings (e.g. ~800 results for “a”). As a result, you would get hundreds of worthless guesses after entering “a” as an answer to a meaning question, and rating that many guesses also caused noticeable lag.

To prevent this, I switch to only considering exact matches if the previous method returns more than 16 results.
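Roughly, the fallback works like the sketch below. The function name, the plain array of meaning strings and the case-insensitive comparison are simplifying assumptions; only the limit of 16 comes from the description above.

```javascript
const MAX_PREFIX_RESULTS = 16;

// Hypothetical non-fuzzy search over a plain list of meaning strings.
function searchMeaningsNonFuzzy(meanings, query) {
	const q = query.toLowerCase();
	const prefixMatches = meanings.filter(m => m.toLowerCase().startsWith(q));
	// Few enough results: keep the prefix matches.
	if (prefixMatches.length <= MAX_PREFIX_RESULTS) return prefixMatches;
	// Too many results (e.g. for "a"): only keep exact matches.
	return meanings.filter(m => m.toLowerCase() === q);
}
```

With this, a one-letter answer can no longer produce hundreds of guesses, while longer prefixes that match only a handful of meanings behave as before.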

Small performance improvements for "Levenshtein distance" function

When writing the ConfusionGuesser script I tried to avoid using for-loops and instead used forEach(), map(), filter(), reduce(), …
Sadly, my “Levenshtein distance” function was extremely slow compared to a version using for-loops (tested on Chromium), so I’m now using for-loops in that function.
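For reference, a for-loop based Levenshtein distance can look like the following. This is a generic two-row textbook version written for illustration, not the exact function from the script.

```javascript
// Levenshtein distance using plain for-loops and two reusable rows.
// Compares UTF-16 code units, which is sufficient for kana and common kanji.
function levenshteinDistance(a, b) {
	if (a.length === 0) return b.length;
	if (b.length === 0) return a.length;

	let prev = new Array(b.length + 1);
	let curr = new Array(b.length + 1);
	for (let j = 0; j <= b.length; j++) prev[j] = j; // distance from "" to b[0..j)

	for (let i = 1; i <= a.length; i++) {
		curr[0] = i; // distance from a[0..i) to ""
		for (let j = 1; j <= b.length; j++) {
			const cost = a[i - 1] === b[j - 1] ? 0 : 1;
			curr[j] = Math.min(
				prev[j] + 1,        // deletion
				curr[j - 1] + 1,    // insertion
				prev[j - 1] + cost  // substitution (or match)
			);
		}
		[prev, curr] = [curr, prev]; // reuse the rows instead of allocating new ones
	}
	return prev[b.length];
}

console.log(levenshteinDistance("kitten", "sitting")); // 3
```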

An interesting discovery I made in my experiments:

Array initialization

console.time("new Array => fill => map");
for (let i = 0; i < 1000; i++) {
	let d = new Array(1000).fill(null).map((a, j) => j);
}
console.timeEnd("new Array => fill => map");

console.time("Array.from");
for (let i = 0; i < 1000; i++) {
	let d = Array.from({length: 1000}, (a, j) => j);
}
console.timeEnd("Array.from");

console.time("for-loop");
for (let i = 0; i < 1000; i++) {
	let d = [];
	for (let j = 0; j < 1000; j++) d[j] = j;
}
console.timeEnd("for-loop");

Result in Microsoft Edge (Chromium):

new Array => fill => map: 16.60595703125 ms
Array.from: 75.56396484375 ms
for-loop: 7.5732421875 ms

Result in Firefox:

new Array => fill => map: 29ms
Array.from: 16ms
for-loop: 22ms

Chromium is awful at Array.from()! Until now I had used Array.from() in my “Levenshtein distance” function, but now I’m just using a for-loop.

Please let me know if something doesn’t work.
Link to previous script version for downgrading in case version 1.14 doesn’t work for you

Held · June 20, 2021, 9:18am

I just learned about this script and it’s awesome. I always went looking for the confusing kanji by myself which interrupted the review process. Having this script makes the experience much smoother.

Thank you!

slinky07 · September 1, 2021, 2:37am

Been using this for a few weeks and I just want to say this is so useful, it should be part of WaniKani itself. Loove it.

username21 · September 15, 2021, 6:03pm

Hello,
Just now I was migrating my WK scripts to a new PC, and when I imported the ZIP with my bunch of Tampermonkey scripts, I somehow discovered this one among the others: “WaniKani item info injector”. I double-checked whether it was in the original bunch and it wasn’t. I failed to find its script page on the forums, but it states inside that it was made by @Sinyaven. So I just came here out of curiosity to ask what this script does, and whether it is even possible for other scripts to auto-install their dependencies without my permission?

Sinyaven · September 15, 2021, 6:26pm

This script is a library script, which means that it is intended to be used by other scripts (similar to WaniKani Open Framework). It simplifies the addition of information to WaniKani items for userscripts like Keisei or Niai. I have created it because the recent change of the lesson page was breaking several userscripts (two of my own and several of other script authors), and instead of fixing them one by one, I wrote this library script and fixed all userscripts by letting them use WaniKani Item Info Injector instead of their previous injection method that was not working anymore.

That said, I don’t know why this script landed in your list of installed userscripts. It can be installed, but the only place where I linked to the Greasy Fork page was in the WaniKani Lesson Filter thread, and you would have had to manually install it. For now, all scripts just include it using the @require tag.

I have not created a thread for this library script yet, because there are still some things I want to change about it before I “advertise” it.

Anyway, you can uninstall the script, but you can also keep it. If it is installed like this, every other script uses this one (instead of their own @required one), so this setup makes sure that you always use the newest version with the latest bugfixes.

username21 · September 15, 2021, 6:55pm

Oh, I see. I do actually use the Keisei script, and it seems that Tampermonkey somehow pulled in all of its requirements as well. What made me most curious was that your script was imported without a proper link to its Greasy Fork page, so it took a bit to investigate its origin.

Anyways, thank you for your great job! I’ve updated it in my library to be always up to date.

shuly · November 19, 2021, 5:40pm

script seems to have stopped working since the latest wk update… ??
