The list should now actually be ordered by score.
New main database, based on the differences in pixels when rendered with the Noto font (currently testing other fonts). The more pixels two kanji have in common, the higher the score; pixels that are not shared reduce the score, as do differences in the “grayness” of the kanji.
Common kanji are capped at 19 items; deleting one will shift in anything with a score of 0.3 or higher.
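To make the scoring idea concrete, here is a minimal sketch. The exact formula the script uses is not given in this thread, so the ratio below (shared ink pixels divided by all ink pixels in either glyph) is an assumption, the “grayness” term is left out, and `ink_pixels` and `pixel_score` are hypothetical names. Tiny hand-written bitmaps stand in for kanji rendered with a real font.

```python
def ink_pixels(bitmap):
    """Return the set of (row, col) positions that contain ink ('#')."""
    return {
        (r, c)
        for r, row in enumerate(bitmap)
        for c, ch in enumerate(row)
        if ch == "#"
    }


def pixel_score(a, b):
    """Shared ink divided by all ink in either glyph (assumed formula).

    1.0 means identical bitmaps, 0.0 means no ink pixel in common.
    """
    ink_a, ink_b = ink_pixels(a), ink_pixels(b)
    union = ink_a | ink_b
    if not union:  # two blank bitmaps count as identical
        return 1.0
    return len(ink_a & ink_b) / len(union)


# Toy 3x3 "glyphs" standing in for rendered kanji.
glyph_a = ["###", "#..", "###"]
glyph_b = ["###", "..#", "###"]

print(pixel_score(glyph_a, glyph_a))  # 1.0
print(pixel_score(glyph_a, glyph_b))  # 0.75 (6 shared of 8 ink pixels)
```

Shared pixels raise the score and unshared pixels lower it, which matches the description above, but the real weighting may well differ.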
Would it be difficult to add an option to ‘clear all similar kanji’ for a given kanji? I’m getting a ton of kanji that I don’t think look similar at all, and of course when I delete them, more just come in to fill the space. It would be nice to be able to clear them all at once and then just add the ones I’m actually confusing.
Also - you’re saying customizations get stored locally. How are you implementing that, the browser cache? I clear mine from time to time, so is there a way to export/import the database?
Another option would be to add an adjustable threshold. At the moment all available similar kanji are displayed, even if they have low scores (they will stop coming after 0–20 more kanji, though). A threshold would greatly reduce the number of similar kanji. I will have a look.
(Generally I find it easier to delete similar kanji than to search for and add them, because that is really time-consuming.)
Your personal database is stored inside Tampermonkey with GM_setValue. It survives clearing the browser cache, and I haven’t seen any problems with Tampermonkey updates so far.
You can back it up by going to the Tampermonkey Dashboard, clicking on the userscript name, and copying everything from the “Storage” tab. It contains the settings and the override_db with your modifications. To restore an old version, paste it back in and save.
I added a “minimal score” option to fix the too-many-kanji problem! The score is a value from 0 to 1; higher values mean that the similar kanji must resemble the original one more closely. You should also check whether you want the “Original Sources” option: the kanji there may not always fit, but sometimes there are some gems in there.
You can use both, they have a bit of overlap but not that much. The Keisei script lists kanji that share a kanji component, like death star (兪) -> 諭癒喩愉輸. Those kanji are of course also visually similar, so this script here will display them as well. Keisei shows all phonetic compounds in a short list, so if you are interested in phonetic compositions you will have a hard time figuring them out just using this script here.
But there are also many more kanji that are visually similar (think 諭+論). Niai uses a computer-generated list where all kanji were checked against all other kanji, so it focuses more on the “visual” aspect. So basically it will 1. display more similar kanji and 2. show similar kanji for (almost) every kanji in WK.
You can influence how many similar kanji are shown. The score shows how similar two kanji are: 0 means nothing in common, 1 means they are exactly the same. 未末 have a score of 0.81 and 輪輸 0.7, so at these scores the kanji are still very similar.
So if you want fewer similar kanji in the list, you can set the minimum score to 0.6. I prefer to have many options and then just remove the ones I don’t like from the list, so I use 0.4 or 0.5.
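As a rough illustration of what the minimum score does, the sketch below filters a candidate list by threshold and sorts it best match first. The function name `filter_similar` and the data layout are assumptions, not the script's actual code. Only the 輪輸 score of 0.7 comes from this thread; the other values are made up for the example.

```python
def filter_similar(candidates, min_score):
    """Keep candidates scoring at least min_score, best match first."""
    kept = [
        (kanji, score)
        for kanji, score in candidates.items()
        if score >= min_score
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Candidates for 輪. Only the 輸 score (0.7) is from the thread;
# the remaining scores are invented for illustration.
candidates = {"輸": 0.7, "論": 0.55, "軸": 0.45, "倫": 0.35}

print(filter_similar(candidates, 0.6))  # [('輸', 0.7)]
print(filter_similar(candidates, 0.4))  # [('輸', 0.7), ('論', 0.55), ('軸', 0.45)]
```

A higher threshold gives a short list of near-identical kanji; a lower one gives more candidates to prune by hand.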
Hmm, it replaces the “Visually Similar Kanji” section that WK already has, toward the bottom of a kanji details page, and appears on the last page of a kanji lesson (“found in vocab” or something), and the “readings” page in reviews. Only on kanji items, though.
Not yet, at the moment it is a DB with similar kanji for the jouyou kanji (and a ~80 MB matrix with all mutual scores). But I can imagine it as a service, and even auto-generate missing lists for a kanji query on demand. The latest version generates PNGs of all kanji in different fonts and calculates the similarity based on mutual vs differing pixels, so as long as the characters are included in fonts you could compare anything with everything automatically.
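The “everything against everything” comparison can be sketched like this. It is only an illustration under stated assumptions: `score_all_pairs`, `similar_to`, and the placeholder scorer `overlap` are hypothetical names, and short strings stand in for the rendered PNGs. Any symmetric scoring function could be plugged in instead.

```python
from itertools import combinations


def score_all_pairs(glyphs, score):
    """Score every unordered pair once; store it under both orderings."""
    matrix = {}
    for a, b in combinations(glyphs, 2):
        s = score(glyphs[a], glyphs[b])
        matrix[(a, b)] = s
        matrix[(b, a)] = s
    return matrix


def similar_to(matrix, char):
    """List all other characters for one character, best match first."""
    row = [(other, s) for (c, other), s in matrix.items() if c == char]
    return sorted(row, key=lambda pair: pair[1], reverse=True)


def overlap(a, b):
    """Placeholder scorer: fraction of positions where two strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Flattened toy bitmaps; real input would be rendered glyph images.
glyphs = {"A": "##..", "B": "##.#", "C": "..##"}
matrix = score_all_pairs(glyphs, overlap)

print(similar_to(matrix, "A"))  # [('B', 0.75), ('C', 0.0)]
```

Because each pair is scored once and mirrored, the work grows with the square of the number of characters, which is consistent with the full matrix being large.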
I really like this script, it’s one of my favourites.
I noticed that it rarely has the kanji I have mistaken it with, though, haha. I almost always remember the right side of the kanji, but mix up the left side, whereas the database (is that what it is?) that comes with the script almost always has a list of kanji with the left side the same, but the right side changed.
I like how you can add your own kanji mistakes, though! Super useful. I change the number-y thing to 0.99 to just show my own mistakes.
Thanks! The similar kanji come from different sources. You can experiment a bit with “Original Sources” in the options: when it is enabled, it adds more kanji that are related by a “logical” relation (like how many strokes must be changed), whereas the “new sources” focus on real visual difference, like how many pixels must be changed to turn one kanji into another.
If you want to focus on differences in, for example, 誰椎推崔進堆, you can also give my other script, Keisei, a try. The right-hand side is often helpful for guessing the reading of a kanji as well, and the left-hand side (or rather the “real radical”) is related to the meaning of a kanji.
I prefer to see lots of similar kanji to get an idea of what the future problems will be, but manually editing everything is also an option. There is a “manual database” with score 1.0, though, so even with 0.99 some stuff from me might show up.