Idea for API: Kanji Comparison!

For some levels now I have been having a bit of a problem from time to time with kanji that look like other kanji. After a while I end up recognizing a kanji based on only part of it. So, for example, I would look at 列 (row) and think it was 例 (example), simply because I learned the latter earlier and have seen it more often. I have also had trouble with 例 (example), 倒 (overthrow) and 側 (side), because by the time I learned the later ones, I was already unconsciously used to recognizing the earlier ones by the parts they share.
So I was wondering if it would be possible to create an API that would show you similar kanji you’ve already learned during your lessons/reviews. Like, when you first learn a kanji there could be an extra tab for “Similar Kanji You’ve Already Learned”. Similarity could be defined as sharing 50% or more of radicals. So, for example, 例 would show up in the tab when you learn 倒 (because they share 66% of their radicals), and later on both 例 and 倒 would show up in the tab when you learn 側. Similarly, 例 would show up when you learn 列 because each shares more than 50% with the other (列 shares 100%, 例 66%).
And when you get a kanji wrong in a review, the tab you open to see the right meaning might also show you similar ones you’ve already learned, so you can see where you made your mistake.
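
The 50% rule described above could be sketched roughly as follows. This is only a minimal illustration in Python: the radical sets are typed in by hand to match the percentages in the post (they are not pulled from WaniKani), and the function names are made up for the example.

```python
# Hand-typed, illustrative radical sets (an assumption, not WaniKani data).
RADICALS = {
    "例": {"亻", "歹", "刂"},
    "倒": {"亻", "至", "刂"},
    "側": {"亻", "貝", "刂"},
    "列": {"歹", "刂"},
    "休": {"亻", "木"},
}

def shared_fraction(a, b):
    """Fraction of a's radicals that also appear in b."""
    return len(RADICALS[a] & RADICALS[b]) / len(RADICALS[a])

def is_similar(a, b, threshold=0.5):
    """Similar only if each kanji shares at least `threshold` with the other."""
    return (shared_fraction(a, b) >= threshold
            and shared_fraction(b, a) >= threshold)

def similar_learned(new_kanji, learned, threshold=0.5):
    """Already-learned kanji to list in the 'similar' tab for new_kanji."""
    return [k for k in learned
            if k != new_kanji and is_similar(new_kanji, k, threshold)]
```

With this data, `similar_learned("側", ["例", "倒"])` returns both earlier kanji, while 休 is not flagged against 例 because only one side of the pair reaches 50%.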

I’ve been trying to do this manually, but very often the kanji that show up in fewer vocabulary items get kind of forgotten, so it would be nice to be pre-emptive about it: not only avoid future mistakes, but also be reminded of the similarities and differences when you are first learning a new, similar kanji.
I would do this myself, but I know nothing about coding. Since I know we have a great API community here, I thought I’d share the idea in case someone is interested.

I think doing it by a straightforward radical comparison would not work, because

  • some radicals are themselves composed of other radicals
  • some radicals are similar to other radicals
Probably the only way to define similarity would be to manually curate it.
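
To make the two objections concrete, here is a rough Python sketch of what a manually curated layer might look like: one table that breaks compound radicals into their parts, and one that maps look-alike radicals to a single representative. Both tables and all names are hypothetical and hand-written for illustration; a real version would need far more curation.

```python
# Hypothetical, hand-curated tables (illustration only).
# Radicals that are themselves built from other radicals.
SUBPARTS = {
    "音": ["立", "日"],
}
# Radicals that look alike, each mapped to one representative.
LOOKALIKES = {
    "士": "土",
    "曰": "日",
}

def normalize(radicals):
    """Expand compound radicals and merge look-alikes into one set."""
    result = set()
    stack = list(radicals)
    while stack:
        r = stack.pop()
        if r in SUBPARTS:
            stack.extend(SUBPARTS[r])      # replace a compound by its parts
        else:
            result.add(LOOKALIKES.get(r, r))
    return result
```

Normalized sets can then be compared by overlap, though this still ignores where each part sits inside the kanji.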

@ElliottTamer,
Yeah, definitely an issue… especially starting around the early 20s levels.
Implementing a ‘kanji similarity’ system is high on my list of scripts to finish, because it only gets worse at higher levels (so far).

@rosyatrandom,
My initial dataset will use info from 3rd-party data sources to compare radicals and their placement.  They are more consistent about radical breakdown.  Once the similar kanji are identified with 3rd-party data, I can still use WK’s radicals in the breakdown.

I don’t know anything about coding so I can’t be much help creating the API, but I would love to use a tool like that. There are some kanji I just inevitably get wrong because my mnemonics, or the parts I’m used to recognizing them by, are the same as for another kanji from a few levels earlier. I personally think that, for learning, it’s easier to register the first time, “Oh yeah, right, it does look like it, but that’s different,” than to not notice at all; by the time you do notice, your brain has already mixed the two up, so you keep making the same mistake. What I mean is: I would rather learn the differences than the similarities, so it would be easier to tell the kanji apart.
I think this is THE thing that WaniKani is missing: telling you what looks alike, to prevent you from failing and encoding things wrong.
So anyone ready to code this would be a genius or a saviour (or both, yanno); in any case, someone I would be eternally grateful to :slight_smile:
I’m looking forward to this API!

http://similarity.gakusha.info/

I have the exact same issue.  I’ve been thinking that I need a tool for this.  Being a programmer, I also have some idea about how to implement it.

For what you’re describing, you could easily have an algorithm that figures out which kanji are a problem based on what you’re getting wrong.  So I have the problem of mixing up 光 (light) with 先 (previous), but that’s mostly because of 光年 (light year) and 先年 (previous years/past years).  Well, any time I put in “previous” as an answer to 光 or “light year” as an answer to 先年, it would start to know that those two are confusing.  From there, it could then show them together like you’re saying.
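
A minimal sketch of that idea in Python, assuming the tool can see each wrong answer as it is typed. The `ConfusionTracker` class and the small meanings table are made up for the example; nothing here uses the actual WaniKani API.

```python
from collections import Counter

# Tiny illustrative meanings table (an assumption, not real WaniKani data).
MEANINGS = {
    "光": {"light"},
    "先": {"previous"},
    "光年": {"light year"},
    "先年": {"previous years", "past years"},
}

class ConfusionTracker:
    """Counts how often an item is answered with another item's meaning."""

    def __init__(self, meanings):
        self.meanings = meanings
        self.counts = Counter()

    def record_answer(self, shown, answer):
        """Return the item the answer belongs to if it was a mix-up, else None."""
        answer = answer.strip().lower()
        if answer in self.meanings[shown]:
            return None                      # correct answer, nothing to learn
        for other, other_meanings in self.meanings.items():
            if other != shown and answer in other_meanings:
                self.counts[frozenset((shown, other))] += 1
                return other
        return None                          # wrong, but not a known mix-up

    def confusing_pairs(self, min_count=1):
        """Pairs mixed up at least min_count times."""
        return [set(p) for p, n in self.counts.items() if n >= min_count]
```

Answering “previous” for 光 would record the pair 光/先, and once a pair has been seen often enough it could be shown side by side.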

This gave me an idea though for a simple app that allows you to select similar Kanji from the ones you’ve learned and then forces you to match Kanji to the meanings.  I think the biggest problem is just not seeing similar Kanji on the same page so you can get a better feel for telling them apart.
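
The core of such a matching exercise might look something like this in Python. It is only a sketch: the function names are hypothetical, and the meanings are the ones quoted earlier in the thread rather than anything fetched from WaniKani.

```python
import random

# Meanings as quoted earlier in the thread (illustrative subset).
MEANINGS = {"例": "example", "倒": "overthrow", "側": "side", "列": "row"}

def make_matching_round(group, meanings, rng=random):
    """Return the kanji in order plus their meanings in shuffled order."""
    kanji = list(group)
    options = [meanings[k] for k in kanji]
    rng.shuffle(options)
    return kanji, options

def grade(guesses, meanings):
    """guesses maps kanji -> chosen meaning; return the kanji answered wrong."""
    return [k for k, m in guesses.items() if meanings[k] != m]
```

Putting the whole group of look-alikes on one page is the point of the design; the grading step just reports which ones were swapped.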

I’ve been looking for something to do to toy around with the WaniKani APIs.  I’ll try building this when I get a chance.

This is not an API but it may be of help.
http://www.nihongo-pro.com/kanji-pal/list/frequency/radical

I think the best way to do this would be as a community-maintained thing.  I don’t really think an algorithm is necessarily going to effectively tell which Kanji are considered ‘similar’ to a human.  I realise though, that even among people, what is considered ‘similar’ may vary, but at least you would end up with a list of ‘Kanji that people have marked as similar to this one’.
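
As a rough illustration of what a community-maintained list could store, here is a small Python sketch in which each user can mark a pair as similar and pairs are ranked by how many people marked them. The `SimilarityVotes` class and its methods are invented for this example.

```python
from collections import defaultdict

class SimilarityVotes:
    """Pairs of kanji that users have marked as similar."""

    def __init__(self):
        # unordered pair of kanji -> set of users who marked it
        self._voters = defaultdict(set)

    def mark(self, user, a, b):
        """Record that `user` considers a and b similar (once per user)."""
        if a != b:
            self._voters[frozenset((a, b))].add(user)

    def similar_to(self, kanji, min_votes=1):
        """(other kanji, vote count) pairs, most-marked first."""
        found = []
        for pair, voters in self._voters.items():
            if kanji in pair and len(voters) >= min_votes:
                (other,) = pair - {kanji}
                found.append((other, len(voters)))
        return sorted(found, key=lambda item: -item[1])
```

A `min_votes` threshold is one simple way to cope with people disagreeing about what counts as similar.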

In the meantime, there must be lists out there of similar/often confused kanji to start off with?

If anyone starts coding on this, please contact me ([email address removed]).  I’m working on a data framework userscript for WK that caches WK item and user data, and does various kinds of analysis… all for use by other userscripts.  I would like to keep up with whatever work is done on ‘similar kanji’, as this is high on my priority list.

rfindley said... If anyone starts coding on this, please contact me (rfindley at usa dot net).  I'm working on a data framework userscript for WK that caches WK item and user data, and does various kinds of analysis... all for use by other userscripts.  I would like to keep up with whatever work is done on 'similar kanji', as this is high on my priority list.
 I might take you up on that, I've been meaning to write something for this for a long time, but I never got to it so far.
Aika1 said...
http://www.nihongo-pro.com/kanji-pal/list/frequency/radical
Kaimera said...
http://similarity.gakusha.info/
Thanks... good info.

There’s a certain member of the WK community who’s supposedly working on something like this, but he’s been really busy and hasn’t been able to make much progress. I don’t have his programming chops but I’ve been thinking of doing some worksheets along these lines.

There are certainly some devilish comparisons. 直, 値 and 置 tripped me up for the longest time.

I’ve been thinking about this problem in a slightly different fashion.
Many similarities occur because two kanji share the same phonetic. And in many of these cases, the reading is either the same or rendaku’d.
Having these kanji joined in a data structure would be a nice way  to cover a bunch of similarities.
Accessing that information through an API could then be a next step.
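
One possible shape for that data structure, sketched in Python. The entries are a small hand-picked sample (kanji, shared phonetic component, one on’yomi each) chosen for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Hand-picked sample: (kanji, phonetic component, one on'yomi).
ENTRIES = [
    ("清", "青", "セイ"),
    ("晴", "青", "セイ"),
    ("精", "青", "セイ"),
    ("板", "反", "ハン"),
    ("版", "反", "ハン"),
    ("飯", "反", "ハン"),
]

def build_phonetic_index(entries):
    """Group kanji (with their reading) under their phonetic component."""
    groups = defaultdict(list)
    for kanji, phonetic, reading in entries:
        groups[phonetic].append((kanji, reading))
    return dict(groups)

def phonetic_siblings(kanji, entries):
    """Other kanji that share the same phonetic component as `kanji`."""
    for members in build_phonetic_index(entries).values():
        if any(k == kanji for k, _ in members):
            return [(k, r) for k, r in members if k != kanji]
    return []
```

An API on top of this would mostly be a lookup by kanji or by phonetic component.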

BreadstickNinja said... There's a certain member of the WK community who's supposedly working on something like this, but he's been really busy and hasn't been able to make much progress. I don't have his programming chops but I've been thinking of doing some worksheets along these lines.
Lol... Assuming you mean me (which is a totally accurate analysis :-)  )....  Our exchange student is back home in Japan now, so I've actually been making progress over the last few weeks.  I absolutely wouldn't change my hosting experience, but it is nice to have some free time again :-)
I know absolutely nothing about coding, so this could be useless, but I wonder if the most straightforward fix would be something similar to the tagging system suggested in another thread the other day. That way both character similarity and reading similarity could be marked, depending on an individual's own confusion. I think that to be marked realistically, similarity would need to be human-curated, as others have said, but since different people have trouble with different things, tagging would allow everyone to personalize it.
Kaimera said... http://similarity.gakusha.info/

 Really useful. Thanks.
rfindley said...
BreadstickNinja said... There's a certain member of the WK community who's supposedly working on something like this, but he's been really busy and hasn't been able to make much progress. I don't have his programming chops but I've been thinking of doing some worksheets along these lines.
Lol... Assuming you mean me (which is a totally accurate analysis :-)  )....  Our exchange student is back home in Japan now, so I've actually been making progress over the last few weeks.  I absolutely wouldn't change my hosting experience, but it is nice to have some free time again :-)
 Haha I was actually referencing looki, who also posted in the thread. But I'd also be excited to see what you come up with!

By the way, here’s an interesting paper on quantifying kanji similarity that doublevil found: http://www.aclweb.org/anthology/C08-1131
Its ultimate goal is to make finding unknown kanji in a dictionary more accessible, but it also proposes and compares various metrics for kanji similarity and even tests them in flashcard environments.