KanKan: kanji-component-aware online Japanese dictionary

Hello all,

I made a web page where you can look up words using fragments of kanji:

I read a lot of Japanese where I can’t easily look things up automatically, such as paper books or retro videogames. I often find myself wanting to look stuff up by providing “bits” of kanji when I can’t identify the whole thing.

For instance, imagine that you stumble upon the word 鉤素 in a book and you recognize 素 but you have no idea what 鉤 is. In jisho you can search for ?素 but that’s going to give you a ton of matches and you may have to scroll a bit to find the result.

Alternatively you can try to find 鉤 by drawing or component or phonosemantics, but maybe the font is blurry or tiny and you’re not sure what’s to the right of 金.

So ideally what you’d want is a way to search for “a word where the first kanji contains 金 followed by 素” and that’s specifically what this page lets you do: searching for 「金」素 finds the right word:

You can also replace one kanji by a similar-looking one (as in, one with similar components/radicals). So for instance suppose you’re trying to cook your favourite ramen and behind the package you find the word 沸点. You’re hungry but you have no idea what 沸 means, however you can tell that it looks a bit like 弟, so you can search for 「弟」点 and you immediately find the right word:

Note that by default, in order to limit the amount of data being downloaded and speed up the search, only the top ~60,000 words are searched. If you want to search through all 200,000+ JMdict entries instead, you have a toggle at the bottom of the page. You also have another toggle to enable searching through proper nouns (people’s names, brands, works of art etc…).
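A minimal sketch of how the similar-looking lookup could work, assuming each kanji maps to a set of components and that "similar" means a large overlap between those sets. The `COMPONENTS` table, the `WORDS` list, the 0.5 threshold and the function names are all made up for illustration; this is not the code behind the page.

```python
# Toy component table in the spirit of KRADFILE. The entries are
# illustrative placeholders, not copied from the real file.
COMPONENTS = {
    "沸": {"氵", "弓", "丨", "ノ"},
    "弟": {"丷", "弓", "丨", "ノ"},
    "点": {"占", "灬"},
    "素": {"糸", "土"},
}

# Tiny stand-in for the dictionary.
WORDS = ["沸点", "弟子", "素点"]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the component sets of two kanji."""
    ca, cb = COMPONENTS.get(a, {a}), COMPONENTS.get(b, {b})
    return len(ca & cb) / len(ca | cb)


def similar_kanji(query: str, threshold: float = 0.5) -> list[str]:
    """Kanji whose components overlap enough with the query, best first."""
    scored = [(similarity(query, k), k) for k in COMPONENTS if k != query]
    return [k for s, k in sorted(scored, reverse=True) if s >= threshold]


def find_words(similar_to: str, rest: str) -> list[str]:
    """Words whose first kanji resembles `similar_to` and that end in `rest`."""
    candidates = set(similar_kanji(similar_to)) | {similar_to}
    return [w for w in WORDS if w[0] in candidates and w[1:] == rest]
```

With this toy data, `find_words("弟", "点")` keeps 沸点 and drops the other two words, because 沸 and 弟 share three of their five distinct components.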

61 Likes

I saw your post in the videogame thread but I had no idea you would get to it so fast!!
I will definitely try it out for the game I’m playing, thank you so much!
It will definitely help in speeding up lookups.

4 Likes

At first I considered just emailing the idea to jisho.org and seeing if they would do it but I thought I’d do a proof of concept first.

The sad thing is that this would have been vastly more useful to me when I knew fewer than 1000 kanji and I would find unknown ones all the time; nowadays it’s less of an issue for me…

7 Likes

I’m definitely interested in such a tool! I have yet to read something without using Yomichan but it’s going to happen with the next pick of the ABC as it’s an author I really want to read and it didn’t have an ebook version. It’s in three weeks’ time though, so probably expect feedback around that time :slight_smile:

Wow :open_mouth: so that’s what you are doing when you are not trolling me on the forum hmm

8 Likes

Ha! One thing that’s amazing with the Japanese language community is that there are countless high-quality resources available which makes developing such a tool fairly painless.

There’s JMDict for the dictionary, kradfile for the kanji components. Add a little bit of regex-fu on top of it and it Just Works.
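To give an idea of the regex part, here is a minimal sketch in Python (the page itself is JavaScript, and this is not its actual code). It assumes a kanji-to-components table derived from kradfile and expands each `<...>` group into a character class of every kanji containing all the listed components. `COMPONENTS`, `query_to_regex` and the word list are hypothetical names and data.

```python
import re

# Hypothetical kanji -> components table, in the spirit of kradfile.
COMPONENTS = {
    "鉤": {"金", "勹", "口"},
    "鉄": {"金", "失"},
    "素": {"糸", "土"},
    "点": {"占", "灬"},
}

# Either a bracketed component group or a single literal character.
TOKEN = re.compile(r"<([^<>]*)>|(.)")


def query_to_regex(query: str) -> re.Pattern:
    """Turn a query such as '<金>素' into a compiled regex."""
    parts = []
    for bracketed, literal in TOKEN.findall(query):
        if literal:
            parts.append(re.escape(literal))
            continue
        wanted = set(bracketed)
        # A kanji is also treated as containing itself.
        matches = [k for k, comps in COMPONENTS.items() if wanted <= comps | {k}]
        if matches:
            parts.append("[" + "".join(map(re.escape, matches)) + "]")
        else:
            parts.append("(?!)")  # empty class: a pattern that never matches
    return re.compile("".join(parts))


words = ["鉤素", "鉄道", "酸素", "素点"]
pattern = query_to_regex("<金>素")
hits = [w for w in words if pattern.fullmatch(w)]
```

Here `hits` only contains 鉤素: both 鉤 and 鉄 land in the character class, but only 鉤 is followed by 素 in the word list.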

8 Likes

Wow, that is absurdly fast!

Thank you so much, this will be extremely useful to me.

3 Likes

This seems useful! I’ll give it a go next time I read :+1:

One thing I noticed though is that it’s limited to only specific angle brackets.

I tried 社会〈白〉and it found no results. But 社会<白> matched 社会的.

Perhaps you could add a quick way to input a kanji component? The brackets are quite awkward to type on mobile.

Maybe a button that creates a pair of correct brackets and puts the cursor between them?

4 Likes

Ah I already had < and <, I didn’t think about the half-width version, I just added them. I also handle whitespace better: 社 会<白>

I agree that typing <> is frustrating on mobile, maybe a button is the solution or I could find a better set of delimiters.
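A minimal sketch of the kind of normalisation involved, assuming the goal is to fold bracket variants into one pair and ignore whitespace before matching. The exact set of variants the page accepts is an assumption, and `normalize_query` is a made-up name.

```python
import re

# Bracket variants folded into ASCII < and >. Which variants the real
# page accepts is an assumption.
BRACKETS = str.maketrans({
    "\uFF1C": "<", "\uFF1E": ">",  # full-width angle brackets
    "\u3008": "<", "\u3009": ">",  # CJK angle brackets 〈 〉
})


def normalize_query(raw: str) -> str:
    """Map bracket variants to ASCII < > and drop all whitespace."""
    # \s also matches the ideographic space U+3000 on str patterns.
    return re.sub(r"\s+", "", raw.translate(BRACKETS))
```

After this step `社 会〈白〉` and `社会<白>` are the same query string.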

2 Likes

This is absolutely amazing, and I am in awe of how fast you made it :palms_up_together:

Definitely see if Jisho would be interested in it and sell it for some nice :yen: (if you want)

4 Likes

Ahah, I got enough out of jisho for free that I’d gladly give them the idea free of charge!

5 Likes

Maybe !! would work better? Like !線!糸 . It’s not quite as good looking because the brackets “block” the special expression but it is very simple to type on virtually any keyboard.

EDIT: maybe ‘、’ and ‘!’ (and ASCII equivalents)? 「、線!糸」. A bit messy looking but it would do the job.

EDIT2: I pushed the update with this small change for testing: 、口虎!,土!き

I think it looks a bit ugly but it’s certainly easier to type on mobile…
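Roughly, the alternative delimiters can be mapped back to brackets with a single substitution. This is a sketch under the assumption that 、 or an ASCII comma opens a group and ! (ASCII or full-width) closes it; `to_bracket_form` is a hypothetical name.

```python
import re

# '、' or ',' opens a component group; '!' or full-width '!' closes it.
GROUP = re.compile(r"[、,]([^、,!\uFF01<>]*)[!\uFF01]")


def to_bracket_form(query: str) -> str:
    """Rewrite 、...! groups as <...> groups, leaving other text alone."""
    return GROUP.sub(r"<\1>", query)
```

So `、口虎!,土!き` becomes `<口虎><土>き`, and a query with no delimiters passes through unchanged.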

2 Likes

Oh man, Japanese punctuation is a nightmare! I did a lot of Japanese string processing (for a Japanese client) in the last few months and it turned out to be so much more tricky than English: 4 different scripts, full width, half width, non standardised punctuation.

One fun example is when the client used a full-width / in the file names and I needed to make sure I didn’t convert it to an ASCII / when normalising file names, otherwise it’d change the depth of the directory tree.

Oh and another “favourite” is the composite kana. Like ガ can be represented by a single Unicode code point or by two: カ plus ゛
Of course those would be different strings, but they look exactly the same :confounded:
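Both pitfalls can be reproduced with Python's standard `unicodedata` module. This is a small sketch; `safe_normalize` is a hypothetical helper name. It uses the combining dakuten U+3099, since the standalone ゛ (U+309B) is a spacing character and does not compose with カ under NFC.

```python
import unicodedata

precomposed = "\u30AC"       # ガ as a single code point
decomposed = "\u30AB\u3099"  # カ followed by the combining dakuten

# The two strings render the same but compare unequal until normalised.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

fullwidth_slash = "\uFF0F"
# NFC leaves the full-width slash alone; NFKC folds it into ASCII '/',
# which would silently create a new path separator in a file name.
assert unicodedata.normalize("NFC", fullwidth_slash) == fullwidth_slash
assert unicodedata.normalize("NFKC", fullwidth_slash) == "/"


def safe_normalize(name: str) -> str:
    """Compose kana without touching full-width punctuation."""
    return unicodedata.normalize("NFC", name)
```

Sticking to NFC for file names keeps ガ consistent while leaving a full-width slash as an ordinary character in the name.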

6 Likes

Yeah it’s hellish, I also notice that there’s quite a lot of JIS charset in the wild over there, while in the west these days most of everything is Unicode.

It actually takes me back because I started programming in the early 00’s and back then you still had widespread use of local charsets: the various ISO-8859 charsets in Europe, KOI8-R in Russia etc… And then the anglos using only ASCII and you could be almost certain that no accent would ever display correctly for them. Back then charset troubleshooting was a routine occurrence.

But now I never do this anymore, I just assume UTF-8 unless proved otherwise…

4 Likes

I pushed a bunch of changes:

  1. On top of using the radicals from KRADFILE I added the radical data from KANJIDIC. The idea being that if you want to search for, say, 利潤, KRADFILE only has this decomposition for 潤:

潤 : 王 汁 門

You can see there’s no easy way to specify the 氵 radical in a search. Even if you were to copy-paste ‘氵’ it wouldn’t have been recognized because it’s not in the list on its own.

My solution to this was to add the radical data from kanjidic, which lists 氵 and its full version 水. So now the list looks like:

"潤": [ "水", "氵", "氺", "汁", "王", "門" ],

And now you can search for 利潤 with 利<水>:

  2. You can see in the above screenshot that I also display frequency data (sourced from JPDB). I also use this information internally as a secondary sort criterion for the results.

  3. I reworked the match scoring algorithm to penalize extra-long matches and missing components in order to hopefully have better (and fewer) results.

  4. I automatically convert other delimiters to <> in the search bar; this way you can search on mobile with 、! and it gets converted to the “pretty” version automatically. The best of both worlds, hopefully.

  5. I improved the handling of the history (when you press “back” on the page) to return to a previous search.
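For the first item, a minimal sketch of the merge in Python, assuming KRADFILE lines in the `kanji : components` form shown above and a lookup from each kanji to the written forms of its KANJIDIC radical. `KANJIDIC_RADICAL_FORMS` is a hypothetical stand-in for whatever the preprocessing script derives from KANJIDIC, and the function names are made up.

```python
import json


def parse_kradfile_line(line: str) -> tuple[str, set[str]]:
    """Parse a line such as '潤 : 王 汁 門' into (kanji, components)."""
    kanji, _, rest = line.partition(":")
    return kanji.strip(), set(rest.split())


# Hypothetical stand-in for radical data derived from KANJIDIC:
# kanji -> every written form of its radical.
KANJIDIC_RADICAL_FORMS = {"潤": {"水", "氵", "氺"}}


def merged_components(line: str) -> tuple[str, list[str]]:
    """Union the KRADFILE components with the radical forms, sorted."""
    kanji, comps = parse_kradfile_line(line)
    comps |= KANJIDIC_RADICAL_FORMS.get(kanji, set())
    return kanji, sorted(comps)


kanji, comps = merged_components("潤 : 王 汁 門")
print(json.dumps({kanji: comps}, ensure_ascii=False))
```

Sorting by code point happens to give the same ordering as the merged entry shown above, with the 水 forms first.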

10 Likes

I pushed another update:

  • I improved the radical handling some more to hopefully get even better matches. I noticed that the previous version wouldn’t find 謎 from <言迷>, now it does.
  • I added readings

7 Likes

This is amazing thank you! Can I help improve the design?

4 Likes

By all means. It’s just an html file and accompanying js, the repo is here:

There’s a heap of unrelated scripts in that repo, it’s my catchall for every Japanese-related script I’ve cobbled together over the past couple of years. Only ksearch.html and ksearch.js are relevant here (+ kankan-preprocess.py that creates the JSON dictionary files).

5 Likes

Alright then, just gonna check the style and prolly do a css file if required.

P. S. : I see sweet things on that repo heh!

2 Likes

That’s because you haven’t looked at the contents yet… It started as a very simple script to get the WaniKani kanji list using the API in order to generate an Anki deck and it ballooned from there, and I never took the time to clean it up.

5 Likes

You are just telling me the classic story of all projects heh!

5 Likes