A tool to show useful aggregated data based on your WK level (homonyms, frequency, part of speech, ...)

mrahhal · November 6, 2018, 6:45pm

At first I created this with a very specific purpose in mind (homonyms), but expanded it a bit after that. It uses your own current WK level data as a starting point, and merges in data from various other dictionaries. I’m thinking of a couple more potential features.
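
For readers curious what the homonym part amounts to, here is a minimal sketch of the idea, not the tool's actual code (which is not shown in this thread). It assumes the unlocked vocab is available as (word, reading) pairs; the function name `find_homonyms` and the sample list are hypothetical.

```python
from collections import defaultdict

def find_homonyms(vocab):
    """Group words by reading and keep only readings shared by 2+ words."""
    by_reading = defaultdict(list)
    for word, reading in vocab:
        by_reading[reading].append(word)
    return {r: ws for r, ws in by_reading.items() if len(ws) > 1}

# Hypothetical sample of unlocked vocab as (word, reading) pairs.
vocab = [("着く", "つく"), ("付く", "つく"), ("着る", "きる"), ("必要", "ひつよう")]
homonyms = find_homonyms(vocab)
print(homonyms)  # {'つく': ['着く', '付く']}
```

Grouping on the reading alone ignores pitch accent, so words that sound different in speech would still be listed together.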

It’s kinda user-unfriendly right now: there’s no way to run it without building the solution. I might release it as a dotnet command line tool later.

(All the logic is isolated from the running context, so in theory the presentation can be anything. For now I just used the easiest option, an Excel file export.)

Maybe some would find it useful nonetheless!

Thanks to @luffyuzumaki for his help on this as well.

This is the excel sheet it produced for me at my current level.

Jardik · November 6, 2018, 7:14pm

Oh god, this is just what I needed all along!

Maybe the next addition could be kanji with shared meanings, so that it’s easier to find example sentences and differentiate between them: 必要、重要, that kind of thing.

You rock! keep it up!

Ncastaneda · November 6, 2018, 9:51pm

Hello. I was wondering about the frequency number in the example list.
How is it supposed to be interpreted?
I see 着く and 付く, both being quite common verbs, with rather different frequencies.

Could you explain a bit about how that number came to be?

mrahhal · November 7, 2018, 4:56am

Thanks, glad you’re interested! That’s a very nice idea (Check my edit above for the excel sheet it produced for me)

mrahhal · November 7, 2018, 5:02am

Hello. The frequencies are taken from the Innocent Corpus dictionary. This is data collected from 5000+ novels. 8023 is still a relatively high number, but it seems that 着く is just used much more.

Of course, the exact frequency number should be taken with a grain of salt. So in this case I would simply say both 着く and 付く are very common. By the way, those are the frequencies of the vocab words themselves; there’s another sheet in the document that lists the kanji with their frequencies. Looking at it, I can see that 着 has 264662 whereas 付 has 158202. Both are extremely common.

Actually, I had a look at the vocab sheet, and it appears that 付ける is used much, much more (98796) than both 着く and 付く. Interesting.

OmukaiAndi · November 7, 2018, 6:07pm

the kanji for 着く (i’m using it every day) is also used for 着る (very common, too). it also appears in related and very common words like 到着、着陸 and so on, so you’re likely to see it multiple times every day.
付く is less visible, because the number of related words with high frequency is lower. it’s also an extremely common auxiliary verb, for words like 思いつく、気づく and so on, but is then usually written in kana. so that means that, while 付く is by orders of magnitude the more common word, 着 is way more frequently used.
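
to make the word-versus-kanji distinction concrete, here is a rough sketch that adds up word frequencies per kanji. it is only an illustration: the function names are hypothetical, the two counts are the ones quoted earlier in this thread (8023 is assumed to be the figure for 付く), and the kanji sheet's own numbers come from the corpus, so they need not match a sum like this.

```python
from collections import Counter

def is_kanji(ch):
    # CJK Unified Ideographs block; hiragana and katakana fall outside it.
    return "\u4e00" <= ch <= "\u9fff"

def kanji_totals(word_freq):
    """Sum word frequencies per kanji, counting each kanji once per word."""
    totals = Counter()
    for word, freq in word_freq.items():
        for ch in set(word):
            if is_kanji(ch):
                totals[ch] += freq
    return totals

# Word counts quoted in this thread (8023 assumed to belong to 付く).
word_freq = {"付く": 8023, "付ける": 98796}
totals = kanji_totals(word_freq)
print(totals["付"])  # 106819
```

a word written in kana (思いつく, for example) contributes nothing to the kanji's total here, which is the effect described above.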

because of this divide, i’d welcome another stat “kanji frequency”, maybe as sorting option.

mrahhal · November 7, 2018, 7:00pm

Thanks for the insight. The Excel file does have a kanji sheet (you can see its tab at the bottom of the screenshot), which contains the frequency among other things. It’s sorted by frequency by default.
