A tool to show useful aggregated data based on your WK level (homonyms, frequency, part of speech, ...)

GitHub Link.

I originally created this with one very specific purpose in mind (homonyms), but expanded it a bit after that. It uses your own current WK level data as a starting point and merges in data from various other dictionaries. I’m thinking of a couple more potential features.
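To give a rough idea of the homonym part, here is a minimal Python sketch. It is not the tool's actual code: the sample vocab, the frequency values, and the `find_homonyms` name are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample of learned vocab as (word, reading) pairs.
learned_vocab = [
    ("着く", "つく"),
    ("付く", "つく"),
    ("付ける", "つける"),
    ("必要", "ひつよう"),
]

# Hypothetical frequency lookup merged in from another dictionary.
# The numbers are placeholders, not real corpus counts.
frequency = {"着く": 3, "付く": 2, "付ける": 5}


def find_homonyms(vocab, freq):
    """Group words by reading and keep only readings shared by 2+ words."""
    by_reading = defaultdict(list)
    for word, reading in vocab:
        # freq.get returns None when the dictionary has no entry for the word.
        by_reading[reading].append((word, freq.get(word)))
    return {r: words for r, words in by_reading.items() if len(words) > 1}


homonyms = find_homonyms(learned_vocab, frequency)
```

Here only つく ends up in `homonyms`, since it is the only reading shared by more than one learned word.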

It’s kinda user-unfriendly right now: there’s no way to run it without building the solution. I might release it as a dotnet command line tool later.

(All the logic is isolated from the running context, so in theory the presentation can be anything. For now I just used the easiest option, an Excel file export.)
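As a hedged illustration of that separation (a Python sketch with made-up names and data, not how the tool itself is written), the logic can build plain rows and leave the output format to interchangeable exporters:

```python
import csv
import io


def build_rows(vocab):
    """Pure logic: turn (word, reading, frequency) tuples into table rows."""
    header = ["Word", "Reading", "Frequency"]
    return [header] + [[str(field) for field in item] for item in vocab]


def export_csv(rows, stream):
    """One possible presentation: comma-separated values."""
    csv.writer(stream).writerows(rows)


def export_text(rows, stream):
    """Another presentation: tab-separated plain text."""
    for row in rows:
        stream.write("\t".join(row) + "\n")


# The frequency value is a placeholder, not a real corpus count.
rows = build_rows([("着く", "つく", 3)])

buffer = io.StringIO()
export_csv(rows, buffer)
csv_output = buffer.getvalue()
```

Swapping `export_csv` for `export_text` (or an Excel writer) would not require touching `build_rows`.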

Maybe some would find it useful nonetheless!

Thanks to @luffyuzumaki for his help on this as well.

This is the Excel sheet it produced for me at my current level.

9 Likes

Oh god, this is just what I needed all along!

Maybe the next addition could be shared meanings kanji, so that it’s easier to find example sentences and differentiate between them - 必要、重要, that kind of thing :slight_smile:

You rock! keep it up! :wink:

3 Likes

Hello. I was wondering about the frequency number in the example list.
How is it supposed to be interpreted?
I see 着く and 付く, both being quite common verbs, with rather different frequencies.

Could you explain a bit about how that number came to be?

2 Likes

Thanks, glad you’re interested! That’s a very nice idea :+1: (Check my edit above for the Excel sheet it produced for me)

Hello. The frequencies are taken from the Innocent Corpus dictionary, which is built from data collected from 5000+ novels. 8023 is still a relatively high number, but it seems that 着く is just used much more.

Of course, the exact frequency number should be taken with a grain of salt. So in this case I would simply say both 着く and 付く are very common. By the way, those are the frequencies of the vocab; there’s another sheet in the document that lists the kanji with their frequencies. Looking at it, I can see that 着 has 264662 whereas 付 has 158202. Both are extremely common.

Actually, I had a look at the vocab sheet, and it appears that 付ける is used much, much more (98796) than both 着く and 付く. Interesting.

1 Like

the kanji for 着く (i’m using it every day) is also used for 着る (very common, too). it also appears in related and very common words like 到着、着陸 and so on, so you’re likely to see it multiple times every day.
付く is less visible, because the number of related words with high frequency is lower. it’s also an extremely common auxiliary verb, for words like 思いつく、気づく and so on, but is then usually written in kana. so that means that, while 付く is by orders of magnitude the more common word, the kanji 着 is way more frequently used.
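the divide between word frequency and kanji frequency can be sketched in a few lines of Python. this is only an illustration: the counts are invented placeholders, the function names are hypothetical, and a real corpus would count characters in running text rather than derive them from a word list.

```python
from collections import Counter


def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_totals(word_freq):
    """Add each word's count to every kanji character it contains."""
    totals = Counter()
    for word, count in word_freq.items():
        for ch in word:
            if is_kanji(ch):
                totals[ch] += count
    return totals


# Placeholder counts, invented for illustration only.
word_freq = {
    "着く": 10,
    "着る": 8,
    "到着": 6,
    "付く": 12,
    "思いつく": 20,  # written in kana, so it adds nothing to 付
}

totals = kanji_totals(word_freq)
```

with these made-up numbers, 付く on its own beats 着く, yet 着 collects more occurrences than 付 because it appears in more written words.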

because of this divide, i’d welcome another stat “kanji frequency”, maybe as sorting option.

1 Like

Thanks for the insight. The Excel file does have a kanji sheet (you can see its tab at the bottom of the screenshot), which contains the frequency among other things. It’s sorted by frequency by default.

1 Like