Statistics on vocab based on WaniKani kanji order and CSV to learn from

Hi all!

I decided to find out how many words in total you can technically learn, depending on how far you are in WaniKani. These stats might give you some insight into how far along you are on your learning journey.

I have also included a CSV file of the Japanese dictionary sorted in WaniKani order! This means you can create your own Anki (or whatever) decks up to any kanji you want and learn every single word that uses only the kanji you know by then. It is a large file (11 MB) with 128,328 words in it, so happy learning. Here’s the file and the script used to generate it.
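
If you’d rather not do the filtering by hand, here is a minimal Python sketch of cutting the CSV down to a deck. It is not the script mentioned above, and it assumes hypothetical columns named `word`, `reading`, `meaning` and `level` (the real header may look different), where `level` is the WaniKani level at which a word becomes readable.

```python
import csv

# Hypothetical column names: the real CSV's header may differ,
# so adjust these to match the file you actually downloaded.
LEVEL_COLUMN = "level"
FIELDS = ("word", "reading", "meaning")


def rows_up_to_level(lines, max_level):
    """Yield [word, reading, meaning] for rows readable at or before max_level.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    for row in csv.DictReader(lines):
        if int(row[LEVEL_COLUMN]) <= max_level:
            yield [row[field] for field in FIELDS]


def write_deck(lines, out, max_level):
    """Write the selected rows to `out` as header-less CSV; return the row count."""
    writer = csv.writer(out)
    count = 0
    for row in rows_up_to_level(lines, max_level):
        writer.writerow(row)
        count += 1
    return count
```

For example, opening the downloaded file and an output file with `encoding="utf-8", newline=""` and calling `write_deck(src, out, 10)` would give you every word readable by level 10 as a plain CSV, which Anki’s text import can read.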

One thing to keep in mind is that I based my analysis on the whole of EDICT. That means the results cover the entire dictionary (185,760 words) instead of Core10k or something smaller! That being said, with WaniKani or Core10k alone you can already go pretty far!

Most of the words in the dictionary are also field-specific, archaic, or ones you will most likely never need. Because of this, the results do not really reflect real-world usage. Nothing stops you from learning all of them, though.

I’ve also included the JLPT N1 kanji that WaniKani doesn’t teach yet. In total there were 2,288 kanji, whereas WaniKani teaches only 2,027. There are many more kanji beyond these, but nowadays they are mostly used either in names or in very specialized words, so I have omitted them.

To imitate the WaniKani approach, each extra level after 60 has 33 kanji, except for level 68, which has 30. As you can see from the graphs, the rest of the JLPT kanji are not super important, but they still absolutely have value. I have also not included words written only in hiragana or katakana in the per-level stats.
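
The extra-level numbers check out: 2,288 − 2,027 = 261 kanji that WaniKani doesn’t teach, and 7 × 33 + 30 = 261, i.e. levels 61 to 67 with 33 kanji each plus level 68 with 30. Here is a small Python sketch of that chunking. It assumes the leftover kanji are already in some chosen order; the post doesn’t say how they were ordered, so the placeholder list below is purely illustrative.

```python
def extra_levels(leftover_kanji, first_level=61, per_level=33):
    """Split kanji that WaniKani doesn't teach into pseudo-levels.

    Consecutive chunks of `per_level` kanji become levels first_level,
    first_level + 1, ...; the last level gets whatever remains.
    """
    return {
        first_level + i // per_level: leftover_kanji[i:i + per_level]
        for i in range(0, len(leftover_kanji), per_level)
    }


leftover = 2288 - 2027  # N1 kanji that WaniKani does not teach
print(leftover)  # 261
# Placeholder integers stand in for the actual kanji here.
levels = extra_levels(list(range(leftover)))
print({level: len(kanji) for level, kanji in levels.items()})
```

The second line printed shows 33 for each of levels 61 to 67 and 30 for level 68.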

Here are some stats in text first:

  • There are 185,760 words in total in EDICT.
  • You can learn to read 19,416 words just by knowing hiragana and katakana!
  • The most useful kanji to learn is (Human), which appears in 2,597 words!
  • The best kanji for extending your vocab when learning in WK order is 御 (Honorific prefix). Learning it lets you learn 762 new words!
  • The most useful level is level 21, which lets you read 4,071 new words!
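
The stats above boil down to one rule: a word becomes readable at the level of its latest-taught kanji, and a word with no kanji needs only kana. Here is a minimal Python sketch of that rule, not the original script. `kanji_level` is a hypothetical dict mapping each kanji to its (WaniKani or extra) level, and the kanji check is a simplified Unicode range test.

```python
from collections import Counter


def is_kanji(ch):
    # Simplification: CJK Unified Ideographs plus Extension A.
    # The iteration mark 々 (U+3005) falls outside these ranges, so it
    # is treated like kana and never blocks a word.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def unlock_level(word, kanji_level):
    """Return the level at which every kanji in `word` has been taught.

    0 means the word has no kanji (readable with kana alone);
    None means it uses a kanji missing from `kanji_level`.
    """
    levels = [kanji_level.get(ch) for ch in word if is_kanji(ch)]
    if not levels:
        return 0
    if None in levels:
        return None
    return max(levels)


def new_words_per_level(words, kanji_level):
    """Count how many words each level newly makes readable."""
    counts = Counter(unlock_level(w, kanji_level) for w in words)
    counts.pop(None, None)  # drop words that stay unreadable
    return counts
```

With this, `new_words_per_level(words, kanji_level)[0]` would be the kana-only count, and the remaining keys give the new words per level.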

And here are some graphs with explanations:

Words learned / level

As you can see, the number of new words per level peaks at level 21, which alone lets you read 4,071 new words. That’s roughly 66% as many words as WaniKani teaches in total!

Around levels 18-20, the curve starts to turn downwards, meaning a single level no longer unlocks as many words as it used to. However, this is absolutely a good thing: it means you have already learned quite a lot, and the rest will be less important (but still important, so please don’t stop learning after level 21).

Total words learned / level

Here’s the total number of words you should be able to read at each level. As mentioned in the previous section, the growth slows down around level 21 and the curve starts to even out.

On average, each level gives you approx. 1,890 new words that you should be able to read with the kanji you have learned so far. The most efficient level is 21, which gives you 4,071 new words. The least efficient level is (unsurprisingly) level 1: you can read only 423 words after it.
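
As a sanity check on the average: if the 128,328 words in the CSV are exactly the ones spread over the 68 levels (my assumption), then 128,328 / 68 ≈ 1,887 new words per level, which matches the “approx 1890” above. The running totals, the average, and the most and least efficient levels can all be derived from the per-level counts, for example with this Python sketch (function name is hypothetical):

```python
from itertools import accumulate


def summarize_levels(new_per_level):
    """`new_per_level[i]` is the number of words first readable at level i + 1.

    Returns (running totals, average new words per level,
    most efficient level, least efficient level).
    """
    totals = list(accumulate(new_per_level))
    average = totals[-1] / len(new_per_level)
    levels = range(1, len(new_per_level) + 1)
    best = max(levels, key=lambda lvl: new_per_level[lvl - 1])
    worst = min(levels, key=lambda lvl: new_per_level[lvl - 1])
    return totals, average, best, worst
```

The running totals are what the “Total words learned / level” curve plots, while the raw per-level list is the “Words learned / level” graph.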

50 most used kanji

The most used kanji is, unsurprisingly, the one for (Human), which appears in 2,597 words. Nearly all of the 50 most used kanji are taught between levels 1 and 10. The outliers are 性 (Gender, Lvl 14, 1857 occurrences), 合 (Join, Lvl 12, 1732 occurrences), 法 (Law, Lvl 15, 1731 occurrences), 無 (Nothing, Lvl 17, 1567 occurrences), 動 (Movement, Lvl 12, 1499 occurrences), 的 (Adjectifier, Lvl 14, 1372 occurrences), 機 (Machine, Lvl 20, 1249 occurrences) and 御 (Honorific prefix, Lvl 39, 1164 occurrences).
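
Counting “appears in N words” means counting each kanji at most once per word, even in words like 日日 where it repeats. A minimal Python sketch, using a simplified Unicode range check for kanji and a hypothetical `words` list of dictionary headwords:

```python
from collections import Counter


def is_kanji(ch):
    # Simplified check: CJK Unified Ideographs plus Extension A.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def kanji_word_counts(words):
    """Count, for every kanji, how many words contain it (once per word)."""
    counts = Counter()
    for word in words:
        # A set, so a kanji repeated inside one word is only counted once.
        counts.update({ch for ch in word if is_kanji(ch)})
    return counts
```

`kanji_word_counts(words).most_common(50)` would then give a top-50 list of the kind shown in the graph.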

Most useful kanji in WaniKani order

This might be a bit of a provocative title, but these are the kanji that unlock the most new words once you learn them, if you study in the order WaniKani teaches them. The “most useful” kanji is 御 (Honorific prefix), which unlocks 762 new words. That is more than half of all the words (1,164) containing that kanji!
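
“Most useful in WaniKani order” can be read as: credit each word to whichever of its kanji is taught last, since that is the kanji that finally makes it readable. Here is a Python sketch under that reading. `kanji_order` is a hypothetical map from kanji to its position in the study sequence; how the author ordered kanji within a single level is not stated, so the tie-breaking here is an assumption. For 御, the ratio works out to 762 / 1,164 ≈ 65% of the words containing it.

```python
from collections import Counter


def is_kanji(ch):
    # Simplified check: CJK Unified Ideographs plus Extension A.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def unlocking_kanji_counts(words, kanji_order):
    """Credit each word to the kanji that completes it in study order.

    `kanji_order` maps kanji -> position in the study sequence (0 = first).
    Words with no kanji, or with a kanji outside `kanji_order`, are skipped.
    """
    counts = Counter()
    for word in words:
        kanji = [ch for ch in word if is_kanji(ch)]
        if kanji and all(ch in kanji_order for ch in kanji):
            # The latest-taught kanji is the one that makes the word readable.
            counts[max(kanji, key=kanji_order.__getitem__)] += 1
    return counts
```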

Kanji used / level in total words

This graph shows how useful each level’s kanji are: level 1 kanji are used in 18,234 words and level 2 kanji in 24,917 words. The “least useful” level is level 60, whose kanji are used in only 861 words.
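
The post doesn’t spell out whether “level 1 kanji are used in 18,234 words” counts each word once per level or adds up the per-kanji counts. This Python sketch takes the first reading (a word with two level-1 kanji counts once for level 1); `kanji_level` is a hypothetical kanji-to-level dict.

```python
from collections import Counter


def words_using_level(words, kanji_level):
    """For each level, count words that contain at least one of its kanji."""
    counts = Counter()
    for word in words:
        # A set of levels, so each word counts at most once per level.
        counts.update({kanji_level[ch] for ch in word if ch in kanji_level})
    return counts
```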

So that’s it for this time. If you would like to see more analysis of the data, I’d be more than happy to provide it.

Oh, very nice.

WaniKani actually teaches fifty-seven of the jinmeiyō kanji. I find the inclusion of some of them pretty perplexing: like, why teach the 淀 from 淀橋 when you could teach the 幌 from 札幌 instead?

Prefix.

Considering it levels out, a logarithmic curve of best fit might be better than a linear one. It’d just be a bit inaccurate for the first four or five levels or so.
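
For anyone who wants to try the logarithmic fit without Excel: fitting totals ≈ a + b·ln(level) is just ordinary least squares on ln(level), which has a closed form. A stdlib-only Python sketch (function and variable names are mine):

```python
import math


def fit_log(levels, totals):
    """Least-squares fit of totals ≈ a + b * ln(level); returns (a, b).

    Levels must be positive, since ln(0) is undefined.
    """
    xs = [math.log(level) for level in levels]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(totals) / n
    # Slope = covariance(ln level, total) / variance(ln level).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, totals))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```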

Since there’s a finite number of words in the dictionary, I’d say it’s a sigmoid instead. And then the curve fits almost perfectly.

Does Excel even know what a Sigmoid is? :stuck_out_tongue:

I had to look it up, myself…

I don’t know about Excel, but from a quick search it seems to be possible with recent versions, yes. That being said, I would just go with curve_fit from scipy.
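
For the sigmoid route, here is a hedged sketch with SciPy’s `curve_fit`. It assumes a three-parameter logistic that starts from zero and levels off at a plateau `L` (a simplification, since the real curve already starts at 423 at level 1), and the starting guesses are my own heuristics, because the fit can wander off without reasonable ones.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, L, k, x0):
    """Logistic sigmoid rising from 0 towards the plateau L, steepest at x0."""
    return L / (1.0 + np.exp(-k * (x - x0)))


def fit_sigmoid(levels, totals):
    """Fit logistic() to cumulative word totals; returns the fitted [L, k, x0]."""
    x = np.asarray(levels, dtype=float)
    y = np.asarray(totals, dtype=float)
    # Heuristic starting guesses: plateau at the largest total, midpoint at
    # the first level reaching half of it, and a gentle slope.
    L0 = y.max()
    x0_guess = x[np.argmax(y >= L0 / 2)]
    params, _cov = curve_fit(logistic, x, y, p0=(L0, 0.1, x0_guess), maxfev=10000)
    return params
```

Comparing the residuals of this fit with those of a logarithmic fit on the same totals would show which shape follows the data better.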

Real statisticians use only Excel :sunglasses:

I thought they only used R :stuck_out_tongue:

It’s this guy, right?

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.