I learned that the 1000 most common kanji make up 80-90% of the kanji you see in the wild

I graphed the cumulative usage of kanji across various public datasets (Twitter, Netflix, Wikipedia, Google, etc.), and it turns out that the 1000 most common kanji account for 80 to 90 percent of the kanji you see in the wild.

View the graph here: Kanji Heatmap
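If you want to reproduce this kind of cumulative curve yourself, here is a minimal Python sketch. It assumes you already have per-kanji occurrence counts from some corpus, ranks the kanji by frequency, and computes the running share of all occurrences covered by the top n. The helper names (`count_kanji`, `cumulative_coverage`, `coverage_of_top`) are hypothetical. The kanji filter only checks the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the sample sentence is a toy input, not data from any of the datasets above.

```python
from collections import Counter
from itertools import accumulate


def count_kanji(text):
    # Rough kanji filter: only the CJK Unified Ideographs block (U+4E00..U+9FFF).
    # Extension blocks and compatibility ideographs are ignored.
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def cumulative_coverage(counts):
    """Cumulative share of all occurrences covered by the top-k kanji,
    for k = 1..len(counts), with kanji ranked by descending frequency."""
    freqs = sorted(counts.values(), reverse=True)
    total = sum(freqs)
    if total == 0:
        return []
    return [running / total for running in accumulate(freqs)]


def coverage_of_top(counts, n):
    """Share of all kanji occurrences covered by the n most frequent kanji."""
    curve = cumulative_coverage(counts)
    if n <= 0 or not curve:
        return 0.0
    # If n exceeds the number of distinct kanji, coverage is complete.
    return curve[min(n, len(curve)) - 1]


# Toy input for illustration only (not from any real dataset).
counts = count_kanji("日本の日本人は日本語を話す")
print(counts.most_common())
print(coverage_of_top(counts, 2))
```

In the toy sentence, 日 and 本 each appear three times, so the top two kanji already cover 6 of the 9 kanji occurrences. Fed with counts from a large corpus, `coverage_of_top(counts, 1000)` gives the sort of number this post is about.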

This type of distribution is everywhere in language learning: kanji, vocab, grammar etc…

The problem is that recognizing 80% of the constituents of a text doesn’t mean that you understand 80% of what you’re reading.

Take the sentence I used at the start of this comment:

This type of distribution is everywhere in language learning: kanji, vocab, grammar etc…

Someone learning English may know every word here except the more technical “distribution”, in which case they end up understanding:

This type of <?> is everywhere in language learning: kanji, vocab, grammar etc…

There are 13 words in this sentence, so knowing 12 of them covers over 90% of it (12/13 ≈ 92%), but you still have no idea what it’s supposed to mean because the most important word is missing.
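The arithmetic is just known tokens divided by total tokens. Here is a minimal Python sketch that makes the point concrete. It assumes a naive letters-only tokenizer and a hypothetical learner's set of known words; `known_word_coverage` is an illustrative helper, not any existing tool. The coverage score comes out above 90% even though the one missing word is the one that carries the meaning.

```python
import re


def known_word_coverage(sentence, known_words):
    """Return (coverage, unknown_words) for an English sentence.

    Tokenisation is deliberately naive: runs of ASCII letters count as words.
    Japanese text has no spaces, so it would need a real tokenizer instead.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0, []
    unknown = [w for w in words if w.lower() not in known_words]
    return (len(words) - len(unknown)) / len(words), unknown


sentence = ("This type of distribution is everywhere in language learning: "
            "kanji, vocab, grammar etc…")
# Hypothetical learner who knows every word except "distribution".
known = {"this", "type", "of", "is", "everywhere", "in", "language",
         "learning", "kanji", "vocab", "grammar", "etc"}
coverage, unknown = known_word_coverage(sentence, known)
print(f"{coverage:.1%} covered, missing: {unknown}")
```

Coverage is 12/13, roughly 92%, yet nothing in that number tells you which word is missing or how much the sentence depends on it.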

True. I learned that from this link:

The link between vocabulary and reading comprehension is well known. You need to know about 90%-95% of the words in a passage if you are to comprehend it (Nagy & Scott, 2000)

…and also, knowing 80 or 90% of the kanji in a text is much less important than knowing 80 or 90% of the words in it. I do keep an eye on vocab frequency; I don’t track kanji frequency at all.

Yeah, I actually tried coming up with real-life examples in Japanese from a book I’m currently reading, but I quickly noticed that vocab in general (including pure-kana words) was vastly trickier than kanji coverage, at least in this particular novel.

What novel?