Sorting WaniKani by usage frequency

While working on KameSame, I’ve become really interested in building a better-sorted “re-study” experience for the WK curriculum of vocab & kanji. One thing I’ve learned at around level 53 is that the vocab WK introduces at higher levels comes across as a bit stilted, academic, or unusual to my conversation partners. I think that’s because every WK vocab item exists to reinforce a kanji, and some joyo kanji aren’t very 常用 (in common use) at all.

As a result, I’ve started looking into ways to establish a sort order based on practical usage. Unfortunately, every social media & search API heavily restricts the kind of aggregate searching needed to find this out. While I look into academic corpora like NINJAL’s, I decided to pay for the Azure/Bing search API and simply ask for the number of search results for each term in WK.
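For anyone who wants to try something similar, here is a minimal sketch of that kind of lookup. It is not the actual script behind the results below. It assumes the Bing Web Search API v7 endpoint and its `Ocp-Apim-Subscription-Key` header, and reads the `webPages.totalEstimatedMatches` field from the response. The function names, the `ja-JP` market setting, and the choice to wrap each term in quotes are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Bing Web Search API v7 endpoint; the key comes from an Azure subscription.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_request(term, api_key, market="ja-JP"):
    """Build the HTTP request for one term (hypothetical helper)."""
    # Assumption: quoting the term asks for an exact-phrase match.
    query = urllib.parse.urlencode({"q": f'"{term}"', "mkt": market, "count": 1})
    return urllib.request.Request(
        f"{BING_ENDPOINT}?{query}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


def extract_estimated_matches(payload):
    """Pull the estimated result count out of a parsed JSON response."""
    # Responses with no web results may omit "webPages"; treat that as 0.
    return int(payload.get("webPages", {}).get("totalEstimatedMatches", 0))


def fetch_count(term, api_key):
    """Query Bing for one term and return its estimated result count."""
    with urllib.request.urlopen(build_request(term, api_key)) as resp:
        return extract_estimated_matches(json.load(resp))


def sort_by_count(counts):
    """Return (term, count) pairs, most frequent first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `fetch_count` once per WK term and passing the collected dictionary to `sort_by_count` gives a naive frequency ordering. The counts are estimates from the search engine, so they are best treated as a rough signal rather than a measurement.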

Here are the results for you to take a look at:

Naive search result counts for every kanji & vocab character string in WaniKani, queried against the Bing Web Search API v7 (GitHub)

What do you think of these? There are some very obvious false positives, especially near the top, and some false negatives (for instance, cases where WK teaches the kanji spelling of a word that Japanese speakers almost always write in hiragana).

You could look at the free BCCWJ word frequency list for comparison [here]. It separates the occurrence count by source type, so you can exclude sources like legal documents and textbooks if you want.

I assume you’re familiar with the BCCWJ since you know about NINJAL.
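As a rough sketch of the per-source filtering described above: the function below sums a word's counts over every source column except the ones you exclude. The column names (`word`, `books`, `blogs`, `law`, `textbooks`) and the sample numbers are invented for illustration and are not the real headers or figures from the BCCWJ list, so they would need to be swapped for the actual ones in the downloaded file.

```python
import csv
import io


def usage_counts(tsv_text, word_col, source_cols, excluded):
    """Sum per-source counts for each word, skipping excluded sources."""
    kept = [c for c in source_cols if c not in excluded]
    totals = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        word = row[word_col]
        # Empty cells are treated as zero occurrences.
        totals[word] = totals.get(word, 0) + sum(int(row[c] or 0) for c in kept)
    return totals


# Toy data with hypothetical column names and made-up counts.
SAMPLE = (
    "word\tbooks\tblogs\tlaw\ttextbooks\n"
    "食べる\t10\t20\t0\t5\n"
    "条項\t2\t1\t40\t0\n"
)

counts = usage_counts(
    SAMPLE,
    word_col="word",
    source_cols=["books", "blogs", "law", "textbooks"],
    excluded={"law", "textbooks"},
)
# Rank words by the filtered count, most frequent first.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

With legal and textbook sources excluded, a word that appears mostly in statutes drops well down the ranking, which is the effect you would want for a conversation-oriented sort.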
