For those of you who don’t know, the kanji kentei (Kanken) is a test of Japanese knowledge, namely kanji, vocab and yojijukugo, spanning many levels. Beyond the 2,136 joyo kanji lie over 800 jinmeiyou kanji. Beyond those ~3,000 kanji lie an extra 3,000+ kanji covered by the highest level of the Kanken, for a total of ~6,300. Since they aren’t included on the “common use” (joyo) list or the jinmeiyou list, you might assume these 3,200+ kanji are rare and the words containing them aren’t worth learning. But is that really true for all of them?
On the other end of the spectrum, we have the WaniKani Pleasant levels: WaniKani levels 1-10. These are super simple words that everyone knows. Surely you’ll see them more often than anything that didn’t make the joyo cut and that the Kanken doesn’t even bother to include until the notoriously difficult level 1, right?
Using jpdb.io, I gathered the frequency in novels for 15 Kanken level 1 words and 15 WaniKani Pleasant words. There are also 2 yojijukugo questions at the end, for a total of 17 questions. For each question, make your best guess and pick the word that is more common across the series in jpdb’s database.
Low-frequency words may differ by only a few percent, while higher-frequency words differ by ~10% or more.
The frequency is for the word exactly as written: no alternate forms, and no kana spellings of anything that is written in kanji.
Question 15 surprised me. I’d never seen 投擲 before. Looks like I need to up my light novel game.
I was also pretty sure that words like 囁く and 呟く were used more frequently than their counterparts. I guess I underestimated 相手 and … 星? Haha
Great quiz, I got tricked left and right. It’s very interesting how the fact that the frequencies come from novels influences the results.
Also, a while ago I was playing with a script matching WK vocab to the Balanced Corpus of Contemporary Written Japanese (BCCWJ). If you want to create a round 2, here are all the words from levels 1-10 that fall outside the 20,000 most frequent words of the BCCWJ. Some are quite surprising too.
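The filtering step described above can be sketched roughly like this. It is only a guess at the general idea, not the poster’s actual script: it assumes the vocab list is a plain list of words and the corpus ranking has already been loaded into a dict mapping word to frequency rank (1 = most frequent). The function name `split_by_rank`, the placeholder words and the ranks are all made up for illustration.

```python
def split_by_rank(vocab, ranking, top_n=20000):
    """Split a vocab list into words ranked outside the top_n and words
    missing from the ranking entirely (hypothetical helper)."""
    outside, missing = [], []
    for word in vocab:
        rank = ranking.get(word)
        if rank is None:
            # Not found at all: could be genuinely rare, or a parsing /
            # spelling mismatch, so keep it separate instead of guessing.
            missing.append(word)
        elif rank > top_n:
            outside.append(word)
    return outside, missing


# Placeholder words and invented ranks, NOT real BCCWJ data.
toy_ranking = {"word_a": 150, "word_b": 45000, "word_c": 19999}
toy_vocab = ["word_a", "word_b", "word_c", "word_d"]
print(split_by_rank(toy_vocab, toy_ranking))
```

Keeping the “not found” words in their own list is a deliberate choice here: a lookup failure doesn’t necessarily mean a word is rare, since a word can simply fail to match the corpus’s segmentation.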
Honestly, I was a bit surprised by this one as well. The best explanation I could come up with is its use in similes, but looking through the books I’ve read, those seem to be an underwhelming minority. It seems Japanese people just really like talking about being under the stars, or about the light of the stars. Or the lack of stars, in a handful of cases.
I went for some trickery by selecting a few words whose frequencies differ from what you would expect given their meaning or complexity. This is very apparent in 魑魅魍魎 (evil spirits of rivers and mountains) vs. 円い (round). 躊躇う and 躊躇 are quite common and aren’t completely overshadowed by 迷う. On the other hand, 女の人 is somewhat overshadowed by the other words for “woman” in novels. Still not rare by any means, though.
I went for words with similar frequencies and tried to pick common words this time around, with a couple of exceptions. I thought it would be less exciting with rare words since, well, finding out that a Kanken level 1 word is exceedingly rare and seen less often than a WaniKani word is probably what a lot of people would expect haha. The real problem was finding WaniKani words that weren’t obviously rare. Looking for words in the 30% range was tough: it was either a really low percentage or 50%+.
I’m not opposed to doing another quiz with the genuinely hard Kanken level 1 words I know, but the problem is that only a couple of users on here will actually know them, so it would be 100% guessing for everyone else haha.
I got 11/17. Reading a lot really helps you get a feel for these statistics, though he did purposefully use some really rare words from the Pleasant levels (marui). The two you were surprised by were also the two that surprised me the most.
I got 10/17. Most were hard picks (other than the 相手, 病気, and 終焉 ones), but 朦朧 being more common than 赤ちゃん surprised me. I feel like 赤ちゃん appears in most things I’ve read, while 朦朧 shows up less consistently.
This zipped spreadsheet has 841,912 entries (ranked 1 to 536,048), and many of them share a rank (not sure why, but it looks like entries with the same frequency count are simply tied). Some searches didn’t return the individual word, so I posted the listing for the compound that came up instead (assuming it was the most frequent result of the search and that the spreadsheet word search worked correctly).
I couldn’t parse 女の人 for some reason; perhaps it’s the kana.
There were 295 matches for 相手; 相手方 appears to be the top one.