I just reached the ばい菌 lesson, and the mnemonic states: “In case you’re wondering, the ばい is referring to the kanji 黴, but people don’t usually write that part.”
But according to JPDB, it’s written as 黴菌 20% of the time.
Meanwhile, words like 可也 and 為る are taught on WaniKani in kanji, despite being written in kanji less than 0.1% of the time.
What’s the logic behind that? 黴菌 is a word I’ll see hundreds of times (and have already seen multiple times in my studies). 可也 is a word I might never see once in my life. Why do we learn 可也 when 黴菌 is “not usually written like that”?
As WaniKani’s aim is to teach you kanji, the logic behind ばい菌 vs 黴菌 is that 黴 is a rarely used kanji. This may not be the case if you are studying bacteriology, but most of us will not see 黴 in the wild.
If you search Jisho for words containing 黴, you’ll see that very few words use this kanji, and all of them are usually written without it.
Edit: As for why this word but not the others? Very good question. I understand that WK tries to teach average, common usage. However, I assume the list was curated by people whose personal milieu or region might colour which words they think are common.
The description of the symbol meanings says that the little cross means the kanji is not in the Jouyou set. That’s not quite the same as “not usually written in kanji”: newspapers, I think, generally stick to the Jouyou set, but there’s no obligation on publishers to do so, and in novels some words are quite often written with non-Jouyou kanji.
FWIW, here are the WWWJDIC “Google corpus lookup” stats (I believe these are based on a sample of crawled web pages):
which shows that how common a kanji is depends a lot on what text you’re reading. (In particular, I’m not sure whether the jpdb stats include the prewar Aozora Bunko texts, which tend to be much more kanji-heavy than modern novels.)
Edit: Personally, I think WK should not teach 可也 and 為る in their kanji forms; that seems completely pointless to me.
I’ve been thinking about this today as well. I don’t know for sure, but possibly the frequencies are calculated by the same process that creates the prebuilt decks. Those decks include only 577 Aozora Bunko entries and ten times as many light, web, and visual novels, so I assume the frequencies end up weighted towards modern, young-adult-type content.
ばい菌 is already not a super common word, so adding a rarer spelling of it seems really pointless. Note that it’s also a 俗称 (colloquial term), which may bias the statistics one way or another, since I assume it’s avoided in many formal written contexts. I would also expect 俗称 vocabulary to tend to avoid uncommon, complicated kanji.
In general if you find yourself mashing the space bar when trying to get your IME to find the kanji form, it’s probably not worth it.
Anecdotally, I believe that’s the case. But that’s the limitation of all these statistical analyses of native content: they’re always biased one way or another, and in the end what matters is whether the word or kanji is relevant to what you actually read.
Random tangent, but for us learners I think the bias is actually a good thing.
No human ever consumes content without bias. We all have an extremely strong bias in what sort of content we consume; in the grand scheme of things, no one is remotely close to balanced exposure. For general sites like WK it’s not great, but for us as individuals it makes sense to find whichever frequency list is biased most similarly to us.
I realized today that if you click on the “alternate form” you’re interested in and then the “used in” link, jpdb will show you which works that form is used in. Here’s what it says for 黴菌:
| Media type | Used in | Used in % |
|---|---|---|
| Anime | 0 | 0% |
| Live action | 0 | 0% |
| Visual novels | 5 | 0% |
| Novels | 23 | 1% |
| Non-fiction | 3 | 2% |
| Web novels | 1 | 0% |
| Aozora Bunko | 30 | 5% |
and for ばい菌:
| Media type | Used in | Used in % |
|---|---|---|
| Anime | 21 | 1% |
| Live action | 24 | 1% |
| Visual novels | 28 | 5% |
| Novels | 35 | 2% |
| Non-fiction | 2 | 1% |
| Web novels | 14 | 1% |
| Aozora Bunko | 2 | 0% |
This seems helpful in getting an idea of whether a particular written form is more likely to appear in one subtype of the corpus. 黴菌 seems much more common in the Aozora texts, and the top few novels it appears in are all classics that are also on Aozora (細雪, 人間失格, ドグラ・マグラ). So it’s by no means “Aozora only”, but there’s a pretty strong tilt in which kinds of books prefer one spelling over the other. (ばいきん is also moderately well represented on Aozora, which makes sense: I think the half-kana, half-kanji forms of words are largely a post-war development, from when the government started defining official-use kanji sets. Before that, you’d use either all kanji or all kana.)
Which makes me wonder whether one advantage of teaching ばい菌 is that it tells you this class of half-kana, half-kanji words exists; otherwise they might be quite confusing to look up in a dictionary when you encounter them in the wild.
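To put a rough number on that tilt, here’s a quick Python sketch that takes the work counts from the two jpdb tables and computes, for each media type, what fraction of those counts belong to the all-kanji spelling. It’s only an approximation: the names `KANJI_FORM`, `MIXED_FORM` and `kanji_share` are made up for illustration, and I’m assuming a work that uses both spellings gets counted in both tables, which would skew things a bit.

```python
# Work counts copied from the jpdb "used in" tables for the two spellings.
# Assumption: a work that uses both spellings is counted in both tables.
KANJI_FORM = {  # 黴菌
    "Anime": 0,
    "Live action": 0,
    "Visual novels": 5,
    "Novels": 23,
    "Non-fiction": 3,
    "Web novels": 1,
    "Aozora Bunko": 30,
}
MIXED_FORM = {  # ばい菌
    "Anime": 21,
    "Live action": 24,
    "Visual novels": 28,
    "Novels": 35,
    "Non-fiction": 2,
    "Web novels": 14,
    "Aozora Bunko": 2,
}


def kanji_share(kanji: dict[str, int], mixed: dict[str, int]) -> dict[str, float]:
    """Per media type, the fraction of work counts that use the all-kanji spelling."""
    shares = {}
    for category in kanji:
        total = kanji[category] + mixed[category]
        # Guard against categories where neither spelling appears.
        shares[category] = kanji[category] / total if total else 0.0
    return shares


if __name__ == "__main__":
    shares = kanji_share(KANJI_FORM, MIXED_FORM)
    # Print media types from most to least kanji-leaning.
    for category, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{category:<14} {share:6.1%}")
```

Aozora Bunko comes out on top at roughly 94% (30 of 32 counts), while anime and live action never use the kanji spelling at all, which matches the impression from eyeballing the tables.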