Are we really getting 2k kanji, 6k vocab?

I noticed this quite a while ago, but it’s nagging me a bit. We’ll be introduced to kanji and vocab that have exactly the same meanings and pronunciations, for instance 糸 as thread and pronounced いと as both kanji/vocabulary…

So my concern is, if a lot of the kanji are duplicated as vocabulary (a lot of these seem to come up!), are we really getting 2k kanji and 6k vocab? Are they counting these in both columns or just one? It seems like if they’re counting both, the numbers are a bit inflated for what we learn here.

Is it just me?

“Learning” a Kanji, and then learning a word that is represented by only that Kanji is not the same thing.


The thing is, 糸 the kanji and 糸 the vocab are different things. Sure, they’re obviously related, but when you see a kanji in a compound, different concepts and readings should come to mind first than when you see the same kanji on its own as a word in a sentence.
And there needs to be some information in there anyway to say “this can be a word on its own” since that’s not true of every kanji.

So it is worth learning them both separately.
Either way though, I wouldn’t get too hung up on exact numbers! Suffice to say, WaniKani has a lot of kanji and vocab in it! I don’t really see the value in quibbling over the exact count. 6k vs. 5k and change vocab are gonna feel the same when you’re in it, you know?


I’m not too worried about it, it just seems a little inefficient and I wanted to know what people thought. I already paid for a year, so I’m here no matter what. :slight_smile: I’d be just a tad happier if they simply marked ones like this, like 皿 and 糸 and 赤… as Kanji/Vocab, since I learned them already. It’s no big deal though.


It might be a little inefficient for ones like 糸, where the kunyomi / standalone word reading is probably the most common in general (not to mention learning it as a radical too…), but I think the consistency of the system (hammering home the vocab/kanji difference) definitely makes it worth it!
And hey, worst case it’s a little extra reinforcement.


According to the Item Inspector script there are

483 radicals
2055 kanji
6358 vocabulary

Single kanji vocab are counted separately in both kanji and vocabulary. Radicals identical to a kanji are also counted separately.

Kanji and vocab that look identical often differ in their readings. For instance, the kanji often uses the onyomi while the vocab uses the kunyomi.

If you insist that kanji identical to vocabulary be counted only as kanji, you would still have around 6000 vocab, because the 6358 total leaves a buffer of 358 items for duplicates.


Right, so the main question, then, is what is the count when you exclude the single kanji vocab words that use the pronunciation you are taught from the kanji card?

I haven’t really seen too many of these, so I’d be willing to bet the numbers aren’t too far inflated. Even assuming an average of 5 per level, there would still be over 6000 vocab words. Nonetheless, even if the average is 10 per level, over 5000 (around 5700) vocab is a decent starting point.
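For anyone who wants to sanity-check that estimate, here is a minimal sketch of the arithmetic. The total of 6358 comes from the Item Inspector figure quoted above; the level count of 60 is an assumption not stated in this thread, and `remaining_vocab` is just a name made up for illustration.

```python
# Back-of-the-envelope check of the per-level estimate.
TOTAL_VOCAB = 6358  # vocabulary count quoted earlier in the thread
LEVELS = 60         # assumption: number of WaniKani levels, not stated in this thread


def remaining_vocab(duplicates_per_level: int) -> int:
    """Vocab left after removing an assumed average number of duplicates per level."""
    return TOTAL_VOCAB - duplicates_per_level * LEVELS


for per_level in (5, 10):
    print(per_level, remaining_vocab(per_level))
# prints:
# 5 6058
# 10 5758
```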

All that said, I don’t care too much, myself. I use the vocab in WK as reinforcement for kanji readings only. My real vocab studies end up in Anki.


Single kanji vocab are often pronounced differently and/or can have different meanings from the standalone kanji taught, so it makes sense to count them separately.


I quickly grabbed some data from the API; seems that there are a total of 353 duplicates, so there’s still a bit more than 6k ‘fresh’ vocabulary items :slight_smile:


There are 614 single kanji vocab leaving 5744 other vocabulary items.

Most of the single kanji vocab have a different reading than the kanji, but I couldn’t count how many there are.


Only 404 single kanji vocabulary items share their reading with one of the accepted kanji readings, so that’s about the number of items that overlap. This doesn’t mean that they’re identical though, the kanji meaning might still be different in some cases. So worst case that still leaves you at 5954 vocabulary items, which would be fair to round up to 6000. I’m assuming that if you add the ones where the meaning is significantly different from the one taught for the kanji during the lesson you’ll probably end up over 6000.
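Roughly, a check like this could be sketched as follows. This is not the actual script, just an illustration under assumptions: it takes subjects already loaded into memory as dicts shaped like API subject objects (`object`, `data.readings` with `reading` and `accepted_answer`, `data.component_subject_ids`), and the sample entries and ids are invented for the example rather than pulled from the real data.

```python
def accepted_readings(subject: dict) -> set:
    """Readings of a subject that are marked as accepted answers."""
    return {
        r["reading"]
        for r in subject["data"]["readings"]
        if r.get("accepted_answer")
    }


def count_reading_overlaps(subjects: list) -> int:
    """Count single-kanji vocab sharing an accepted reading with their kanji."""
    kanji_by_id = {s["id"]: s for s in subjects if s["object"] == "kanji"}
    count = 0
    for s in subjects:
        if s["object"] != "vocabulary":
            continue
        ids = s["data"]["component_subject_ids"]
        # Only vocab built from exactly one known kanji are candidates.
        if len(ids) != 1 or ids[0] not in kanji_by_id:
            continue
        if accepted_readings(s) & accepted_readings(kanji_by_id[ids[0]]):
            count += 1
    return count


# Invented sample data (ids and flags are illustrative, not real API output).
SAMPLE = [
    {"id": 1, "object": "kanji", "data": {"readings": [
        {"reading": "いと", "accepted_answer": True},
        {"reading": "し", "accepted_answer": False}]}},
    {"id": 2, "object": "vocabulary", "data": {
        "component_subject_ids": [1],
        "readings": [{"reading": "いと", "accepted_answer": True}]}},
    {"id": 3, "object": "kanji", "data": {"readings": [
        {"reading": "さん", "accepted_answer": True},
        {"reading": "やま", "accepted_answer": False}]}},
    {"id": 4, "object": "vocabulary", "data": {
        "component_subject_ids": [3],
        "readings": [{"reading": "やま", "accepted_answer": True}]}},
]

print(count_reading_overlaps(SAMPLE))  # prints 1
```

In the sample, only the first vocab item counts as an overlap, because the second one is read with a reading that is not an accepted answer for its kanji.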


How did you figure this out? Did you check for a common primary reading that counts as an accepted answer? Or did you just compare readings without checking for primary and accepted_answer attributes?


It would be bizarre to me to say that you don’t know 1 word and 1 kanji if you know いと and 糸.

And like… what, if they taught し, the on’yomi for 糸, in the lesson instead of the kun’yomi, this would solve the “problem” and now you’d know 2 items instead of 1?

BTW, they teach the し reading later on in the word 絹糸, so how does that factor into this calculation?


I only checked for common accepted answers, so it’d be readings that are valid for both vocabulary and kanji reviews. A significant part of what you’re left with are just counters and suffixes, which tend to differ only in meaning from the actual kanji itself.

If you want to I could rerun the script with only the primary readings available, I’d just have to change one condition.


I excluded these from my count because they start with a ~ and that brings the character count to 2. I only tested for vocab with one character. How did you test for one kanji vocab?

No. Testing for accepted_answer only is probably correct.


I used the length of the vocab’s component_subject_ids field to see how many unique kanji they contain. This would cause a problem if some trailing hiragana caused it to somehow overlap the readings, but apparently there are no items for which this leads to a conflict, so it’s reliable enough. If I filter out any elements with a ~ in them, then you’re left with just 353 items.
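As a rough illustration of that filter (a sketch under assumptions, not the script itself): the subjects are assumed to be plain dicts with `object`, `data.characters`, and `data.component_subject_ids`, the sample items and ids are invented, and treating the wave dash `〜` the same as `~` is my own assumption about how the marker might be stored.

```python
# Characters treated as the "~" marker; including the wave dash is an assumption.
TILDES = ("~", "〜")


def single_kanji_vocab(subjects: list, drop_tilde: bool = True) -> list:
    """Return the characters of vocab items that contain exactly one kanji."""
    result = []
    for s in subjects:
        if s["object"] != "vocabulary":
            continue
        # One component id means one unique kanji in the word.
        if len(s["data"]["component_subject_ids"]) != 1:
            continue
        chars = s["data"]["characters"]
        if drop_tilde and any(mark in chars for mark in TILDES):
            continue
        result.append(chars)
    return result


# Invented sample data (ids are illustrative only).
SAMPLE = [
    {"object": "vocabulary", "data": {"characters": "糸", "component_subject_ids": [1]}},
    {"object": "vocabulary", "data": {"characters": "〜円", "component_subject_ids": [2]}},
    {"object": "vocabulary", "data": {"characters": "付く", "component_subject_ids": [3]}},
    {"object": "vocabulary", "data": {"characters": "絹糸", "component_subject_ids": [4, 1]}},
    {"object": "kanji", "data": {"characters": "糸", "component_subject_ids": [5]}},
]

print(single_kanji_vocab(SAMPLE))                    # ['糸', '付く']
print(single_kanji_vocab(SAMPLE, drop_tilde=False))  # ['糸', '〜円', '付く']
```

Note that a word like 付く still passes this filter because it has only one component kanji; it only drops out later, when its reading fails to match any reading of the kanji.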


I agree. The kana would add some moras to the reading of the kanji, and that would exclude the vocab on the reading test. A way to make sure is to test for a length of 1, but that seems superfluous to me.

Then there would be 6005 vocabs distinct from the kanji.


You guys really know your stuff. :slight_smile: I had to stop and catch up, since the Crabigator went into maintenance mode when I was halfway through my reviews (I forgot sniff). I’ma mark one of these as the solution, but really there are many great answers. Thank you!


I just want to bring semantics into this discussion: you are learning 2000+ kanji, and 6000+ vocabulary. The fact that some items end up on both lists doesn’t mean they shouldn’t be counted on both lists.

In my opinion the only actual duplicates in WK are those vocab that are identical, except one is the noun and the other has する behind it. You already get the ‘this is a suru verb’ info with the parts of speech for the noun only card. Those always feel like unnecessary padding to me, since you aren’t learning any new readings or combinations at all. Luckily there aren’t many of those either.

Oh and 誕生日おめでとうございます。Making you type a bunch of kana, only to teach you to read the first three kanji, that you already learn separately as well. Just put it in an example sentence, if you want to teach it so bad.


Yeah, this was the point I was trying to make above. 糸 being a kanji is undisputed, I’m sure, but the fact that いと is a reading for it does not guarantee that いと is also a word on its own. You have to learn that fact separately.

For instance the kunyomi of 付 is つ, but つ is not a word here (not in the same sense as いと anyway). The vocabulary you learn later on is 付く (つく).

For all one knows at the time of learning a kanji and its readings, いと could be similar. You just don’t know yet necessarily.

So no matter how you slice it, I don’t see how it can be seen as WK inflating numbers or something.