Any known way to pull data #common from for selected Kanji, into Spreadsheet format?


It would preferably be in a format like this one, so that I can drill it.

I am doing this for this project:-

There seem to be many more N1 Kanji beyond Core 10K, but the counts of 40, 112 and 73 are definitely doable.

If only I were good at scripting…

///Edit (26 Mar 2017):
I have decided to remove from preferred source, so only Tangorin.com at the moment. Anyway, I might come up with a vocabulary list first.


Can you clarify which data and which kanji?


I now prefer, and I am not 100% sure that it uses the same database.

Kanji lists:

List A:
List B1:
List B2:
List B3:
List C1:
List C2:

Minimalistic spreadsheet result:

Vocab; Reading; Meaning; Alternative Meaning; Other forms (common only)

Actually, what I care about most at this time is ‘Reading’.
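To make that layout concrete, here is a minimal Python sketch of writing rows in that column order, semicolon-separated. The column names come from the list above; the function name `rows_to_csv` and the sample row are only illustrative placeholders, not real exported data.

```python
import csv
import io

# Column order taken from the "minimalistic spreadsheet" line above.
COLUMNS = [
    "Vocab",
    "Reading",
    "Meaning",
    "Alternative Meaning",
    "Other forms (common only)",
]


def rows_to_csv(rows):
    """Serialize dict rows into semicolon-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Columns missing from a row are written as empty cells.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buf.getvalue()


# Illustrative row only; real rows would come from whichever dictionary is used.
sample = [{"Vocab": "鍛冶", "Reading": "かじ", "Meaning": "smithing"}]
print(rows_to_csv(sample))
```

Since 'Reading' is what matters most for now, the other columns can simply stay empty until they are needed.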


I have a rough idea of how to do this. Let me give it a shot and I’ll get back to you. Output is going to be a csv file of some form.


I’m extracting data from Jisho right now. I’m using the same script I used in the topic you mentioned. We’ll see if I succeed :slight_smile:


And here you go @polv, that’s the best I could come up with. There’s some missing data (I have 19850 and Jisho has 20670 for #common) and I don’t know why, but for what you need I think it’s sufficient.


PS: If that’s something anyone is interested in, here’s the script I used


Thanks, but… higher Kanji, especially N1, is often hidden in “Other forms”.

Also, Jisho sometimes appears to show fewer common words than Tangorin for some reason.

Lastly, Tangorin also tags which “Other forms” are common. So it shows which alternative Kanji is common, and which reading is the most common.

This will help me decide, for each Kanji, whether On or Kun readings should be emphasized, and which of the Kun readings should be emphasized first. But I feel that this part has to be done manually, and with great care too.

Sorry for slow explanation regarding the difference between Jisho and Tangorin.

Anyway, your old spreadsheet helped me a lot (and I added additional tags – WaniKani Kanji level; rather than WaniKani vocab level). I threw it into Anki for SRS, 10 kanji levels at a time.


I’ve updated the spreadsheet and I’ve added the “other forms” that you can find on the second sheet :slight_smile:

I hope it fits your needs.


Thanks, but some Kanji don’t even have a common word in Jisho.

For example, 冶 頒 侯 采, among the first 10 Kanji.

I might look into Python to find a way to optimize the code to fit my needs, so, thanks.


Well I think you should just browse jmdict then. “Commonness” is described by tags (news1 ichi1 etc…)

Search through jmdict:
JMDict documentation: (section you want to look for is “Word Priority Marking”)

After that you just need to download JMdict in its raw form and parse the content. If you know a bit of scripting, it’s easily done.
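For anyone who wants a starting point, here is a rough Python sketch of that parsing step. It relies on the JMdict XML element names (`entry`, `k_ele`/`keb`/`ke_pri`, `r_ele`/`reb`/`re_pri`). The set of tags treated as "common" is my assumption, so check it against the "Word Priority Marking" section. The function name `common_words` is made up, and the inline sample is hand-written for the demo (its priority tags are invented, not copied from the real file). It also ignores reading restrictions (`re_restr`), so treat the readings as a first pass.

```python
import xml.etree.ElementTree as ET

# Assumption: which priority tags count as "common". Adjust after reading the docs.
COMMON_TAGS = {"news1", "ichi1", "spec1", "spec2", "gai1"}

# Hand-written sample in JMdict's structure; the tags here are invented for the demo.
SAMPLE_XML = """<JMdict>
<entry>
<ent_seq>1</ent_seq>
<k_ele><keb>鍛冶</keb><ke_pri>news1</ke_pri></k_ele>
<r_ele><reb>かじ</reb><re_pri>news1</re_pri></r_ele>
<sense><gloss>smithing</gloss></sense>
</entry>
<entry>
<ent_seq>2</ent_seq>
<k_ele><keb>冶金</keb></k_ele>
<r_ele><reb>やきん</reb></r_ele>
<sense><gloss>metallurgy</gloss></sense>
</entry>
</JMdict>"""


def common_words(root, kanji):
    """Return (written form, common readings) for forms that contain `kanji`
    and carry at least one priority tag from COMMON_TAGS."""
    results = []
    for entry in root.iter("entry"):
        # Readings of this entry that are themselves tagged as common.
        readings = [
            r.findtext("reb")
            for r in entry.findall("r_ele")
            if COMMON_TAGS & {p.text for p in r.findall("re_pri")}
        ]
        # Every written form is checked, so "other forms" are covered too.
        for k in entry.findall("k_ele"):
            keb = k.findtext("keb") or ""
            tags = {p.text for p in k.findall("ke_pri")}
            if kanji in keb and tags & COMMON_TAGS:
                results.append((keb, readings))
    return results


root = ET.fromstring(SAMPLE_XML)
print(common_words(root, "冶"))
```

The real file is large, so loading it takes a while; streaming it with `ET.iterparse` instead of reading it all at once may be worth it.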