Any known way to pull data #common from Tangorin.com for selected Kanji, into Spreadsheet format?

polv · March 23, 2017, 7:54am

It would preferably be in a format like this one https://community.wanikani.com/t/JLPT-Vocabulary-vs-Wanikani/15693, so that I can drill it.

I am doing this for this project:

There seem to be many more N1 Kanji beyond Core 10K, but amounts like 40, 112 and 73 are definitely doable.

If only I were good at scripting…

///Edit (26 Mar 2017):
I have decided to remove Jisho.org as a preferred source, so it is only Tangorin.com at the moment. Anyway, I might come up with a vocabulary list first.

rfindley · March 23, 2017, 3:12pm

Any known way to pull data #common from Jisho.org for selected Kanji, into Spreadsheet format?

Can you clarify which data and which kanji?

polv · March 24, 2017, 1:42am

I now prefer http://tangorin.com, and I am not 100% sure that it uses the same database.

Kanji lists:

List A:
冶繭但頒肢侯遵謄采弐朕詔壱丙儒旺嗣抄嫡畝虞痘爵墾塑吏附宵逐褐楼勅硝逓翁薫厘孔斤薪
List B1:
樺脩橘巴渥惟禎苑惣圭祐倭肇漱楠笹晃鷹耀浩匡晋尭朋喬於榛嵯鮎絢蕉巽啄槙彬椿磯怜淳寅
List B2:
毅巳彦弘鴻李伍亘辰佑鳳綜悌柚穣碧邑秦皓卯彪舜允偲黎伽朔汐丑凱甫惇禄皐稀桐琢翠欽慧
List B3:
馨芹孟魁暉毬稜琉槻峻巌洲亨桂玲茅欣郁洸紘稔鵬敦蔦芙宏萩嶺黛酉旭蘭
List C1:
眸亥麿銑鞠茉燿脹詢蕗倖嵩滉伶玖莞錘捺凜裟碩勺頌菫赳彗晟迪袈捷熙柾昂奎丞絃茄胤紬叡
List C2:
椋洵菖勁誼蓉亦燎瑚恕耶梢凪衿匁澪梧琳燦晨綸晏昴爾笙侑椰崚侃紗竣柊瑶

Minimalistic spreadsheet result:

Vocab; Reading; Meaning; Alternative Meaning; Other forms (common only)

Actually, what I care about most at this time is ‘Reading’.
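For reference, the five columns above map directly onto a delimited text file. Here is a minimal sketch using Python's standard csv module. It assumes a semicolon delimiter to match the header as written (Anki's text importer lets you choose ";" as the field separator), and the function name and sample row are hypothetical:

```python
import csv
import io

# Column layout from the post above; the ";" delimiter is an assumption.
COLUMNS = ["Vocab", "Reading", "Meaning", "Alternative Meaning", "Other forms (common only)"]


def to_semicolon_csv(rows):
    """Render rows (tuples in COLUMNS order) as semicolon-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example row; "Reading" is the column that matters most here.
print(to_semicolon_csv([("但し", "ただし", "but", "however", "")]))
```

With csv.writer, any field that itself contains a semicolon (for example a meaning like "x; y") is quoted automatically, so the columns stay aligned on import.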

Tarubarin · March 24, 2017, 1:59am

I have a rough idea of how to do this. Let me give it a shot and I’ll get back to you. The output is going to be a CSV file of some form.

WydD · March 25, 2017, 7:16pm

I’m extracting data from Jisho right now, using the same script I wrote for the topic you mentioned. We’ll see if I succeed.

WydD · March 25, 2017, 8:24pm

And here you go @polv, that’s the best I could come up with. There’s some missing data (I have 19850 and Jisho has 20670 for #common) and I don’t know why, but for what you need I think it’s sufficient.

Cheers

PS: If anyone is interested, here’s the script I used: jisho-pull.py on GitHub

polv · March 25, 2017, 8:45pm

Thanks, but… higher-level Kanji, especially N1, are often hidden in “Other forms”.

Also, Jisho sometimes appears to show fewer common words than Tangorin, for some reason.

Lastly, Tangorin also tags which of the “Other forms” are common, so I can see which alternative Kanji spelling is common and which reading is the most common.

This will help me decide, for each Kanji, whether the On or Kun readings should be emphasized, and which of the Kun readings should be emphasized first. However, I feel that this part has to be done manually, and with great care too.

Sorry for the slow explanation of the difference between Jisho and Tangorin.

Anyway, your old spreadsheet helped me a lot (I added extra tags: WaniKani Kanji level rather than WaniKani vocab level). I threw it into Anki for SRS, 10 Kanji levels at a time.

WydD · March 25, 2017, 9:34pm

I’ve updated the spreadsheet and added the “other forms”, which you can find on the second sheet.

I hope it fits your needs.

polv · March 26, 2017, 3:19am

Thanks, but some Kanji don’t even have a common word in Jisho.

For example, 冶, 頒, 侯 and 采, just among the first 10 Kanji.

I might look into Python and adapt the code to fit my needs, so thanks.
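Spotting such Kanji can be automated once you have any list of common words (from Jisho, Tangorin or a dictionary file). A minimal sketch, with a hypothetical function name and toy inputs, that reports which Kanji from a list appear in none of the common words:

```python
def kanji_without_common_words(kanji, common_words):
    """Return the characters in `kanji` that occur in none of `common_words`,
    keeping the original order."""
    covered = set("".join(common_words))
    return [k for k in kanji if k not in covered]


# Hypothetical inputs: the start of List A and a toy list of "common" words.
print(kanji_without_common_words("冶繭但頒", ["但し", "繭"]))
```

Running it on the toy inputs prints ['冶', '頒']; the real result depends entirely on which word list you feed in.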

WydD · March 26, 2017, 6:23pm

Well, I think you should just browse JMdict then. “Commonness” is described by priority tags (news1, ichi1, etc.).

Search through JMdict: JMdictDB - Advanced Search
JMdict documentation: JMdict/EDICT Project (the section you want is “Word Priority Marking”)

After that, you just need to download JMdict in its raw form and parse the content; if you know a bit of scripting, it’s easily done.
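As a rough illustration of that parsing step, here is a minimal sketch using Python's xml.etree.ElementTree. It assumes the English-only JMdict_e file, and it assumes an entry counts as common when a spelling carries one of news1, ichi1, spec1, spec2 or gai1 (the set EDICT uses for its “(P)” marker). The function names and the toy SAMPLE entries are hypothetical, and re_restr/stagk restrictions are ignored:

```python
import csv
import io
import sys
import xml.etree.ElementTree as ET

# Assumption: the priority tags EDICT uses for its "(P)" (common) marker.
COMMON_TAGS = {"news1", "ichi1", "spec1", "spec2", "gai1"}


def _forms(entry, ele, text_tag, pri_tag):
    """Return [(text, is_common), ...] for the k_ele or r_ele children of an entry."""
    return [
        (e.findtext(text_tag), any(p.text in COMMON_TAGS for p in e.findall(pri_tag)))
        for e in entry.findall(ele)
    ]


def common_rows(root, kanji):
    """Yield (vocab, reading, meaning, alternative meaning, other common forms)
    for each entry with a common kanji spelling that contains any of `kanji`.

    Simplification: re_restr/stagk restrictions are ignored, and the
    English-only JMdict_e file is assumed.
    """
    wanted = set(kanji)
    for entry in root.iter("entry"):
        kebs = _forms(entry, "k_ele", "keb", "ke_pri")
        hits = [w for w, common in kebs if common and wanted & set(w)]
        if not hits:
            continue
        rebs = _forms(entry, "r_ele", "reb", "re_pri")
        # Prefer a reading that is itself marked common; fall back to any reading.
        readings = [r for r, common in rebs if common] or [r for r, _ in rebs]
        senses = ["; ".join(g.text or "" for g in s.findall("gloss"))
                  for s in entry.findall("sense")]
        others = [w for w, common in kebs if common and w != hits[0]]
        yield (hits[0], readings[0] if readings else "",
               senses[0] if senses else "", " | ".join(senses[1:]), ", ".join(others))


def write_csv(rows, out):
    writer = csv.writer(out)
    writer.writerow(["Vocab", "Reading", "Meaning", "Alternative Meaning", "Other forms"])
    writer.writerows(rows)


# Toy data in JMdict's element layout (not copied from the real dictionary).
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq>
  <k_ele><keb>但し</keb><ke_pri>ichi1</ke_pri></k_ele>
  <r_ele><reb>ただし</reb><re_pri>ichi1</re_pri></r_ele>
  <sense><gloss>but</gloss><gloss>however</gloss></sense>
</entry>
<entry><ent_seq>2</ent_seq>
  <k_ele><keb>冶金</keb></k_ele>
  <r_ele><reb>やきん</reb></r_ele>
  <sense><gloss>metallurgy</gloss></sense>
</entry>
</JMdict>"""

if __name__ == "__main__":
    # For the real file, something like: root = ET.parse("JMdict_e").getroot()
    root = ET.fromstring(SAMPLE)
    write_csv(common_rows(root, "冶但"), sys.stdout)
```

On the toy data, only 但し is emitted, because the 冶金 entry carries no priority tags. ET.parse loads the whole dictionary into memory; ET.iterparse is the usual alternative if that becomes a problem.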
