Identify vocab missing from wk using Juman

Ikalou · December 26, 2013, 10:56pm

I was curious as to how much vocabulary I should learn outside of WaniKani, so I spent the last 6 hours trying to get Juman (http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN) to compile to JavaScript using Emscripten.

I got it to work (kind of) and I put up an ugly page that classifies the vocabulary in three categories:
- blue words are on Wanikani (hovering gives the wk level)
- red words are not on Wanikani (maybe they should be?)
- purple words cannot be expected to be on Wanikani
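
The three categories can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the code behind the page: `WK_LEVELS`, `NOT_EXPECTED_POS`, and `classify` are hypothetical names, the levels are placeholder numbers rather than real WaniKani data, and treating particles as "cannot be expected" is a guess at what the purple category covers.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Color(Enum):
    BLUE = "blue"      # word is on WaniKani
    RED = "red"        # word is not on WaniKani
    PURPLE = "purple"  # word cannot be expected to be on WaniKani


# Hypothetical lookup: dictionary form -> level (placeholder numbers).
WK_LEVELS: Dict[str, int] = {"水": 1, "飲む": 2, "休日": 3}

# Assumption: part-of-speech labels that a vocabulary list would not cover.
NOT_EXPECTED_POS: Set[str] = {"助詞"}  # particles


def classify(base_form: str, pos: str) -> Tuple[Color, Optional[int]]:
    """Return the display color and, for blue words, the level."""
    if base_form in WK_LEVELS:
        return Color.BLUE, WK_LEVELS[base_form]
    if pos in NOT_EXPECTED_POS:
        return Color.PURPLE, None
    return Color.RED, None
```

The lookup uses the dictionary form rather than the text as written, so a conjugated verb is still matched against its vocabulary entry.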

If you want to try it, it is at:

http://ikalou.fr:8123/lib/

You're going to have to wait a bit for the dictionary files to load the first time. DO NOT try to input more than 4 or 5 sentences at a time or you'll be sorry: it is painfully slow and memory hungry.

EDIT: Firefox, Chrome and Safari should all work now.

I know it still has many bugs, but let me know how this is working for you. I’m curious as to how viable lexical analyzers such as Juman are, and I don’t want to spend too much time on it if it’s not very accurate.

Annnnnddd now I’m behind on my reviews.

Krzysiek · December 26, 2013, 11:17pm

FYI, works on Safari 7.0.1 too.

Ikalou · December 26, 2013, 11:55pm

Thanks for the info,

It should work OK in every “good” browser now (UTF-8 is a pain…).

awesomeo · December 27, 2013, 12:06am

I apologize if I’m slow on the uptake here, but can you ELI5 what this is supposed to do? I tried to read the JUMAN manual and got lost pretty quickly.

Apraxas · December 27, 2013, 12:10am

can you change the colors?
the red and purple are too much alike.

Ikalou · December 27, 2013, 12:28am

@awesomeo

I don’t know any of the details; I had enough trouble just getting it to work the way I want.

JUMAN slices up a sentence into morphemes and sticks a label on each of them (e.g. 普通名詞 common noun, 助詞 particle, …). When there is an ambiguity it simply returns its best guess.

[水を飲んでいます] is broken down to [水 | を | 飲んで | い | ます]. It will even find the dictionary form [飲む].
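
To picture that result as data, here is a minimal Python sketch. The list is hand-written to mirror the example above and is not actual JUMAN output. `Morpheme`, `ANALYSIS`, `segmented`, and `dictionary_forms` are names made up for illustration, and the dictionary forms other than 飲む are filled in by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morpheme:
    surface: str  # the text as it appears in the sentence
    base: str     # the dictionary form


# Hand-written to mirror the example; not produced by JUMAN.
ANALYSIS: List[Morpheme] = [
    Morpheme("水", "水"),
    Morpheme("を", "を"),
    Morpheme("飲んで", "飲む"),
    Morpheme("い", "いる"),
    Morpheme("ます", "ます"),
]


def segmented(morphemes: List[Morpheme]) -> str:
    """Join the surface forms with separators, as in the bracket notation."""
    return " | ".join(m.surface for m in morphemes)


def dictionary_forms(morphemes: List[Morpheme]) -> List[str]:
    """List the dictionary form of each morpheme, in sentence order."""
    return [m.base for m in morphemes]
```

Joining the surface forms with no separator gives back the original sentence, which is a cheap sanity check on any segmentation.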

I use it to get a sense of how difficult a read seems, or to find the words that wanikani does not teach so I can put them in a deck if they appear reasonably useful. I wanted to make an add-on to do it on the fly for webpages I visit, but it is too slow as it is now.

@Apraxas

You’re right, I’ll try to make it a bit more readable sometime tomorrow.

Glorious · December 27, 2013, 12:46am

This seems promising! Thank you so much^^

awesomeo · December 27, 2013, 12:47am

Ikalou said... @awesomeo

I use it to get a sense of how difficult a read seems or to find the words that wanikani does not teach so I can put them in a deck if they appear reasonably useful. I wanted to make an add-on to do it on the fly to webpages I visit but it is too slow as it is now.

Oh, that sounds really useful. I would definitely use that if you can get it to work as an add-on.

Also, I agree a new color scheme would be great.

Works on Firefox 26.0

SamusAranX · December 27, 2013, 2:06am

what are the pros and cons of this vs. Rikaichan? is the coloring the only difference? Does it give definitions? Is there a way to color an entire page automatically/at the click of the button, or do I have to copypaste the text into a form?

SoxKeepYouWarm · December 27, 2013, 2:42am

和 was red in ur image but isn’t it on wanikani, japanese style/peace?

Apraxas · December 27, 2013, 2:58am

SoxKeepYouWarm said... 和 was red in ur image but isn't it on wanikani, japanese style/peace?

same with
休日

Ikalou · December 27, 2013, 8:56am

SoxKeepYouWarm said... 和 was red in ur image but isn't it on wanikani, japanese style/peace?

The Kanji 和 is on wanikani, but the word/vocabulary 和 is not.

Apraxas said... same with
休日

Are you sure? It always finds it for me and it is blue in my picture as well.

Apraxas · December 27, 2013, 9:00am

Ikalou said... Are you sure? It always finds it for me and it is blue in my picture as well.

Ikalou · December 27, 2013, 2:05pm

@Apraxas

Oh that’s right… I hadn’t noticed some instances of the word were split. I believe this problem should be gone now. Thanks for spotting this.

SamusAranX said…
what are the pros and cons of this vs. Rikaichan? is the coloring the only difference?

Well, Juman alone is not an English dictionary. It splits a sentence into morphemes and tries to tell what they are (noun, adjective, particle, …).

SamusAranX said…
Does it give definitions? Is there a way to color an entire page automatically/at the click of the button, or do I have to copypaste the text into a form?

If it ends up working well/fast enough, it can then be used to build add-ons that do all sorts of things to webpages with Japanese text.

What I did here is simply search for words not on wanikani, but you could turn it into an English dictionary (by doing greedy matching just like Rikaichan) or, say, compile some stats and save example sentences for particular words as you browse the web, etc.
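
For the greedy matching idea, a rough Python sketch of longest-match lookup could look like this. It is an assumption-laden illustration, not how any existing add-on is implemented: `greedy_match` and `TINY_DICT` are hypothetical, the glosses are a toy dictionary, and conjugated forms are not handled at all.

```python
from typing import Dict, List, Optional, Tuple

# Toy dictionary for illustration only.
TINY_DICT: Dict[str, str] = {
    "休": "rest",
    "休日": "holiday",
    "水": "water",
}


def greedy_match(
    text: str, dictionary: Dict[str, str], max_len: int = 8
) -> List[Tuple[str, Optional[str]]]:
    """At each position, take the longest dictionary entry that matches."""
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # No entry starts here: emit the single character without a gloss.
            result.append((text[i], None))
            i += 1
        else:
            result.append((match, dictionary[match]))
            i += len(match)
    return result
```

Calling `greedy_match("休日に水", TINY_DICT)` picks 休日 over the shorter 休, leaves に without a gloss, and then matches 水. Unlike a morphological analyzer, this approach has no notion of grammar, so it can split words in the wrong place whenever a longer dictionary entry happens to match.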
