Identify vocab missing from wk using Juman


#1
I was curious as to how much vocabulary I should learn outside of WaniKani, so I spent the last 6 hours trying to get Juman (http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN) to compile to JavaScript using Emscripten.

I got it to work (kind of) and I put up an ugly page that classifies the vocabulary into three categories:
- blue words are on WaniKani (hovering gives the WK level)
- red words are not on WaniKani (maybe they should be?)
- purple words cannot be expected to be on WaniKani
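
A rough sketch of how such a three-way classification could work, in Python rather than the page's actual JavaScript. Everything here is an assumption made for illustration: the `WK_LEVELS` table, its placeholder level numbers, the `UNTEACHABLE_POS` set and the `classify` function are hypothetical names, and the rule that "purple" means particles, suffixes and punctuation is a guess at what "cannot be expected to be on WaniKani" means.

```python
# Hypothetical lookup table: dictionary form -> WaniKani level.
# The level numbers are placeholders, not real WaniKani data.
WK_LEVELS = {"水": 1, "飲む": 2, "休日": 3}

# Assumed set of part-of-speech labels that a vocabulary course
# would not be expected to teach (particles, suffixes, punctuation).
UNTEACHABLE_POS = {"助詞", "接尾辞", "特殊"}


def classify(base_form: str, pos: str) -> str:
    """Return the display color for one analyzed word."""
    if base_form in WK_LEVELS:
        return "blue"    # taught on WaniKani
    if pos in UNTEACHABLE_POS:
        return "purple"  # not something a vocabulary list would cover
    return "red"         # a real word that WaniKani does not teach


def hover_text(base_form: str) -> str:
    """Text shown when hovering a blue word; empty for other words."""
    level = WK_LEVELS.get(base_form)
    return f"WK level {level}" if level is not None else ""
```

With this toy table, `classify("水", "名詞")` gives blue, `classify("を", "助詞")` gives purple, and any noun missing from the table comes out red.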

If you want to try it, it's at:

http://ikalou.fr:8123/lib/

You're going to have to wait a bit for the dictionary files to load the first time. DO NOT try to input more than 4 or 5 sentences at a time or you'll be sorry: it is painfully slow and memory hungry.

EDIT: Firefox, Chrome and Safari should all work now.



I know it still has many bugs, but let me know how this is working for you. I'm curious as to how viable lexical analyzers such as Juman are, and I don't want to spend too much time on it if it's not very accurate.

Annnnnddd now I'm behind on my reviews.

#2

FYI, works on Safari 7.0.1 too.


#3

Thanks for the info.

It should work OK in every “good” browser now (UTF-8 is a pain…).


#4

I apologize if I’m slow on the uptake here, but can you ELI5 what this is supposed to do? I tried to read the JUMAN manual and got lost pretty quickly.


#5

Can you change the colors?
The red and purple are too much alike.


#6

@awesomeo

I don’t know any of the details; I had enough trouble just getting it to work the way I wanted.

JUMAN slices up a sentence into morphemes and sticks a label on each of them (e.g. 普通名詞 common noun, 助詞 particle, …). When there is an ambiguity it simply returns its best guess.

[水を飲んでいます] is broken down to [水 | を | 飲んで | い | ます]. It will even find the dictionary form [飲む].
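
As an illustration only, the breakdown above can be modeled as a list of records. This is not JUMAN's real output format: the `Morpheme` class, its field names and the part-of-speech labels below are hand-written assumptions, chosen just to show how the dictionary form can serve as a lookup key.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    surface: str  # the text as it appears in the sentence
    base: str     # dictionary form
    pos: str      # part-of-speech label (illustrative, hand-written)


# Hand-written analysis of 水を飲んでいます, mirroring the split above.
MORPHEMES = [
    Morpheme("水", "水", "名詞"),
    Morpheme("を", "を", "助詞"),
    Morpheme("飲んで", "飲む", "動詞"),
    Morpheme("い", "いる", "接尾辞"),
    Morpheme("ます", "ます", "接尾辞"),
]


def segmented(morphemes: list[Morpheme]) -> str:
    """Join the surface forms with ' | ' to show the segmentation."""
    return " | ".join(m.surface for m in morphemes)


def lookup_keys(morphemes: list[Morpheme],
                content_pos: tuple[str, ...] = ("名詞", "動詞", "形容詞")) -> list[str]:
    """Dictionary forms of content words, usable as vocabulary lookup keys."""
    return [m.base for m in morphemes if m.pos in content_pos]
```

Here `lookup_keys(MORPHEMES)` keeps only 水 and 飲む, which are the forms one would search for in a vocabulary list.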

I use it to get a sense of how difficult a read seems, or to find the words that WaniKani does not teach so I can put them in a deck if they appear reasonably useful. I wanted to make an add-on to do it on the fly to webpages I visit, but it is too slow as it is now.

@Apraxas

You’re right, I’ll try to make it a bit more readable sometime tomorrow.


#7

This seems promising! Thank you so much^^


#8
Ikalou said... @awesomeo

I use it to get a sense of how difficult a read seems, or to find the words that WaniKani does not teach so I can put them in a deck if they appear reasonably useful. I wanted to make an add-on to do it on the fly to webpages I visit, but it is too slow as it is now.
Oh, that sounds really useful. I would definitely use that if you can get it to work as an add-on.

Also, I agree a new color scheme would be great.

Works on Firefox 26.0

#9

What are the pros and cons of this vs. Rikaichan? Is the coloring the only difference? Does it give definitions? Is there a way to color an entire page automatically or at the click of a button, or do I have to copy-paste the text into a form?


#10

和 was red in your image, but isn’t it on WaniKani (Japanese style/peace)?


#11
SoxKeepYouWarm said... 和 was red in your image, but isn't it on WaniKani (Japanese style/peace)?
Same with 休日.


#12
SoxKeepYouWarm said... 和 was red in your image, but isn't it on WaniKani (Japanese style/peace)?
The kanji 和 is on WaniKani, but the vocabulary word 和 is not.

Apraxas said... Same with 休日.
Are you sure? It always finds it for me and it is blue in my picture as well.

#13


Apraxas said... Same with 休日.
Are you sure? It always finds it for me and it is blue in my picture as well.
 


#14

@Apraxas

Oh that’s right… I hadn’t noticed that some instances of the word were split. I believe this problem should be gone now. Thanks for spotting this.


SamusAranX said…
What are the pros and cons of this vs. Rikaichan? Is the coloring the only difference?

Well, Juman alone is not an English dictionary. It splits a sentence into morphemes and tries to tell what they are (noun, adjective, particle, …).

SamusAranX said…
Does it give definitions? Is there a way to color an entire page automatically or at the click of a button, or do I have to copy-paste the text into a form?

If it ends up working well enough and fast enough, it can then be used to build add-ons that do all sorts of things to webpages with Japanese.

What I did here is simply search for words not on WaniKani, but you could turn it into an English dictionary (by doing greedy matching just like Rikaichan) or, say, compile some stats and save example sentences for particular words as you browse the web, etc…
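
For readers unfamiliar with the idea, greedy matching can be sketched as a longest-match scan over the text. This is a simplified illustration, not Rikaichan's actual code: `greedy_lookup`, `TOY_DICT` and the `max_len` parameter are hypothetical, the glosses are a hand-made toy dictionary, and no deinflection of conjugated forms is attempted.

```python
from typing import Optional

# Toy Japanese -> English dictionary, hand-made for this example.
TOY_DICT = {
    "休": "rest",
    "日": "day",
    "休日": "holiday",
    "水": "water",
}


def greedy_lookup(text: str,
                  dictionary: dict[str, str],
                  max_len: int = 8) -> list[tuple[str, Optional[str]]]:
    """Scan left to right, always taking the longest dictionary entry
    that starts at the current position. Characters that start no
    entry are emitted one at a time with a gloss of None."""
    results: list[tuple[str, Optional[str]]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            results.append((text[i], None))
            i += 1
        else:
            results.append((match, dictionary[match]))
            i += len(match)
    return results
```

Because the longest candidate wins, 休日 is matched as one word ("holiday") instead of 休 followed by 日. The price is that the scan knows nothing about grammar, which is the part a morphological analyzer such as Juman adds.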