[Userscript] Highlight Kanji based on WK level in websites

This script highlights every kanji on a web page with a background color based on its WaniKani level. You can use it to see which texts/sentences you can already (partially) read, or just to force your brain to focus on what you’ve already learned. It also makes it easier to tell whether you know a kanji or not, since some kanji look very alike.

First, you have to specify a threshold WaniKani level in the “UserScript commands” menu in the Greasemonkey/Tampermonkey menu (there should be a large button next to your address bar). Click on it, and select “Set Wanikani level” (in Greasemonkey it’s in the user commands sub-menu). This threshold should be set to your current (or previous) WaniKani level, because every Kanji up to that level will be highlighted as green, which means that you should be able to read that Kanji. The default value is 1.
Kanji you don’t know yet are highlighted yellow, orange or red, depending on how late you will learn them on WK (red meaning that it’s in one of the final levels). Kanji not covered by WK stay black. There are also two custom lists, ‘seen’ and ‘known’, that you can edit freely.
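
As a rough sketch of that color rule (not the script’s actual code): the helper name `colorForLevel` and the equal three-way split of the not-yet-learned levels are assumptions made for illustration, and the default of 50 is the WK level count mentioned later in this thread.

```javascript
// Hypothetical helper: picks a highlight color for a kanji from its WK level.
// `kanjiLevel` is null/undefined for kanji that WK does not cover.
function colorForLevel(kanjiLevel, threshold, maxLevel = 50) {
  if (kanjiLevel == null) return null;          // not on WK: leave it black
  if (kanjiLevel <= threshold) return "green";  // at or below the threshold level
  // Assumed split: divide the remaining levels into three equal bands.
  const remaining = maxLevel - threshold;
  const position = (kanjiLevel - threshold) / remaining; // in (0, 1]
  if (position <= 1 / 3) return "yellow";
  if (position <= 2 / 3) return "orange";
  return "red";
}
```

With a threshold of 15, a level 16 kanji would come out yellow, a level 30 kanji orange and a level 50 kanji red under this assumed split.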

This picture was taken with my current level (15):



Explanations of the different colors:



You can use the “Set highlight settings” option to configure which color groups to display.

The colors have been updated to be less visible since this picture was made!

I don’t think this script is anything groundbreaking, but I kinda felt like making it. Sometimes I like to read random Wiki articles and add any words that I can read but don’t know yet to Anki (unless they’re super obscure), so that motivated this a bit. Also, it’s kinda nice to have the percentage of what you can read at such a (relatively) low level visualized :slight_smile:

https://greasyfork.org/scripts/722-kanji-highlighter
It has been tested in Firefox (Greasemonkey) and Chrome (Tampermonkey) and seems to work equally well in both.

Everybody is free to contribute fixes and features to the code:
https://github.com/looki/kanji-highlighter
I plan to completely overhaul the script soon; more info on that in the link.


I LOVE it !! Thank you ! It is exactly what I needed :3

edit : I use chrome, and some kanji I already have seen are yellow and orange. How can I change the WK lvl via tampermonkey?

edit 2 : nvm I did it manually by changing the script. 

G
E
N
I
U
S

Thanks /(^_^)/

Lovely! Thank you so much :slight_smile: On Chrome I just installed “Tampermonkey” to use it :slight_smile:

Glad to hear that it’s useful to you. I will probably try to optimize it a bit and see what I can do about making it nicer to use in Chrome.

thanks! really want to try it out)

looki said... I don't think this script is anything groundbreaking,

Let me just say you're wrong.
It's brilliant and I'd never have had such a genius idea myself. Thank you so much for this.
Sandrinette said... I LOVE it !! Thank you ! It is exactly what I needed :3

edit : I use chrome, and some kanji I already have seen are yellow and orange. How can I change the WK lvl via tampermonkey?

edit 2 : nvm I did it manually by changing the script. 
 Post changes for users who know nothing about coding please?

Can you please make sure that this is actually not supported by TamperMonkey? I just had a look and, according to their documentation, they do implement the “user command” feature which means that somewhere in the TamperMonkey menu, you should be able to find my script’s menu item.

Saponutti said... Post changes for users who know nothing about coding please?
All I did was change ("level", 1) to ("level", 10) and (level, 1) to (level, 10). 10 is my level. Maybe it is wrong, I know nothing about coding myself. I just randomly tried something and it worked out :)
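
For anyone wondering why that edit works, here is a minimal sketch, assuming the script reads its threshold through the userscript storage call `GM_getValue("level", 1)`. Only the key "level" and the default 1 come from the post above; the function names `readLevel` and `saveLevel` and the in-memory fallback are made up so the sketch also runs outside a userscript manager. The second argument is just the value returned when nothing has been stored yet, so changing it changes the level for users who never set one.

```javascript
// Stand-ins so the sketch runs outside a userscript manager (assumption:
// inside Tampermonkey/Greasemonkey the granted GM_* functions are used).
const memory = new Map();
const getValue = typeof GM_getValue === "function"
  ? GM_getValue
  : (key, fallback) => (memory.has(key) ? memory.get(key) : fallback);
const setValue = typeof GM_setValue === "function"
  ? GM_setValue
  : (key, value) => { memory.set(key, value); };

// Reads the stored threshold; 1 is only the default when nothing is stored.
function readLevel() {
  return getValue("level", 1);
}

// Stores a new threshold; returns false for input that is not a positive integer.
function saveLevel(input) {
  const level = Number.parseInt(input, 10);
  if (!Number.isInteger(level) || level < 1) return false;
  setValue("level", level);
  return true;
}
```

Storing the level through the menu command has the same effect as editing the default, without touching the script.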

Chrome with Tampermonkey working fine. Maybe restart your browser?


Working like a charm in Chrome’s Tampermonkey, Firefox+Greasemonkey too. One merely needs to edit to change the user level, same as in Firefox’s Greasemonkey. looki, this is wonderful. But I think its potential exceeds what you’ve done so far.

Requests/Questions:

  1. Did you have to input each kanji’s unicode manually for the script (ouch), or have you some broad converter that others can use and then dump?
  2. Rather than specifying a level for things individually in the script as at present, is it possible to create a set or similar and then simply follow that with the kanji included in each set/level?
  3. Can you do this with vocab?
  4. Can you teach me/us how to do this?
  5. Is there a way to make the script function within the built-in PDF viewers in Chrome or Firefox? Probably it’s not file:// that’s at issue, but the matter of an app within an app (Greasemonkey gives no script as available with PDFs open). Then we could read digital novels in our browsers with the benefit of this script. AMAZING.

These questions add up to something:

Basically, if we can do this with vocabulary too, you’ll have a user script that does much of what LWT does, except better and without the need for external parsing programs, especially if one has integrated with Rikaisama and Anki: Auto-Import as we were both recently talking about on another thread. Oh, and one wouldn’t need to learn to set up a server as I did, as this just runs as a user-script.

Can you teach me/us how to conveniently input kanji (and hopefully vocabulary too) in the appropriate Unicode (I assume only Unicode will do)…what I’m envisioning is user-created lists (KNOWN, SEEN, UNSEEN) in addition to the WaniKani levels of “Known Kanji/Vocabulary” (outside WK), “Seen Kanji/Vocabulary” (for those of us that collect them) and then a final colour for everything “Yet Unseen”…with exclusions for kana, numbers, romaji, punctuation marks and so on…that should not be a mess?

Looking at your script (I know next to naught about all this) it seems like it’s focussed exclusively on kanji at present, but it looks like you could reasonably include words with some exclusions. Of course, this would only prepare it for pre-input (KNOWN, SEEN) vocabulary (and in the case of verbs, unconjugated)…as deciphering the parsing of unknown vocabulary would be quite a struggle (say in MeCab or KAKASI, etc. - all of which have their imperfections)…but as a user being able to make a judgement call on new words with Rikaisama’s assistance ought be easy enough - especially with the known vocabulary around it pre-delineated.

In essence, I think you’re at the cusp of making an awesome reading tool of great potential. An unintrusive light-weight reader-userscript that highlights known and seen vocabulary within the browser, can export a full card (definition, reading, sentence, audio, translation) to Anki with a hover and single key-tap (via Rikaisama), and then from Anki or other databases one can update their own KNOWN and SEEN lists within the userscript. So, please work on!

PS: Additionally, if you could make a version with customizable kanji or one that reflects their levels/progression (I don’t know if people are changing up the order of Heisig these days), the folks over at the koohii RTK forums (http://forum.koohii.com/) would probably love you.
PPS: Especially with vocabulary perhaps, a cool bell and whistle for WK users might involve the use of the user API key to colour code highlighted kanji according to WK progression (Apprentice, Guru, Master, etc…) as on the “Progress” tabs here on WK.
PPPS: I think you’ve a typo in the userscript section of the script. It seems you tried to make an exclusion for wanikani.com, but rather than typing exclude you have excluse.

Wow, I had no idea this script would spark so much interest.

First of all, I’ll see what I can do about making it more usable in Chrome. I’m very much a noob in userscript/javascript development and I really don’t know all the tricks etc.

1. Haha, no. I requested a full list of Kanji from WaniKani’s API. Then I copied it into a text editor and did some magic to remove everything but Kanji + Level. The unicode values appeared once I put the file into an online JSON minimizer tool. WaniKani gave me the actual Kanji in the file without any encoding, so it should work in the userscript as well. I might actually revert them to the actual characters in the next update. Either way, it was just a side effect of that online minimizer.
2. Sure. It would have to be converted into a Kanji->Level map at run-time for maximum efficiency, but the conversion should only take a fraction of a second.
3. I’ll also be referring to what you wrote below. You’re saying that you merely want to highlight known and seen vocabulary and not parse the entire text, but there’s a problem with that. Basically I would have to search through the entire text, checking at each character if a word starts there. At least that’s the only way I can think of. I don’t know if it would be feasible to implement this at a reasonable speed.
4. Not sure what you mean. Coding userscripts?
5. Alas, when I wrote the script I hoped that it would work with PDFs, but it didn’t. I can see if there’s anything I can do to make it work.
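
To illustrate the point in (1), here is a small sketch of reducing a kanji list to Kanji + Level, and of why the escaped output of a minimizer is equivalent to the raw characters. The field names `character`, `meaning` and `level` are an assumed shape for the downloaded items, not a confirmed copy of the API response.

```javascript
// Assumed shape of the downloaded kanji items; field names are illustrative.
const items = [
  { character: "人", meaning: "person", level: 1 },
  { character: "信", meaning: "trust", level: 15 },
];

// Keep only Kanji + Level and drop everything else.
const kanjiToLevel = {};
for (const item of items) {
  kanjiToLevel[item.character] = item.level;
}

// A minimizer may write 人 as the escape \u4eba and 信 as \u4fe1.
// Once parsed, the escapes are the very same one-character keys.
const escaped = JSON.parse('{"\\u4eba":1,"\\u4fe1":15}');
const sameKeys = Object.keys(escaped).join("") === Object.keys(kanjiToLevel).join("");
```

So either spelling can go into the script; the raw characters are simply easier to read and edit.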

What you’re suggesting is definitely a nice idea, but as I wrote above, I’m not entirely sure if it’s doable / within my scope. I’m quite sure that if I were to make such a tool, the vocabulary lists would grow into the thousands - and your average Japanese Wikipedia article is quite long, which would mean that there’d be thousands of lookups. Now there are a few ways to make this lookup more efficient, but I still think it might be a little too heavy for a userscript - especially if it’s coded by someone who has literally no Javascript-specific optimization knowledge.
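
A sketch of the scan described above, under stated assumptions: this is a plain greedy longest-match against a `Set`, which is one possible way to keep the lookups cheap and not something the script does today. The function name `findKnownWords` is made up. At each position it tries at most as many substrings as the longest known word has characters, so the cost grows with text length times maximum word length rather than with the size of the vocabulary list.

```javascript
// Hypothetical helper: finds known words in `text` by checking, at every
// character, whether a known word starts there (longest candidate first).
function findKnownWords(text, knownWords) {
  const words = new Set(knownWords);
  let maxLen = 0;
  for (const w of words) maxLen = Math.max(maxLen, w.length);

  const matches = [];
  let i = 0;
  while (i < text.length) {
    let found = null;
    for (let len = Math.min(maxLen, text.length - i); len >= 1; len--) {
      const candidate = text.slice(i, i + len);
      if (words.has(candidate)) { found = candidate; break; }
    }
    if (found) {
      matches.push({ start: i, word: found });
      i += found.length;   // skip past the matched word
    } else {
      i += 1;              // no word starts here, move on one character
    }
  }
  return matches;
}
```

This does no grammatical parsing at all, so conjugated verbs and ambiguous word boundaries would still need something like the parsers discussed in this thread.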

I think that for now, I will focus on Kanji and I’ll try to make the script more useful to not only the WK community but anyone who has some kind of Kanji list. I’m definitely not tossing the vocabulary idea, though.

PS: I had something like that in mind. Would be pretty easy to make a few presets like WK, Heisig to choose from anyway.
PPS: Yeah, I’ve actually made a script (a port of Wanikanifyer to FF) that loads data from the WK API before, I just didn’t think it would be worth the effort here. For Kanji, it’d only make a difference for those Kanji on the threshold level that you have not learned yet, which are very few and only exist for a few days anyway.
PPPS: Thanks for the typo, I was wondering why it’d still run here.

I love it ^.^
Thanks for making this script~

Heh, I was thinking over this after work and grumbling to myself "No, it’ll definitely need parser functionality to manage vocabulary."
That said, that’s not totally monstrous - are you familiar with Dani’s Space Inserter? It parses and spaces according to MeCab or KAKASI. With LWT I’ve been using Dani’s Inserter, comparing both parsings and tweaking the results so as to have minimal user burden…it’s not perfect, but it really is something. Your userscript doesn’t work with it immediately as yet (PHP issue?), but it’s not very difficult to get text from Dani’s to a page where your script would work. Perhaps it’s not a bad show that one can parse things in a similarly lightweight fashion (Firefox’s Furigana Inserter perhaps is another, more dynamic example of lightweight MeCab parsing on the fly). So, I wonder if we should be optimistic! :D As for the vocabulary-based LWT-like usage of the user-script that I’m dreaming up - I can imagine that somebody whose “KNOWN” list encompassed, say, most of Core 10K, a good chunk of WaniKani vocabulary, perhaps conjugations too…that might be a bit heavy going. I have no idea how heavy going on a system…because I really have no clue.

As to your responses, perhaps the main one requiring a reply is (4): The question was basically: How would one go about doing (1), (2) and (3) in order to add their own kanji or vocabulary in [EDIT: I’ll take a look at this, and at using both Unicode and Kanji, and see how it goes - I think you’ve described it all adequately, although I’ve no idea how to go about creating a set and making rules for it, so I’d probably just copy the <item><level> format you have.]. For now it seems like adding your own kanji as being whatever the present known level and seen kanji as known level+1 would be the best thing. Or could one state some special level for SEEN and colour them accordingly? It might be cool to highlight WaniKani and non-WaniKani SEENs the same way, but colour the text differently to differentiate.

About (5) though…that would be mighty cool if it were possible.

I really enjoy your script at present though (it was rather nice to see how much of WikiPedia is actually okay) and I think it already gives a good visual sizing-up of any web-page. That’s great. From there perhaps the user can be left to parse, perhaps use LWT and then Rikaisama->Anki. Probably that means I’ll be using the script just initially…and maybe where things aren’t too crazy that and Rikai-sama will be plenty. I just feel that with this script and some of the other in-browser tools, we’re close to something really comprehensive. I only got the Rikai-sama->Anki plugin setup today and it blew me away - I was thinking at the time that it at least made LWT’s much inferior Anki-import irrelevant, and then BOOM, the next thing I read is your post!

If you have any other brain-waves about this or related ideas, I’m certainly interested!

I had not been familiar with Dani’s Space inserter until now. Seems like the “MeCab” method is superior, at least it correctly recognizes noun compounds, noun suffixes and verb conjugations - though that might be subjective. I definitely preferred that method, though. (EDIT: Derp, I just realized something. For the purpose of the script, the other method would be much more suitable since we want to separate suffixes and compounds so that we’d be able to get the most coverage possible)  Hm, I suppose this vocab thing might be doable! I’ll just have to see what kind of effort it would take to implement and whether or not I’d have enough time and will to do it, but right now I’m relatively optimistic. With this kind of data processing, it’s always about how algorithms slow down when the amount of data increases. That’s the key aspect and I’m not an expert in this kind of text processing by any means. This is probably the most difficult part of the whole concept. The interface that allows you to add words etc. should be no problem.

I’ll probably sit down and rewrite the Kanji list part of the script tonight, making it possible to add and remove custom Kanji or editing the list altogether. My plan is basically to have the WaniKani set as a default, in a Level->Kanji map instead of a Kanji->Level map ( 1 : [人、一 …], … 15 : [信、築…])  This will make it much easier to edit the list. But you won’t have to edit the script manually anyway. You’ll be able to edit the raw list yourself using a text field popup, from where you can copy it into a text editor, or via commands like “Add kanji”, etc. I still don’t know anything about creating interfaces for userscripts. I guess they’re kinda limited and a true firefox add-on would be better, but it seems much harder to get into add-on development and it wouldn’t work in Chrome etc. at all.
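
A minimal sketch of that idea: keep the editable Level->Kanji form and invert it into a Kanji->Level lookup once at start-up. The sample data is the fragment quoted above; the function name `invertLevels` is an assumption.

```javascript
// Editable form: level -> list of kanji (fragment from the post above).
const levelToKanji = {
  1: ["人", "一"],
  15: ["信", "築"],
};

// Hypothetical helper: builds the fast kanji -> level lookup at run-time.
function invertLevels(levels) {
  const kanjiToLevel = new Map();
  for (const [level, kanjiList] of Object.entries(levels)) {
    for (const kanji of kanjiList) {
      kanjiToLevel.set(kanji, Number(level)); // object keys are strings
    }
  }
  return kanjiToLevel;
}

const lookup = invertLevels(levelToKanji);
```

The inversion touches each kanji once, so even for the whole WK list it should be a one-off cost at page load.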

A useful Furigana creator site I found is http://www.mori7.info/musi/123456.php

This will let you paste in any Japanese text, and show the furigana ruby HTML text. I’ve used that to create a single-page cheat sheet of the kanji I needed to study for the Kanji Kentei. It will also color the kanji, but that’s based on the Jouyou sorting, not the WK or Heisig indexes.

And it’s a server-side script, so I can’t decode it further. Still, it has good display options for whatever text you paste in.

EDIT: and here’s the simple Ruby styling that makes the ruby more usable for me. YMMV.

<style type="text/css">
rt {
  font-size: 80%;
  color: #cc0000;
}
</style>

Thanks, I’ll investigate it.

I just uploaded a new version of the script. Changes (for now - might update it again tonight):

  • From now on (after you update), it will automatically update itself.
  • When you run it for the first time, you’re automatically asked to enter your WK level.
  • Basic code rewritten to use the aforementioned dictionary format. No way to edit it yet, though. Coming soon.
  • Completely rewritten highlight code: about 1.5x as fast and much more memory efficient, because same-colored kanji in a row will now be grouped into one HTML tag (to change their color)
  • The rewritten highlight code also fixes WaniKani’s buttons, which would break while the script was running (almost - the Kanji button at the top still looks weird)
http://userscripts.org/scripts/show/449147
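
For the curious, the grouping of same-colored kanji could look roughly like this. It is a sketch on plain strings with made-up names (`groupRuns`, `toHtml`, `colorOf`); the real script works on the page’s text nodes and may differ in every detail.

```javascript
// Escapes characters that are special in HTML text and attribute values.
const escapeHtml = (s) =>
  s.replace(/[&<>"]/g, (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" }[c]));

// Merges neighbouring characters with the same color into one run.
// `colorOf(ch)` returns a color name, or null for "do not highlight".
function groupRuns(text, colorOf) {
  const runs = [];
  for (const ch of text) {
    const color = colorOf(ch);
    const last = runs[runs.length - 1];
    if (last && last.color === color) last.text += ch;
    else runs.push({ color, text: ch });
  }
  return runs;
}

// Emits one <span> per colored run instead of one per kanji.
function toHtml(runs) {
  return runs
    .map((r) => (r.color
      ? `<span style="background-color:${r.color}">${escapeHtml(r.text)}</span>`
      : escapeHtml(r.text)))
    .join("");
}
```

Two green kanji in a row end up inside a single tag this way, which is where the memory saving comes from.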

Tested with Firefox and Chrome! I think Chrome users might have to manually remove the old “WaniKani Kanji Highlighter” script before updating.

Coming soon:
  • Ability to edit the kanji dictionaries, add custom kanji etc.
  • Dialog to edit the rendering settings (colors, whether or not to highlight known kanji, etc.)
Also, I’m considering adding vocab support, but that’s still in the “distant” future - see discussion in thread for more info.

Liking the look of the new version, looking forward to seeing what a level up in thirty-ish hours will do for it. Already, looking at today’s Wikipedia page on Van Gogh is encouraging.

Also: Pretty enthused that you’re considering stepping up to the complexities that a vocabulary version might involve. I had exactly the same criticism and then reconsideration for Kakasi. I’ve noted instances where one or the other is an actual let down but not well enough to identify significant trends. Partly, my own grammatical ignorance affects that.

PS: Congrats on the level up. :wink:

Haha, thanks!

New update.

First of all, I reduced the number of yellow-red shades by one - this will make the script slightly faster and we don’t really need that many shades of orange, do we?
There are three new entries in the user command menu. You can now specify additional kanji that you have learned outside of WaniKani, which will be blue. However, once you learn them in a WK level, they’ll also be green (because it’s nicer to read).

Also, you can now manually edit the kanji level dictionary, which by default contains all 50 levels of WaniKani.

http://userscripts.org/scripts/show/449147

Here’s a pretty cool screenshot of the kanji level dictionary, rendered with the script



The blue ones are the TextFugu kanji, all of which I learned before coming here. If anybody wants them: http://pastebin.com/GYgrP354

Note for anyone who wants to edit the JSON: You can also use a map with number indices such as {“1”: “kanji…”} instead of an array.
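
A sketch of how both JSON forms could be read into one structure. Assumptions, clearly labeled: in the array form the first entry is taken to be level 1, each entry is a string of kanji, and the function name `parseLevelDictionary` is made up; the script’s own parsing may differ.

```javascript
// Hypothetical helper: accepts either ["kanji of level 1", "kanji of level 2", ...]
// or {"1": "kanji...", "3": "kanji..."} and returns a Map of level -> kanji string.
function parseLevelDictionary(jsonText) {
  const data = JSON.parse(jsonText);
  const levels = new Map();
  if (Array.isArray(data)) {
    // Assumed convention: array index 0 holds level 1.
    data.forEach((kanji, index) => levels.set(index + 1, kanji));
  } else {
    for (const [key, kanji] of Object.entries(data)) {
      levels.set(Number(key), kanji);
    }
  }
  return levels;
}
```

The map form is handy when you only want to define a few levels, since the array form needs an entry for every level up to the highest one.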

BTW, would totally appreciate it if a JS developer could recommend some kind of UI library that I could use for option dialogs.

EDIT: Fixed a bug - the WK levels were offset by one

EDIT2: Small bugfix and new “Add additional known kanji” menu command, for convenience. It will contain the selected text in the input box, so you can basically select any kanji in a text and choose that menu item to add it as known.
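
Roughly how the selected text could be reduced to just its kanji before being added, as a sketch with made-up function names. It relies on the Unicode property escape `\p{Script=Han}`, which needs a reasonably recent JavaScript engine; in the browser the input would typically come from `window.getSelection().toString()`.

```javascript
// Hypothetical helper: returns the distinct kanji in `text`, in order of
// first appearance, dropping kana, punctuation, digits and Latin letters.
function extractKanji(text) {
  const found = text.match(/\p{Script=Han}/gu) || [];
  return [...new Set(found)].join("");
}

// Hypothetical helper: merges newly selected kanji into the known list
// without creating duplicates.
function addKnownKanji(known, selectedText) {
  return extractKanji(known + selectedText);
}
```

So selecting a whole sentence is fine; only the kanji in it would end up in the list.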