When can I read [X]? - WaniKani Level Checker

Syphus said... You wrote this whole thing in Excel? 

Hmmm...I shall give this a think.

A five-second look shows that Sheets does have a CHAR() function, though I'm not sure Google Docs is really any better.
In Excel at least, CHAR() generates a character from its ASCII code, and CODE() is the opposite, returning the ASCII code for an input character. The comparable functions for the Unicode character space are UNICHAR() and UNICODE(). CHAR() and CODE() don't work for this task because CODE() returns the kanji's value in some kind of Japanese code page (国 -> 118), but in reverse it generates the value in the English character space (118 -> v). The problem is that any roman text in the input (like a lowercase v) will register the same value as a kanji (both "v" and "国" -> 118). The UNICODE() function should be able to return unique values for the different characters, but again, I don't have a version of Excel that supports it. I don't know whether Sheets does, but you should check whether its CODE() function supports Unicode, or whether Sheets has a function that does, since the formula needs to evaluate western and Japanese characters differently.
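
To make the collision concrete, here is a quick JavaScript illustration (a sketch of my own, not from any of the tools in this thread): with Unicode code points every character gets a distinct value, whereas a legacy code-page lookup can hand a kanji and a roman letter the same number.

console.log("v".codePointAt(0));  // 118
console.log("国".codePointAt(0)); // 22269 (U+56FD), a distinct value
// A legacy CODE()-style lookup can map both characters to 118, which is the collision described above.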

As I mentioned in my edit, it seems Sheets has both. And just looking at the documentation, it claims CODE() uses Unicode: https://support.google.com/docs/answer/3094120?hl=en

I just tried them out really quickly since I’m using Sheets for work, and they seem to work okay. Or at the very least I didn’t see an immediate problem.

Awesome idea!

Here’s my version in Ruby.

BreadstickNinja said... I looked into incorporating this; my idea was to find the number of characters within the bounds of the character space that includes kanji, then subtract the count of kanji already counted by the tool. However, the CODE() function only works on ASCII characters, and the UNICODE() function was only introduced in Excel 2013, which I don't have. :(

But one of these fancy programmer types might have a better idea of how to do it in JS or otherwise.
Does Excel support regex? You should be able to do the same thing I did in JavaScript, with an equivalent function testing whether a character falls within the [\u4e00-\u9faf] character class (more info: http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml ).

And it's already supported in the JS version, under the "Unknown" section.
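
For reference, that range test is tiny in JavaScript. A minimal sketch (the helper name is mine, not something from the script):

// True if the character falls in the main CJK Unified Ideographs block;
// rare kanji outside this range won't be caught.
function isKanji(ch) {
    return /^[\u4e00-\u9faf]$/.test(ch);
}

console.log(isKanji("国")); // true
console.log(isKanji("v"));  // false
console.log(isKanji("か")); // false (hiragana)
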
pohlondrej said... This is because of security restrictions on Windows. I published it, so now you can install it from the Chrome Web Store:

https://chrome.google.com/webstore/detail/wanikani-reading-ability/geilaheefnofbnocgibjjdeopmmipanc

I will improve the description and add screenshots when I have more time...
You guys are all geniuses! Damn, I feel like a complete idiot ...
I'm so glad that I've discovered WaniKani and its amazing community; it makes the kanji learning process way more doable!

pohlondrej, thank you so much for the Chrome extension! I'm sure I'll use it every day.
A little suggestion for the future (if you ever want to improve it): it would be even more awesome if you could get a WK estimate from only a selected portion of text rather than the whole webpage. Maybe I'm asking too much, and I don't even know if it's possible, but I thought it could be an idea.

It is definitely possible, but as I stated a few pages back, I am not a JavaScript guru - I just took the existing code and made a Chrome extension from it.

But if someone actually IS a JavaScript guru, have a look at the public repository on GitHub: https://github.com/pohlondrej/WK-reading-ability-checker

Well, I’m certainly no JavaScript guru, nor do I know how to contribute through GitHub, but you can get that functionality to work if you define these functions:

function anyTextIsSelected() {
    return selectedText() !== "";
}
function selectedText() {
    return window.getSelection().toString();
}
function allTextInDocument() {
    return document.body.innerText;
}

and change this line:
}((document.body.innerText || document.body.innerText).split(""))).filter(function (x) {
to this:
}((anyTextIsSelected() ? selectedText() : allTextInDocument()).split(""))).filter(function (x) {
When a portion of the page’s text is highlighted, only that part will be tested; otherwise it will check the whole page as it does now.

Edit: Fixed formatting (I hope).

GangsterOfBoats said... Well, I'm certainly no JavaScript guru ...
 Let me crown you! Now you're a JavaScript guru!

There is a story behind calling somebody a 'guru':

In our company, we had been developing a native Android application. Our client designed the UI layer and did the "managed stuff"; we did the application logic. However, our client had to deal with something in Android they didn't know anything about, so they hired an "Android Expert". We laughed about it for a while, because of the "we are too lazy to google the thing, let's just hire another guy who will do it for us" approach...

One day, our manager told us that we'd need to port the application to iOS (as our client wished). My colleague said:
"Hey, but we don't know anything about iOS development... we need someone who knows the platform, you know... some iOS Expert."
The manager took a MacBook and gave it to my colleague: "Can you turn this thing on?"
He said: "Of course I can!" and turned it on. Then the manager said:
"Cool. From now on, you're an iOS Guru!"

Anyways, I updated the code, tested the extension, and republished it in the Chrome Web Store, so now you can use it on a selection of Japanese text. Thank you, GangsterOfBoats!

It should update automatically. If it does not update, here is the URL : https://chrome.google.com/webstore/detail/wanikani-reading-ability/geilaheefnofbnocgibjjdeopmmipanc
pohlondrej said... It is definitely possible, but as I stated a few pages back, I am not a JavaScript guru ...
 && Breadstick Ninja for the original --

Thank you so much! I have both the Excel file and the extension (aforementioned) ready to go. I definitely think this will inspire me to keep going, as now I can look at websites or content that interests me and gauge approximately where I am.



ERROR=>FIX
I love the WK community so hard.

I was looking at Breadstick’s original file for a project of my own, found an error, came back here to report the solution, and found such wonderful implementations that I’mma use hard. I hope you guys actually see this, though. Weirdly, Xyresic’s script seems just fine, which probably means pohlondrej’s extension is safe too - so this is principally aimed at Breadstick and other users of that file.

Basically, unless my file is somehow corrupted (downloaded it again and checked on dropbox to be sure), two kanji appear twice:

32, 41
28, 46

What should appear in place of the second instances are the following omitted characters:
41
46

Hope that's correct and useful.

PS: There was interesting discussion about using UNICODE() in Excel 2013. I have Excel 2013! What might the formula be? Or, please teach me how to RegEx as it would be useful for this project, but my relevant -fu is weak.

PPS: Breadstick, this is so helpful for the corpus project I'm working on, twice over with a little customisation. Its potential is hot hot hot.

Bookmarking this for later use when I’m level 15-20 and/or feel more confident in my grammar… :smiley:

Thank you, everyone! You’re all AMAZING! すごい!

ocac said... *ERROR=>FIX* I love the WK community so hard. ...

 Hey, I just saw this!!! Thank you so much for catching the error. I don’t even know how it could have got in there since I thought I built this directly from the kanji lists, but you were absolutely right. I’ve fixed the files now.

My idea for the UNICODE() function would be to find the upper and lower limits of the kanji character space in Unicode, in order to identify characters that a) are kanji but b) aren’t contained in the WK kanji list.

You’d use the VLOOKUP() function to test whether each character (including things like punctuation and English letters) is in the list of WK kanji, using FALSE as the fourth argument so that VLOOKUP() only returns an exact match. This gives a positive match for WK kanji, but an error for both punctuation and non-WK kanji. You can then use the IFERROR() function to make the cell evaluate a second formula in that error case: test whether the Unicode value of the character is within the kanji character space with AND(UNICODE(cell)>=X,UNICODE(cell)<=Y), where X and Y are the lower and upper kanji bounds. This identifies that the cell contains a kanji (rather than punctuation or something else), just not one taught by WK. You make that case return some value that gets counted in the master list as a non-WK kanji (in my projects I usually call it “61” to represent all kanji that aren’t assigned a WK level). If the character isn’t in the kanji range either, the formula returns "", a blank cell.

The whole formula would look like

=IFERROR(VLOOKUP(cell,kanji:array,2,FALSE),IF(AND(UNICODE(cell)>=X,UNICODE(cell)<=Y),61,""))

where “cell” is the cell you’re testing on the left, kanji:array is the list of kanji and their levels, and X and Y are the Unicode kanji-space bounds described above.
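
Outside a spreadsheet, the same logic is easy to express. Here's a rough JavaScript sketch of it (the level map, bounds, and function name are mine, purely for illustration):

// Hypothetical lookup table playing the role of kanji:array (kanji -> WK level).
const wkLevels = { "山": 1, "国": 10 /* ... */ };

// Bounds of the CJK Unified Ideographs block, i.e. the X and Y in the formula.
const KANJI_LO = 0x4e00;
const KANJI_HI = 0x9faf;

function classify(ch) {
    if (ch in wkLevels) return wkLevels[ch];         // exact match, like VLOOKUP(..., FALSE)
    const cp = ch.codePointAt(0);
    if (cp >= KANJI_LO && cp <= KANJI_HI) return 61; // kanji, but not in the WK list
    return "";                                       // punctuation, kana, roman text, etc.
}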

AnimeCanuck said…
Bookmarking this for later use when I’m level 15-20 and/or feel more confident in my grammar… :smiley:

Thank you, everyone! You’re all AMAZING! すごい!

 Thank you!!

Just wanted to say thanks for developing this tool. I’m constantly flabbergasted by how many helpful things this community (and certain people in particular) creates.

BreadstickNinja said... Hey, I just saw this!!! Thank you so much for catching the error. ...

Hey Breadstick,
I’m glad you saw the message and managed to patch the files (I wasn’t sure of any better way to contact you).
Right now, I’m working on turning a very reliable, orthography-sensitive*, broad and general frequency corpus into a tool, one that is cross-referenced with other corpora, specialist vocabularies (social sciences, literature, etc…), old-JLPT data, and so on. I want it to hook up with parsed frequency analyses of other texts, so that we can check the readability of, say, game scripts, books, anime/drama subtitles.
I modified the bones (but not the pieces) of your WaniKani level checker, to apply WK levels to the whole corpus of words. Initially, I just grouped all non-WK kanji in the corpus into “Level 61”, and maybe that is best. But if there are words with non-corpus kanji in future texts, they won’t be given a level - unless it’s done manually, which is sad. This Unicode approach could offer one possible legitimate, flexible solution.
One application-oriented step I’m interested in taking is analysing your WaniKani Expansion “deck” against the corpus, to give a good idea of both: (a) word frequencies and (b) the actual orthography(/ies) principally used for each word.
If you have an email (or other contact) you’d like me to get in touch via, I’ll pass that on to you, once ready.

*Actual percentages for different orthography types, rather than the “Usually Kana” of EDICT.

ocac said...
BreadstickNinja said...
ocac said...

 You can contact me at thomyorke64 (at) gmail (dotcom). I'd love to see what you're working on!

Really great tools - thank you to all those involved in their creation!

Something like this for JLPT levels would be excellent too… JavaScript melts my brain, unfortunately (>_<;)

I’ve missed this the last few times this bubbled up on the main page.  A very useful tool.  Thanks as always Bread!

I’ve added a formula to show the max WK level of the kanji in the text.

Google Spreadsheet formula (should be the same for Excel, I think):

=INDEX(MAX(FILTER(AMO2:AMO61, AMP2:AMP61 > 0)))

I just realized that I’m reviving a 4-year-old thread, sorry ^^
But I thought this was neat and could use the addition, so you don’t have to check this manually.

Just as a note, I made this ages ago and several kanji have probably moved to new levels since that point in time.

As such, this tool may not be 100% accurate anymore in terms of where kanji fall in different levels.

Thanks for the note (and tool). I hope it was nostalgic seeing this :wink:
This is still in the apps list; that’s why I checked it out.