When can I read [X]? - WaniKani Level Checker

Just wanted to say thanks for developing this tool. I’m constantly flabbergasted by how many helpful things this community (and certain people in particular) creates.

I love the WK community so hard.

I was looking at Breadstick's original file for a project of my own, found an error, came back here to report the fix, and found such wonderful implementations that I'mma use hard. I hope you guys actually see this, though. Weirdly, Xyresic's script seems just fine, which probably secures pohlondrej's extension - so this is principally aimed at Breadstick and other users of that file.

Basically, unless my file is somehow corrupted (I downloaded it again and checked the copy on Dropbox to be sure), two kanji appear twice: one at levels 32 and 41, and one at levels 28 and 46. What should appear in place of the second instances are the omitted characters from levels 41 and 46.

Hope that’s correct and useful.

PS: There was an interesting discussion about using UNICODE() in Excel 2013. I have Excel 2013! What might the formula be? Or, please teach me how to RegEx, as it would be useful for this project, but my regex-fu is weak.

PPS: Breadstick, this is so helpful for the corpus project I’m working on, twice over with a little customisation. Its potential is hot hot hot.

 Hey, I just saw this!!! Thank you so much for catching the error. I don’t even know how it could have got in there since I thought I built this directly from the kanji lists, but you were absolutely right. I’ve fixed the files now.

My idea surrounding the UNICODE() function would be to find the upper and lower limits of the kanji character space in Unicode, to identify characters that (a) are kanji but (b) aren’t contained in the WK kanji list.

You’d use the VLOOKUP() function to test whether all characters (including things like punctuation and English letters) are in the list of WK kanji, using FALSE as the fourth argument so that VLOOKUP() returns only exact matches. This gives a positive match for WK kanji, but an error for both punctuation and non-WK kanji. You can use the IFERROR() function to make the cell evaluate a second formula in that error case. There, you test whether the Unicode value of the character is within the kanji character space with AND(UNICODE(cell)>=X,UNICODE(cell)<=Y), where X and Y are the lower and upper kanji bounds. This identifies the cell as a kanji that simply isn’t taught by WK, rather than punctuation or something else. You make that case return some value that gets counted in the master list as a non-WK kanji (in my projects I usually use “61” to represent all kanji that aren’t assigned a WK level). If that second test is false, the formula returns “”, a blank cell.

The whole formula would look like

=IFERROR(VLOOKUP(cell,kanji:array,2,FALSE),IF(AND(UNICODE(cell)>=X,UNICODE(cell)<=Y),61,""))

where “cell” is the cell you’re testing on the left, kanji:array is the list of kanji and their levels, and X and Y are the unicode kanji space bounds described above.
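
If you’d rather do this outside a spreadsheet, here’s a minimal sketch of the same logic in Python. To be clear, this isn’t from the tool itself: wk_levels is a hypothetical dict standing in for the kanji:array lookup, and 0x4E00-0x9FFF (the CJK Unified Ideographs block) is one common choice for the X/Y bounds.

```python
KANJI_LO, KANJI_HI = 0x4E00, 0x9FFF  # CJK Unified Ideographs; a common pick for X and Y

def level_of(char, wk_levels):
    """WK level of char; 61 for kanji not in WK; None for everything else."""
    if char in wk_levels:                  # the VLOOKUP() exact-match case
        return wk_levels[char]
    if KANJI_LO <= ord(char) <= KANJI_HI:  # AND(UNICODE(cell)>=X, UNICODE(cell)<=Y)
        return 61                          # kanji, but not taught by WK
    return None                            # punctuation, kana, English, etc. ("" in the sheet)
```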


AnimeCanuck said…
Bookmarking this for later use when I’m level 15-20 and/or feel more confident in my grammar… :smiley:

Thank you, everyone! You’re all AMAZING! すごい!

 Thank you!!

Hey Breadstick,
I’m glad you saw the message and managed to patch the files (I wasn’t sure of any better way to contact you).
Right now, I’m working on turning a very reliable, orthography-sensitive*, broad and general frequency corpus into a tool, one that is cross-referenced with other corpora, specialist vocabularies (social sciences, literature, etc…), old-JLPT data, and so on. I want it to hook up with parsed frequency analyses of other texts, so that we can check the readability of, say, game scripts, books, anime/drama subtitles.
I modified the bones (but not the pieces) of your WaniKani level checker to apply WK levels to the whole corpus of words. Initially I just grouped all non-WK kanji in the corpus into “Level 61”, and maybe that is best. But if future texts contain words with kanji that aren’t in the corpus, those words won’t be given a level unless it’s done manually, which is sad. This UNICODE() approach could offer one legitimate, flexible solution.
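As a rough sketch of the per-word leveling I mean, reusing the level_of() idea from the Python snippet above (names are illustrative, not the actual tool):

```python
def word_level(word, wk_levels):
    """Highest WK level among a word's kanji; non-WK kanji count as 61.

    Words with no kanji at all get None, so they can be handled separately.
    """
    kanji_levels = [lv for lv in (level_of(c, wk_levels) for c in word)
                    if lv is not None]
    return max(kanji_levels) if kanji_levels else None
```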
One application-oriented step I’m interested in taking is analysing your WaniKani Expansion “deck” against the corpus, to give a good idea of both: (a) word frequencies and (b) the actual orthography(/ies) principally used for each word.
If you have an email (or other contact) you’d like me to get in touch via, I’ll pass the tool on to you once it’s ready.

*Actual percentages for different orthography types, rather than the “Usually Kana” of EDICT.

 You can contact me at thomyorke64 (at) gmail (dotcom). I'd love to see what you're working on!

Really great tools - thank you to all those involved in their creation!

Something like this for JLPT levels would be excellent too… JavaScript melts my brain, unfortunately (>_<;)

I’ve missed this the last few times this bubbled up on the main page.  A very useful tool.  Thanks as always Bread!

I’ve added showing the max WK level of the kanji in the text:

Google Spreadsheet formula (should be the same for Excel, I think):

=INDEX(MAX(FILTER(AMO2:AMO61, AMP2:AMP61 > 0)))
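
And in case anyone wants the same check outside a spreadsheet, a rough Python equivalent, assuming (if I’m reading the columns right) that AMO holds the WK levels and AMP holds how many kanji of each level appear in the text:

```python
def max_wk_level(counts):
    """Highest WK level present in the text, mirroring MAX(FILTER(levels, counts > 0)).

    counts: dict mapping WK level -> number of occurrences; returns 0 if empty.
    """
    present = [level for level, n in counts.items() if n > 0]
    return max(present) if present else 0
```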

I just realized that I’m reviving a 4-year-old thread, sorry ^^
But I thought this was neat and could use the addition, so you don’t have to check it manually.

Just as a note, I made this ages ago and several kanji have probably moved to new levels since that point in time.

As such, this tool may not be 100% accurate anymore in terms of where kanji fall in different levels.


Thanks for the note (and tool). I hope it was nostalgic seeing this :wink:
This is still in the apps list, that’s why I checked it out.