となりのトトロ Cinema Manga: WaniKani Statistics

Brief background:

「となりのトトロ」 (“My Neighbor Totoro” in English releases) is a Japanese animated fantasy film by Studio Ghibli, released in theaters in 1988. It is set in postwar rural Japan and centers on two young sisters. The movie has a fairly slow pace, and the dialogue is simple, without specialized or technical terms.

My history:

Aside from being extremely familiar with the title (I’ve probably seen the movie over 40 times, mostly the original English dub), I figured many years ago that it would make good Japanese reading material. I found a copy of the movie’s script online back then and was summarily crushed by the kanji (owing to my then-meager knowledge of them).

Fast-forward to today:

Recently, I discovered that there are “cinema manga” for Studio Ghibli’s movies. A cinema manga is made of screenshots from a movie, with dialogue balloons and sound effects added. I figured that since I’m on WaniKani level 7 and level 8 is just around the corner, I’m practically as accomplished in kanji as those level 60s you see around here. (Any day now I’ll catch up with them.) So I bought a copy of 「文春ジブリ文庫 シネマコミック となりのトトロ」 and dove right into reading. And… it’s a whole lot easier than when I tried all those years ago. (That’s what we call progress.)

Since I’m a computer geek and something of a numbers geek and pretend to be a statistics geek, I wanted to know how much I should be able to read based on what I’ve learned in WaniKani, and on what I will be learning in WaniKani. (Reaching level 8 will mean I’m practically at the finish line and can soon read all Japanese, right?)

Rather than transcribe the whole book, I opted to download Japanese subtitles for the movie. Since the manga includes all of the movie’s dialogue, this was a useful shortcut for generating numbers. There is a shortcoming, however: sometimes the manga uses hiragana where the subtitles use kanji, so my numbers are a little off. The level numbers are slightly inflated (they should be lower), and the percentages are slightly deflated (they should be higher). I’ve decided this isn’t an issue for me, since I’m interested only in rough numbers, not exact ones.

Kanji and WaniKani statistics:

From the subtitles, there are 317 unique kanji. Counting duplicates, there are 1170 kanji in all.

How many kanji (unique and total) should one recognize after reaching a given WaniKani level? Here’s what I came up with:

WK level   Unique kanji   Total kanji (counting repeats)
5          26%            42%
10         50%            68%
16         65%            80%
27         82%            90%
37         91%            95%
50         98%            99%

By the way, there is one kanji in the manga (that I know of) which is not included in WaniKani: 妖 (in the line, 「それは妖怪ですか?」).
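For anyone wanting to reproduce the kanji table, here is a minimal Ruby sketch (Ruby because that’s what the analysis was written in; this is not the original code). It assumes a hypothetical `kanji_levels` hash mapping each kanji to its WaniKani level and treats the percentages as cumulative: a kanji counts as known once its level is reached, and kanji missing from the hash (like 妖) never count.

```ruby
# Sketch only: share of a text's kanji that WaniKani has taught by a given level.
# ASSUMPTION: kanji_levels is a hypothetical Hash of kanji => WaniKani level.
# Kanji missing from it (such as 妖) are treated as never taught.
def kanji_coverage(text, kanji_levels, level)
  # Every Han-script character, repeats included (this also matches marks like 々).
  occurrences = text.scan(/\p{Han}/)
  return nil if occurrences.empty?

  unique = occurrences.uniq
  taught = ->(kanji) { (lvl = kanji_levels[kanji]) && lvl <= level }

  {
    unique: 100.0 * unique.count(&taught) / unique.size,
    total:  100.0 * occurrences.count(&taught) / occurrences.size
  }
end

# Made-up levels for illustration; these are not real WaniKani levels.
sample_levels = { "食" => 1, "座" => 2 }
result = kanji_coverage("食べて、食べて、座って", sample_levels, 1)
puts format("unique %.0f%%, total %.0f%%", result[:unique], result[:total])
```

Running it once per level of interest (5, 10, 16, …) gives one row of the table each.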

Vocabulary and WaniKani statistics:

I haven’t parsed out all the words, but I did try to see how many words in the manga also appear in WaniKani. The following statistics consider only words taught in WaniKani that also appear in the manga.

What is included here:

  • Vocabulary regardless of conjugation. (I used mecab to convert conjugated words to their dictionary form to compare with WaniKani.)
  • Words that use kanji in the subtitles, but only hiragana in the manga.

What is not included here:

  • Words that normally have kanji, but appear as hiragana in the subtitles.
  • Words that normally are written only in hiragana.
  • Any other words not covered by WaniKani.

There are 271 unique words covered by WaniKani that appear in the manga. Due to repeated usage, these words appear a total of 755 times in all. (I do not have a count of overall total words.) Keep in mind, the following numbers are only for words covered by WaniKani.

WK level   Unique words   Total words (counting repeats)
4          20%            28%
7          35%            51%
10         51%            63%
14         61%            70%
27         83%            88%
51         100%           100%
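The vocabulary table can be sketched the same way, again in Ruby and again not the original code. It assumes the dictionary forms from mecab are already in an array (one entry per occurrence) and a hypothetical `vocab_levels` hash of WaniKani vocabulary to level. Words WaniKani doesn’t teach are dropped first, so the percentages are relative to WaniKani-covered words only, as in the table.

```ruby
# Sketch only: per-level coverage of the WaniKani words found in a text.
# ASSUMPTIONS: words holds MeCab dictionary forms, one entry per occurrence;
# vocab_levels is a hypothetical Hash of WaniKani vocabulary => level.
def vocab_coverage_table(words, vocab_levels, levels)
  # Only words WaniKani teaches at some level are counted at all.
  wk_words = words.select { |w| vocab_levels.key?(w) }
  return [] if wk_words.empty?

  unique = wk_words.uniq
  levels.map do |level|
    taught = ->(w) { vocab_levels[w] <= level }
    [level,
     (100.0 * unique.count(&taught) / unique.size).round,
     (100.0 * wk_words.count(&taught) / wk_words.size).round]
  end
end

# The two levels here come from the 座って食べなさい example later in the thread.
vocab_levels = { "食べる" => 6, "座る" => 18 }
words = %w[座る て 食べる なさる 食べる]
vocab_coverage_table(words, vocab_levels, [6, 18]).each do |level, unique, total|
  puts "#{level}\t#{unique}%\t#{total}%"
end
```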

It looks like level 7 is the sweet spot for getting past the feeling that you’re looking up every other word. Of course, knowing vocabulary and kanji will only get you so far; you also need enough grammar and conjugation knowledge to get by. Between my N5 and partial N4 grammar knowledge, plus vocabulary I’ve learned outside of WaniKani, I’ve quickly gone through the first 150 pages with only a handful of unknown words and no unfamiliar grammar.

For subtitled movie-watchers:

Everything in this post should apply to the Japanese-subtitled release of the movie as well. The only difference is that the subtitles also use 挨拶 (twice!), which adds two more kanji not in WaniKani. The manga writes this word as あいさつ.

Sample pages

Here are some sample pages from this manga.

Cover

(Note that the physical releases are numbered. I think this one is number 3.)

Moving truck

Haunted house

Heading out to play


Seems like a really nice endeavour you’re setting up there.

A few notes on the technical aspect of it.

How are you doing the text analysis of the subs? Are you directly comparing them to your current WK kanji and vocab? There’s software to do that (unnamed japanese text analysis tool), in case you’re doing it some other way. You can even compare against your current vocab and have the difference calculated and turned into a list.

About that: unless you’re set on doing it with the manga version, I think something like Subs2SRS and Voracious could give you an even more direct way to go through the film, with the added advantage of doing dictionary lookups while watching.

I’m setting up a similar project to go through the films of my favorite Japanese director… Subs were the most limiting part, but after that it’s all clear skies…


For subtitles, I found a text file with the subtitles. I simply removed the timing information, which left me with all the dialogue. (I also have the transcript I downloaded years ago, but went with the subtitle file instead.)
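The post doesn’t say which subtitle format the file used. Assuming a common .srt layout (a cue number, a timing line, then the dialogue), stripping the timing information might look like this Ruby sketch; `dialogue_lines` is a made-up name.

```ruby
# Sketch assuming an .srt-style subtitle file (the post doesn't name the format):
# drop cue numbers, timing lines, and blank lines, keeping only the dialogue.
TIMING_LINE = /\A\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}/

def dialogue_lines(srt_text)
  srt_text.sub(/\A\uFEFF/, "")     # drop a UTF-8 byte-order mark if present
          .each_line
          .map(&:strip)
          .reject { |line|
            # Caveat: a dialogue line made up only of digits would be dropped too.
            line.empty? || line.match?(/\A\d+\z/) || line.match?(TIMING_LINE)
          }
end

sample = <<~SRT
  1
  00:00:01,000 --> 00:00:03,000
  座って食べなさい

  2
  00:00:04,000 --> 00:00:06,000
  それは妖怪ですか?
SRT
p dialogue_lines(sample)
```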

I ran the subtitle file through mecab, which gave me a nice breakdown. As an example, one line of the movie is the father telling his younger daughter at the breakfast table, 「座って食べなさい」.

Putting this into mecab returns:

座っ    動詞,自立,*,*,五段・ラ行,連用タ接続,座る,スワッ,スワッ
て      助詞,接続助詞,*,*,*,*,て,テ,テ
食べ    動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
なさい  動詞,非自立,*,*,五段・ラ行特殊,命令i,なさる,ナサイ,ナサイ

I then parsed out the third-to-last column (the dictionary form), leaving out the particle:

  • 座る
  • 食べる
  • なさる
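Extracting that column takes only a few lines of Ruby. This sketch assumes mecab’s default IPAdic-style output (for example, from running the dialogue file through the `mecab` command), with a tab after the surface form and comma-separated features where the seventh feature is the dictionary form. It skips particles to match the list above; `base_forms` is a made-up name.

```ruby
# Sketch: pull dictionary forms out of MeCab's default IPAdic-style output, where
# each token line is "surface<TAB>pos,pos1,pos2,pos3,conj_type,conj_form,base,reading,pron".
# Particles (助詞) are skipped, matching the list above.
def base_forms(mecab_output)
  mecab_output.each_line.map { |line|
    surface, features = line.chomp.split("\t", 2)
    next if features.nil?                        # "EOS" and blank lines have no tab
    fields = features.split(",")
    next if fields[0] == "助詞"
    base = fields[6]                             # third-to-last column when all 9 fields exist
    (base.nil? || base == "*") ? surface : base  # unknown words: fall back to the surface form
  }.compact
end

sample = [
  "座っ\t動詞,自立,*,*,五段・ラ行,連用タ接続,座る,スワッ,スワッ",
  "て\t助詞,接続助詞,*,*,*,*,て,テ,テ",
  "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ",
  "なさい\t動詞,非自立,*,*,五段・ラ行特殊,命令i,なさる,ナサイ,ナサイ",
  "EOS"
].join("\n")
p base_forms(sample)
```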

For WaniKani’s vocabulary, I’d already saved a list of all vocabulary for all levels (likewise for kanji). Finally, I looked for exact matches between the words above and that list. In this case, the matches are (level, then word):

  • 6 食べる
  • 18 座る

Thus, I should be able to read this line without trouble by the time I complete level 18 (except that I actually learned 食べる in Japanese class waaay back when, and 座る through iKnow a couple of years ago).

I was able to drop the whole subtitle file into mecab for full-movie results, and then wrote some quick-and-dirty Ruby code to analyze it.

Regarding the various links:

  • Subs2SRS: I did try this one out earlier this year, but the timing of my subtitle file doesn’t match either of my “My Neighbor Totoro” DVDs. (I have both English dub releases.) I tried adjusting the timing on the subtitle file, but that didn’t help. (C’est la vie.) I’m sure I’ll find a use for it one day, though!

  • Voracious: I’d never heard of this one, but I see it has a Linux build. I don’t know whether any subtitle files I’ve downloaded have timings that match the US DVD releases of anime I own, but this program does look worth checking out.

  • unnamed japanese text analysis tool: I’ll check this one out as well. It looks like it might be written in Java, which I always seem to have issues with on Linux, but I’ll give it a try.

Although I wanted to do flash cards for this movie earlier this year, I’ve decided I’m not too worried about that since I’m learning new words from iKnow, WaniKani, and reading manga. If I encounter a new word reading となりのトトロ, I’ll consider whether to look it up or keep going, and if the word is common enough, I’ll either learn it in iKnow/WK or pick it up over time.

I will admit, the idea of being able to compare it with my WaniKani progress makes me want to extract my iKnow progress as well, and use both to see which vocabulary I haven’t learned yet. If that’s a small enough pool, I might just create an Anki deck and get to work on it.
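Finding which words are left is a set difference. A minimal Ruby sketch, assuming the WaniKani and iKnow progress have already been exported as plain arrays of dictionary forms (how to export them is not covered here, and the sample lists below are made up):

```ruby
require "set"

# Sketch: words from the manga that haven't been learned in WaniKani or iKnow yet.
# ASSUMPTION: each learned list is a hypothetical Array of dictionary forms,
# exported however those services allow; only exact matches count as learned.
def still_to_learn(manga_words, *learned_lists)
  learned = learned_lists.flatten.to_set
  manga_words.uniq.reject { |w| learned.include?(w) }
end

# Made-up sample lists for illustration.
manga_words = %w[座る 食べる なさる 居場所 不気味 食べる]
wanikani    = %w[食べる 座る]
iknow       = %w[なさる]
remaining   = still_to_learn(manga_words, wanikani, iknow)
puts remaining.size   # how big the pool of unlearned words is
puts remaining
```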

My unofficial goal for this may turn out to be:

  1. Read through without looking up (many) words.
  2. Read through again and look up unknown words.
  3. Watch the movie in Japanese without subtitles (potentially slowed down a little in VLC) and actually understand what’s being said from the Japanese, rather than from my practically memorized original dub script.

Regarding point 3, I think my favorite thing about reading through this cinema manga (aside from being able to take the time to fully enjoy all the background artwork) is seeing all the little things in dialogue that changed in the dub. Some of it I knew already, and others I’m encountering for the first time.

If things go well with this, Kiki’s Delivery Service may very well be next on my list. (I dare not yet jump into something like Porco Rosso or Princess Mononoke that likely has more difficult words!)


The 千と千尋 (Spirited Away) cinema manga was one of the first Japanese materials I read. If you know and love the story, it’s a good transition into real media. By watching the movie after reading the manga, either as a whole or scene by scene, you can practice your listening skills too. If I remember right, it was too hard when I was around N5 level, but when I came back to it around N4 it was much more readable and enjoyable.

That’s definitely my plan for となりのトトロ. I’ve been watching anime in Japanese since the mid-1990s, but I’ve never watched something without subtitles and understood everything by hearing every word spoken.

If (when?) all goes well with that, likely it’ll be 魔女の宅急便 (Kiki’s Delivery Service) next, and I wouldn’t be surprised if I went with 千と千尋の神隠し (Spirited Away) third.

The best part is that Studio Ghibli has a wide range of movies I love, going from simple dialogue with younger characters to more complex dialogue with older characters, and fantasy worlds that likely have world-specific terms.


I am way late to this party, but I wanted to ask: what percentage of the total number of unique words do the WaniKani words represent?

Oh no, it’s available on Kindle.

I don’t have the tools written to give a certain answer to this, but I can give some rough numbers.

If we count not only individual words but also some particles, we’re looking at about 650 unique words (give or take a handful). This counts different conjugations of a word as one unique word, unless their part of speech differs (so 休み and 休む are counted separately). Of these, 271 were previously found to be covered by WaniKani.
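Counting unique words this way amounts to keying each token by its dictionary form plus part of speech. The following Ruby sketch assumes mecab’s IPAdic-style output; the sample lines are hand-written in that format, not real mecab output. (Since 休み and 休む already differ in dictionary form, the part-of-speech key only matters when one dictionary form shows up under two parts of speech.)

```ruby
require "set"

# Sketch: count unique words in MeCab (IPAdic-style) output, folding conjugations
# into one entry but keeping different parts of speech apart. Particles stay in.
def unique_words(mecab_output)
  seen = Set.new
  mecab_output.each_line do |line|
    surface, features = line.chomp.split("\t", 2)
    next if features.nil?                       # skip "EOS" and blank lines
    fields = features.split(",")
    base = (fields[6].nil? || fields[6] == "*") ? surface : fields[6]
    seen << [base, fields[0]]                   # [dictionary form, part of speech]
  end
  seen
end

# Hand-written lines in MeCab's format (not real output) to show the counting.
sample = [
  "休み\t名詞,一般,*,*,*,*,休み,ヤスミ,ヤスミ",
  "休ん\t動詞,自立,*,*,五段・マ行,連用タ接続,休む,ヤスン,ヤスン",
  "休む\t動詞,自立,*,*,五段・マ行,基本形,休む,ヤスム,ヤスム",
  "て\t助詞,接続助詞,*,*,*,*,て,テ,テ",
  "EOS"
].join("\n")
words = unique_words(sample)
puts words.size
```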

With 271 of 650 words covered, WaniKani covers about 41.7% of the cinema manga’s vocabulary. For comparison, the first 2,000 words in iKnow cover about 300 of these words, or about 46.2%. Most of the words covered by WaniKani are probably also covered by iKnow, and vice versa.
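The percentages are plain ratios of covered words to unique words. As a quick check in Ruby, taking the iKnow count as exactly 300 for the arithmetic:

```ruby
# Coverage = words covered / unique words, rounded to one decimal place.
def coverage_percent(covered, total)
  (100.0 * covered / total).round(1)
end

puts coverage_percent(271, 650)   # WaniKani: 271 of ~650 unique words
puts coverage_percent(300, 650)   # iKnow's first 2,000 words, taking ~300 as 300
```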

Important note: for this, I’m comparing against a list of iKnow vocabulary written with kanji, plus the same words in hiragana only. However, I’m using only WaniKani’s vocabulary with kanji, not hiragana-only forms. There may be some words covered by WaniKani that appear only in hiragana in the cinema manga, and these are missed (due to insufficient tools on my part).
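One possible way to catch those missed words, offered as a hypothetical sketch rather than what was done here: look up hiragana-only words by their kana reading. It assumes a `wk_readings` hash of WaniKani vocabulary to reading, and because of homophones a match is only a candidate, not a certainty.

```ruby
# Sketch of a possible fallback for hiragana-only words, not the method used above.
# ASSUMPTION: wk_readings is a hypothetical Hash of WaniKani vocabulary => kana
# reading. Homophones (こうえん is both 公園 and 講演) make this a rough guess.
HIRAGANA_ONLY = /\A[\p{Hiragana}ー]+\z/

def kana_candidates(words, wk_readings)
  # Invert the hash: reading => every WaniKani word with that reading.
  by_reading = wk_readings.group_by { |_word, reading| reading }
                          .transform_values { |pairs| pairs.map(&:first) }
  words.uniq
       .select { |w| w.match?(HIRAGANA_ONLY) && by_reading.key?(w) }
       .map { |w| [w, by_reading[w]] }
       .to_h
end

wk_readings = { "食べる" => "たべる", "座る" => "すわる" }   # sample entries
p kana_candidates(%w[たべる すわる なさる 座る], wk_readings)
```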

Remember, these are just rough numbers without having tools in place to properly analyze and get exact concrete numbers.

Here are a few words that may not be covered by either WaniKani or iKnow’s first 2,000 words, as examples:

  • 居場所【いばしょ】[n] whereabouts; place; location
  • 競走【きょうそう】[adj-no,vs,n] race
  • 不気味【ぶきみ】[n,adj-na] weird; ominous; eerie
  • 勝手口【かってぐち】[n] kitchen door; back door
  • 振り回す【ふりまわす】[vt,v5s] to wave (about); to swing

Interesting, thanks! That’s actually a pretty good share of the words compared to what I was vaguely expecting. Given how many words there are in a language, and given that WaniKani is focused on kanji and therefore isn’t the place to learn some fairly basic vocabulary written in hiragana or katakana, I’m pleasantly surprised it covers so much.