Tracking Known Vocabulary and Kanji in Manga

I’ve written in my study log about the spreadsheets I use to track my progress in learning the vocabulary and kanji that appear in the manga I read. They’ve reached the point where I figured I’d share them with a wider audience.

Purpose

Sites such as jpdb and Koohi do a good job of providing frequency information and word lists for novels and anime, but I’m unaware of anything similar for manga.

I developed these spreadsheets with two goals in mind for manga I’m reading or plan to read:

  1. Finding the most frequent kanji/vocabulary words I don’t know yet.

  2. Viewing my progress in learning kanji/vocabulary words used.

Features

  • Lists kanji/vocabulary by frequency so one can target learning the most frequent items in a series.

  • Shows the frequency of a kanji/vocabulary word across multiple series, so one can tell whether it’s common everywhere or only in a single series.

  • Displays the percent of known unique kanji/vocabulary from a manga series.

  • Displays the percent of known overall kanji/vocabulary from a manga series (counting every appearance, not just unique items).

By maintaining a list of known kanji and vocabulary, one can see what percentage of individual and overall kanji/vocabulary they know from a manga series.
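
To make the difference between the "unique" and "overall" percentages concrete, here is a minimal Python sketch of the arithmetic. It is not the spreadsheet's actual formulas; the function name `coverage` and the sample counts are made up for illustration.

```python
from collections import Counter


def coverage(counts, known, min_count=1):
    """Return (unique_pct, overall_pct) for items seen at least min_count times.

    counts: mapping of kanji/word -> number of appearances in a series
    known:  set of kanji/words the reader already knows
    """
    eligible = {item: n for item, n in counts.items() if n >= min_count}
    if not eligible:
        return 0.0, 0.0
    known_items = [item for item in eligible if item in known]
    # Unique: each kanji/word counts once, however often it appears.
    unique_pct = 100 * len(known_items) / len(eligible)
    # Overall: each kanji/word is weighted by its number of appearances.
    overall_pct = 100 * sum(eligible[i] for i in known_items) / sum(eligible.values())
    return unique_pct, overall_pct


# Hypothetical counts for illustration only, not taken from any real series.
counts = Counter({"人": 50, "日": 30, "徒": 15, "詞": 5})
known = {"人", "日"}
unique_pct, overall_pct = coverage(counts, known, min_count=10)
print(f"unique: {unique_pct:.1f}%, overall: {overall_pct:.1f}%")
```

Because frequent items carry more weight, the overall percentage is usually higher than the unique percentage once the most common kanji/words are learned.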

Screenshots

Screenshots of my use of these sheets.

Some things this view tells me:

  1. Since それでも is 11 volumes long, I’m viewing stats only for kanji that appear at least 11 times across all volumes. I have learned 100% of these kanji. (Note: I’ve excluded character names and common shogi terms.)

    • At this point, I can lower the “Min” value, such as to 9, and set that as my new goal for the frequency of kanji to learn.
  2. I likewise have kanji for the first 11 volumes of コナン. Looking at kanji that appear at least 11 times within these volumes of the series, I’ve learned 84.3% of the unique kanji appearing this many times. When considering multiple appearances of a kanji, I should recognize 96.22% of total kanji appearances.

  3. For ふらいんぐうぃっち, I have the first 10 volumes of kanji. Of these, I have 40 more kanji to learn until I’ve learned all the kanji that appear at least 10 times across these volumes.

  4. In キョーコ, the next most frequent kanji I don’t know is 徒 (a kanji I really should know by now), which appears 28 times across the seven-volume series.

Screenshot_20220808_215910

This sheet for one series I’m reading has filtered out known kanji, as well as kanji that appear in character names.

From here, I can see the next most frequent kanji I need to learn, how frequent they are in the series, and what their level in WaniKani is. (The latter is useful for seeing which kanji I should know, even if I’m sure I’ve never seen them before in my life.)

The next most frequent word for me to learn from ARIA is 昇格. By entering this into the Word field, I see in the Word Frequency column that it appears very infrequently in other series I’m reading.

Screenshot_20220808_223813

Looking at the series I may wish to begin to read or try again at reading, the Overall column gives me a good idea of how many vocabulary look-ups I would be doing.

For example, in ぼっち I’d be looking up roughly 1-in-20 words that appear the Min number of times (as I should recognize almost 96%).

For ハヤテ, on the other hand, I’d likely be looking up closer to 1-in-10 words that appear the Min number of times (as I should recognize about 91%).
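
The "1-in-N" estimates come from the share of words I would not recognize. As a rough Python sketch of that arithmetic (the function name `lookups_one_in` is just for illustration): 95% recognized works out to 1-in-20 and 96% to 1-in-25, so "almost 96%" lands somewhere between the two, while 91% is about 1-in-11.

```python
def lookups_one_in(overall_pct):
    """Convert a recognition percentage into 'one look-up every N words'."""
    unknown_fraction = 1 - overall_pct / 100
    if unknown_fraction <= 0:
        # Everything is recognized, so no look-ups are expected.
        return float("inf")
    return 1 / unknown_fraction


for pct in (95.0, 91.0):
    print(f"{pct}% recognized -> about 1 look-up in {lookups_one_in(pct):.0f} words")
```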

Limitations

There are a few technological limitations.
  • This is done completely in Google Sheets, meaning it requires more user maintenance than a fancy integrated website would.
    • You have to manually maintain your list of known kanji/vocabulary words.
    • When a kanji/vocabulary list is updated in the source spreadsheet (such as to add a new volume), it has to be manually copied over, and it takes a few steps to transfer notes from the old list to the new one.
  • There’s no method to track whether a kanji/vocabulary word is “known” or “learning”.
    • When I add a kanji or vocabulary word to Anki, I add it to my known words list with “Anki” in the Notes column.
  • Manga include “non-content” pages such as copyright pages, which are included in these lists. Over time I hope to curate these lists, re-generating them with these pages filtered out.
  • The selection of manga is limited to what I’ve read or am reading.
    • If anyone finds these spreadsheets useful, and buys manga digitally, and can extract the images, and can run them through Mokuro, I can generate kanji/vocabulary lists based on the Mokuro files and add them to the Series sheets.

Spreadsheets

How to Use

Step-by-step instructions on copying and filling out these sheets.

The two kanji sheets are used together, and the two vocabulary sheets are used together, but the kanji and vocabulary sheets are used independently from one another. The following instructions apply equally to the kanji pair and the vocabulary pair.

Copying the Progress spreadsheet

  1. Open the Progress spreadsheet (from the list above).

  2. Save a copy to your Google Drive by selecting the “File” menu and then “Make a copy”.

Adding a series

  1. Open the Series spreadsheet (from the list above).

  2. From the Series List tab, locate a series you want to add to your Progress spreadsheet. Click on the link to be taken to the sheet for that series.

    • Note: If a series has a number after its name on the tab, that is how many volumes are included in the data. Incomplete series may have later volumes added over time. If there is no number after a series name, it means the entire series is included.
  3. Right-click on the sheet’s tab and select “Copy to” then “Existing spreadsheet”.

  4. Select your copy of the Progress spreadsheet.

    • Be sure to select the correct spreadsheet, vocabulary or kanji.
  5. On the Progress sheet, rename the copied sheet to remove “Copy of” (or 「のコピー」) from the sheet name. You can optionally rename the sheet to anything you wish.

  6. Add the series name to the Progress sheet. This must match the sheet name for the series, including the volume number if there is one.

  7. Input into the Min column the minimum number of occurrences required to include the kanji/vocabulary word in the progress stats.

    • I recommend using at least the value 2 in order to reduce the chances of misread kanji or misparsed vocabulary words being included.
    • I tend to use 2 if the series is only one volume, or the number of volumes if the series has more. For example, for a five-volume series, I use a minimum occurrences value of 5.
    • If a series has omnibus volumes, I use double the number of volumes. For example, for a four-volume omnibus series, I use a minimum occurrences value of 8.
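
The rule of thumb above for picking a Min value can be written out as a small Python sketch. The function name `suggested_min` and its `omnibus` flag are made up for illustration; it only encodes my own habit, not anything the spreadsheet enforces.

```python
def suggested_min(volumes, omnibus=False):
    """Suggest a Min value from the number of volumes in the series data."""
    if volumes < 1:
        raise ValueError("volumes must be at least 1")
    if omnibus:
        # Omnibus volumes hold roughly two regular volumes each.
        return 2 * volumes
    # Never go below 2, to reduce misread kanji or misparsed words.
    return max(2, volumes)


print(suggested_min(1))                # single volume
print(suggested_min(5))                # five-volume series
print(suggested_min(4, omnibus=True))  # four-volume omnibus series
```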

Using series data

  1. Filter the Known column to show only “FALSE”. Optionally filter the Dictionary column to show only “TRUE”.

    • Items where the Dictionary column shows FALSE are not included in the Progress numbers.
  2. The kanji/vocabulary word at the top of the list is the most frequent item in the series. You’ll probably want to learn this one next.

  3. Add known kanji/vocabulary words to the Known Kanji or Known Words sheet.

    • This automatically updates their status on the series tabs.
  4. If you wish to exclude a kanji/vocabulary word from the Progress sheet, add a note to the Notes column of the item you wish to exclude. Reasons to do this include:

    • It’s a character or place name.
    • It appears only in author notes or other text that is not part of the story content.
    • It’s a misread kanji or misparsed vocabulary word.

Note: The kanji or vocabulary words are sorted by frequency by default. If this sorting is ever lost, you can sort the series by the Count column, Z to A, to restore displaying the most frequent words first.
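
For anyone who prefers to see the filtering logic spelled out, the steps above amount to roughly the following Python sketch. The `Row` class and `next_to_learn` function are hypothetical stand-ins for a series tab's columns, the sample counts are invented, and skipping rows that have a note is my assumption about how one would mirror the exclusions.

```python
from dataclasses import dataclass


@dataclass
class Row:
    item: str                # the kanji or vocabulary word
    count: int               # Count column
    known: bool              # Known column
    dictionary: bool = True  # Dictionary column
    notes: str = ""          # Notes column (non-empty means excluded)


def next_to_learn(rows, require_dictionary=True):
    """Unknown, non-excluded rows, most frequent first (Count, Z to A)."""
    candidates = [
        r for r in rows
        if not r.known
        and not r.notes
        and (r.dictionary or not require_dictionary)
    ]
    return sorted(candidates, key=lambda r: r.count, reverse=True)


# Invented sample rows for illustration only.
rows = [
    Row("昇格", 12, known=False),
    Row("水", 40, known=True),
    Row("灯里", 90, known=False, notes="character name"),
    Row("徒", 28, known=False),
]
for row in next_to_learn(rows):
    print(row.item, row.count)
```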

Requesting a series

I can add a series by request provided you are able to complete these steps.
  1. Purchase a digital copy of the volumes from the series.

  2. Remove DRM and unzip contents.

  3. Install Mokuro and run it on the volume folder that contains the manga page images.

  4. Compress Mokuro’s output _ocr folder into a zip file.

    • Do not include the images folder!
  5. Share the zip file with me on Google Drive, or else send me a message on Discord at ChristopherFritz#5813 with a link to the zip file.
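
If you would rather script step 4 than use a file manager, here is a minimal Python sketch using only the standard library. The function name `zip_ocr_folder` and the example paths are hypothetical; it assumes you point it at the `_ocr` folder itself, so nothing from the images folder ends up in the archive.

```python
import zipfile
from pathlib import Path


def zip_ocr_folder(ocr_dir, zip_path):
    """Zip only the given _ocr folder, keeping its name as the top-level entry."""
    ocr_dir = Path(ocr_dir)
    if not ocr_dir.is_dir():
        raise NotADirectoryError(ocr_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(ocr_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so entries start with "_ocr/".
                archive.write(path, path.relative_to(ocr_dir.parent).as_posix())
    return Path(zip_path)


# Example call (hypothetical paths; adjust to wherever Mokuro wrote its output):
# zip_ocr_folder("manga/_ocr", "series_ocr.zip")
```

Keep the zip file outside the `_ocr` folder so the archive does not try to include itself.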

Note: I don’t know Discord very well and I’m only on it a few times a month. I’m not certain if I can actually receive messages from random people.


動詞 no? :thinking:

I’m sure you know 催眠 by now, right?
Just knowing the second one and the context basically gives you the first one.

Anyway, sorry to derail. This is pretty cool. :nerd_face:


For which part?

I do have both kanji in Anki, but I recognized 催 so many times that my next review is so far away that I forgot it. (Except that I just now reset the card, so I can get to know it again.)

Knowing context definitely does often help me recognize a word with an unknown kanji in a vocabulary word list.

But more than that, when I encounter 催眠術 when reading manga, it’s the furigana that carries me through.


It says you don’t know 詞.

I guess I’ve just heard it so much that I learned it without noticing.

I saw the kanji and was like, みん…
What do they do in that manga that has to do with that… Ah, that other girl with the coin. すいみん? No, さいみん.


Two likely reasons for this:

  1. I may recognize it in context (動詞), but not alone (詞).
  2. It doesn’t show up enough in manga for me to recognize it.

I’ll get it properly learned one of these days =D

(Maybe.)


And here I thought you knew everything…

Never meet your heroes. :pensive:


I thought seeing my study log would be convincing enough of how bad I can be at learning Japanese =P
