ChristopherFritz's Study Log

Word of the day: Inversion

Prioritizing Anki

I’ve long had this issue where my WaniKani reviews were taking up so much time that I didn’t have time to do many Anki reviews. Thus I haven’t been making many Anki cards.

Last week, I asked myself, what if I invert my order and do Anki reviews first?

I still start with one WaniKani review session (which I think I have set to 12 items), because my daily review streak is at 1,177 days, and breaking it would be too freeing (I mean, devastating).

Then, I switch over to Anki for as long as I have available to review.

Recent Anki card creation stats:

  • 3/12: added 3 cards
  • 3/13: added 30 cards
  • 3/14: added 0 cards
  • 3/15: added 36 cards
  • 3/16: added 33 cards
  • 3/17: added 1 card
  • 3/18: added 0 cards
  • 3/19: added 38 cards (and counting)

Selecting Kanji

So far, it’s mostly still creating cards for kanji that I already learned in WaniKani, whether I feel I have a strong recognition of them or not. If I find I really do know the kanji well, I can always retire the kanji and vocabulary cards later.

Now that I’ve done OCR on a lot of manga series (well, seven series), I’ve loaded their kanji into a spreadsheet and sorted them by frequency. Most of the most-used kanji come from the first 30 or so levels of WaniKani. (No wonder I seem to do well with reading, with minimal vocabulary lookups, these days! At least, for these specific series.)
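A minimal Python sketch of the counting step, assuming the OCR output is available as plain text lines. The sample lines below are made up (not real OCR output), and the regex only covers the main CJK Unified Ideographs block, U+4E00 to U+9FFF:

```python
import re
from collections import Counter

# Main CJK Unified Ideographs block only (U+4E00-U+9FFF); rarer extension
# blocks and the repetition mark 々 are deliberately left out of this sketch.
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def kanji_frequency(lines):
    """Count every kanji occurrence across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(KANJI_RE.findall(line))
    return counts


# Hypothetical lines standing in for one series' OCR output.
sample = ["魔女の修行に来ました", "魔女って本当にいるの?", "本当だよ"]
for kanji, n in kanji_frequency(sample).most_common(3):
    print(kanji, n)
```

Adding the `Counter` objects for several series together (`a + b`) would give a combined list across series.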

As an example, here’s Flying Witch’s kanji list, with items I already have cards for in Anki hidden, and anything that appears fewer than 20 times hidden:

[Image: Flying Witch kanji frequency list]

Once I tackle much of this list, I’ll unhide kanji that appear as few as ten times across the 10 manga volumes.
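The two filters on that view (hide kanji that already have cards, hide anything under a minimum count) amount to something like this sketch. The counts and the set of known kanji are invented for illustration:

```python
def kanji_to_study(counts, known, min_count):
    """Kanji still to study, most frequent first.

    Hides kanji that already have Anki cards (`known`) and anything that
    appears fewer than `min_count` times.
    """
    rows = [(k, n) for k, n in counts.items() if k not in known and n >= min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented counts for one series; not real Flying Witch data.
counts = {"魔": 120, "女": 95, "修": 40, "捕": 22, "箒": 18}
known = {"女"}

print(kanji_to_study(counts, known, min_count=20))  # current view
print(kanji_to_study(counts, known, min_count=10))  # after unhiding down to ten
```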

Selecting Vocabulary

From this list (or one of the other manga lists), I pick a kanji to look up words for.

My goal isn’t to learn kanji. It’s to learn vocabulary. But I also want to be able to recognize the kanji in an unknown word. To help with this, I’m aiming to make cards for the more common vocabulary that uses that kanji, to improve my recognition of it.

Flying Witch has the most results, but there are matches in other series as well.

I browse for a line that looks i+1 enough for me, then bring up the image and put it all into Migaku’s browser extension, and from there it goes into Anki (creating the kanji card for me).

Then I exclude this word from my search and look for another word to create a card for.

For 捕, the resulting cards cover the words:

  • 捕まる
  • 捕る

That list does fall a bit short of WaniKani’s list of six words, in part because I still need to OCR series such as Detective Conan (which uses 逮捕). But that’s all right.
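The search-and-exclude loop could look roughly like this, assuming the OCR'd lines are searchable as plain strings (the sample lines are hypothetical). Plain substring matching is a simplification: it won't exclude an inflected form such as 捕まった once 捕まる has a card.

```python
def candidate_lines(lines, kanji, carded_words):
    """Lines that contain `kanji` but none of the words that already have cards."""
    return [
        line
        for line in lines
        if kanji in line and not any(word in line for word in carded_words)
    ]


# Hypothetical OCR'd lines from several series.
lines = ["猫を捕まえた", "魚を捕る", "犯人が捕まる", "逮捕された"]

carded = {"捕まる"}
print(candidate_lines(lines, "捕", carded))

carded.add("捕る")  # after making a card for 捕る, search again
print(candidate_lines(lines, "捕", carded))
```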

The process is way longer than for people who can simply add a word they see in reading to Anki and learn from it, but this is something that is working for me so far. The real test will be when I get past these kanji I’ve had WaniKani exposure to.

Back to adding new cards.

Lately, it’s been minimal reviews (over 100 waiting for me on Anki) and barely any reading, as I’ve finally made progress on how I want to organize my long-running website of manga examples of Japanese grammar.

The main issue has always been working out the categorization for everything.

Recently, I decided that rather than try to have a complex categorization, I’d have essentially no categorization. Tagging will handle everything.

So far it’s going well. I’m liking it very much.

I’m down to just 89 (make that 101) more items to convert into the new style (which I’m slacking off on a bit by writing in this thread).

The main advantage of this new layout is that it’ll be a lot easier for me to add content to the site, as I don’t have to worry about categorization or various other prior limitations.

The basic concept of the site is that I do a very simple write-up of grammar.

Then I can add examples as I happen upon them when reading manga (or if I go looking for examples).

From there, if I need a refresher on grammar, I can look at the various examples I’ve posted on the site.
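As a purely hypothetical sketch (nothing here reflects how the actual site is built), the tag-only idea can be modeled as flat entries that each carry a set of tags, with lookups by tag replacing a fixed category tree:

```python
from dataclasses import dataclass, field


@dataclass
class GrammarEntry:
    title: str
    summary: str  # the short write-up of the grammar point
    tags: set = field(default_factory=set)
    examples: list = field(default_factory=list)  # manga lines added over time


def entries_with_tag(entries, tag):
    """Titles of every entry carrying `tag`, in the order the entries were added."""
    return [entry.title for entry in entries if tag in entry.tags]


entries = [
    GrammarEntry("〜てしまう", "do completely; do regrettably", {"verb", "te-form"}),
    GrammarEntry("〜ながら", "while doing", {"verb", "conjunction"}),
]
entries[1].examples.append("歩きながら話す")  # hypothetical example sentence

print(entries_with_tag(entries, "verb"))
print(entries_with_tag(entries, "te-form"))
```

Adding a new grammar point or example never requires deciding where it lives in a hierarchy; it only needs tags.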

Now, to get back to work on those last 89 items to convert over…

(For the curious, the site is linked to in my profile here.)

This is a really cool project, I’ll take a look :eyes:

クリフリ?? :thinking: (looked up possible bits in jisho and am none the wiser)

You know how in Japan, Family Computer becomes Famicom, and Pocket Monster becomes Pokemon, and so on?

クリストファー・フリッツ becomes クリフリ :wink:

Things have been going smoothly as I continue to learn vocabulary based on my manga frequency lists.

In the past, I’ve tried adding cards for words as I’m reading, and those ones never seem to stick. Maybe I always ended up picking infrequent words made up of infrequent kanji, so I didn’t see them enough before they were suspended as leeches?

Working off of frequency lists based on what I’m reading (or have recently read) has been a much nicer experience.

Until now, I’ve been focusing on frequent kanji that also appear in WaniKani levels 1-20, because supposedly I’ve learned these ones already. (But in reality, I’d forgotten many of them after burning them.)

I’ve re-learned all level 1-20 kanji that appear frequently in the manga I read. (I’m not worried yet about the level 1-20 kanji that barely show up.)

Next up: focusing on manga-frequent kanji from levels 21-30.
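Picking the next batch might be sketched like this, assuming a mapping from kanji to WaniKani level is available. The level numbers below are placeholders chosen for the example, not the actual WaniKani assignments, and the counts are made up:

```python
def frequent_kanji_in_levels(counts, wk_level, known, lo, hi, min_count):
    """Unknown kanji from WaniKani levels lo..hi that appear at least min_count times."""
    picks = [
        (k, n)
        for k, n in counts.items()
        if lo <= wk_level.get(k, 0) <= hi and k not in known and n >= min_count
    ]
    return sorted(picks, key=lambda p: p[1], reverse=True)


# Placeholder data: counts and level numbers are illustrative only.
counts = {"水": 200, "街": 60, "舟": 25, "漕": 30, "灯": 5}
wk_level = {"水": 1, "街": 22, "舟": 24, "漕": 45, "灯": 27}
known = set()

print(frequent_kanji_in_levels(counts, wk_level, known, lo=21, hi=30, min_count=20))
```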

Here’s what my ARIA kanji frequency list currently looks like, based on the first five volumes of the re-release:

[Screenshot: ARIA kanji frequency list]

My method of learning these kanji is to add cards for the various vocabulary that they appear in within the manga I’ve OCR’d. Sometimes this includes less frequent vocabulary that the kanji appears in, but so far it’s been working out.

Recently, I’ve introduced another set of frequency lists: vocabulary frequency.

Here’s what my ARIA vocabulary frequency list currently looks like:

[Screenshot: ARIA vocabulary frequency list]

(I actually do know a few of these, but haven’t added them to my known words list yet.)

This one’s nice because sometimes kanji I know make up a common vocabulary word I don’t know. This lets me find and learn those. It also lets me start learning frequently used non-kanji words.
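The "known kanji, unknown word" check could be sketched as below, assuming the series has already been split into dictionary-form words (for example by a morphological analyser) and that known words and known kanji are kept as sets. Kana-only words pass the kanji check trivially, which is how frequent non-kanji words also surface. All data here is invented:

```python
import re
from collections import Counter

KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def unknown_words_from_known_kanji(word_counts, known_words, known_kanji):
    """Unknown words, most frequent first, whose kanji (if any) are all known."""
    hits = []
    for word, n in word_counts.most_common():
        if word in known_words:
            continue
        if all(k in known_kanji for k in KANJI_RE.findall(word)):
            hits.append((word, n))
    return hits


# Invented tokenized output for one series.
tokens = ["水先案内人", "でっかい", "水", "でっかい", "先輩", "水先案内人", "でっかい"]
word_counts = Counter(tokens)
known_words = {"水"}
known_kanji = {"水", "先", "案", "内", "人"}

print(unknown_words_from_known_kanji(word_counts, known_words, known_kanji))
```

Here 先輩 is left out because 輩 isn't in the known-kanji set.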

I think my favorite part of this so far has been how easy it is to decide what to learn next. I can either pick from a frequency list for a specific series, or view the top items to learn across multiple series:

[Screenshots: top items to learn across multiple series]
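One possible way to build a cross-series view, sketched with invented numbers: rank each unknown item first by how many series it appears in, then by its total count. That ranking rule is an assumption for illustration; summing counts alone would be an equally reasonable choice.

```python
from collections import Counter


def top_across_series(per_series, known, n):
    """Rank unknown items by (number of series they appear in, total count)."""
    totals = Counter()
    spread = Counter()
    for counts in per_series.values():
        for item, c in counts.items():
            if item not in known:
                totals[item] += c
                spread[item] += 1
    ranked = sorted(totals, key=lambda i: (spread[i], totals[i]), reverse=True)
    return ranked[:n]


# Invented per-series counts.
per_series = {
    "ARIA": {"漕": 40, "街": 30},
    "Yotsuba": {"街": 10, "牛": 25},
    "Flying Witch": {"街": 5, "箒": 50},
}

print(top_across_series(per_series, known=set(), n=3))
```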

how do you actually figure out all this mess and make sense of things… kind of amazing :slight_smile:

I tend to just add cards but it’s frustrating because I don’t really know how/which ones are more frequent or more important and it’s just a mess :upside_down_face:

Right?
Christopher Fritz is super amazing!!!

This is SUCH a smart approach… the techniques should form the core of a language-learning company with Chris as CTO!

It’s a combination of talent stacking and standing on the shoulders of giants.

Talent stacking (things I’ve learned):

  • Regular Expressions
  • Excel / Spreadsheet Formulas
  • Software Programming (the most complex one, but only needed to improve manga OCR results)

Giants (works of others I utilize):

  • DRM-removal software (so I can work with the manga images from the e-books I bought)
  • Aforementioned machine learning-backed text location recognition software
  • Aforementioned machine learning-backed OCR software
  • Morphological analyser (Juman++) for splitting OCR output into individual vocabulary words and particles
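For the splitting step, here's a small sketch that pulls dictionary forms out of JUMAN-style output. It assumes the layout I understand Juman++ to print by default: one morpheme per line with space-separated fields (surface form, reading, base form, part of speech, ...), alternative analyses prefixed with `@`, and `EOS` ending each sentence. The sample is hand-written and abbreviated, with the later fields replaced by `*`, so it should be checked against real output before relying on it:

```python
from collections import Counter


def base_forms(juman_output, skip_pos=("特殊",)):
    """Base forms (third field) of each morpheme in JUMAN-style output.

    Skips blank lines, EOS markers, alternative analyses (lines starting
    with "@ "), and any morpheme whose part of speech is in `skip_pos`.
    """
    words = []
    for line in juman_output.splitlines():
        if not line or line == "EOS" or line.startswith("@ "):
            continue
        fields = line.split(" ")
        if len(fields) < 4 or fields[3] in skip_pos:
            continue
        words.append(fields[2])
    return words


# Hand-written, abbreviated sample (trailing fields replaced with "*").
SAMPLE = "\n".join([
    "猫 ねこ 猫 名詞 * * * * *",
    "が が が 助詞 * * * * *",
    "捕まった つかまった 捕まる 動詞 * * * * *",
    "。 。 。 特殊 * * * * *",
    "EOS",
])

print(base_forms(SAMPLE))
print(Counter(base_forms(SAMPLE, skip_pos=("特殊", "助詞"))))
```

Counting base forms rather than surface forms means 捕まった and 捕まる land on the same vocabulary entry.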

You’ll start seeing people with more time and skill than I have wrapping extremely good OCR into their products in the next few years. I imagine Migaku’s browser extension will have OCR built-in by a year from now, and there are already browser extensions such as Copyfish.

As for mass OCRing manga, I imagine that if there’s enough demand for it, we’ll see sites like koohi.cafe adding manga word lists.

I’ve considered adding manga frequency lists to my zero-traffic site, but the number of people who’d get use out of them is probably very small (and those people would likely never know that they exist). Then again, if I had a page called “よつばと! Vocabulary Frequency List”, maybe it would eventually show up in a web search or two.

it’s not just the many tools…it’s making sense of all that information in a useful and organized way…

even if I had a frequency list…how to sort/decide what to really choose to learn etc… that’s the challenging part of what you are doing.

Not sure if you’re familiar with it but this site lists 150 phonetic components, and the book it refers to has example words that contain those kanji.

There’s also an anki deck.
https://ankiweb.net/shared/info/470563167

Thanks for the link!

I don’t recall if I’ve seen that page, although I’ve seen mention of the book.

Thus far I’ve been finding that trying to learn the phonetics aspect hasn’t worked out for me. I don’t know why. It just doesn’t.

That said, I’m always willing to try a different approach to see how well it does (or does not) work out for me.

Since you are obviously a very serious student of Japanese, I’m sure the author would be happy to hear your feedback. There are three main sections: one focuses on phonetic components, another on the link between kana and kanji, and the third on visual patterns.
If you get around to reading it please share your thoughts here.
I just realised you can read about the first 30 pages in the Amazon preview:
The ebook is pretty cheap: https://www.amazon.com/Kanji-Code-Phonetic-Components-Patterns/dp/0648488608/ref=sr_1_1?dchild=1&keywords=kanji+code&qid=1630807356&sr=8-1

Too bad it’s not available digitally via Kobo. I don’t buy ebooks from Amazon. I could buy a physical copy if I check out the preview and think it might be useful, but I’d need to budget for it later this year.

That’s good feedback I guess!

With everything I have in place to make it easy to find words to add to Anki, somehow I’ve managed to not add very many words to Anki lately…

[Image: recent Anki card additions]

Rather than focus my efforts on adding more cards for words to learn, I’ve spent my time putting together some spreadsheet formulas to measure my progress.

Of course, I’m at the point where these numbers will change very slowly, as I’ve learned most of the highest of the high-frequency words/kanji already. Thus the utility of these is minimal…

Vocabulary Progress:

[Screenshot: vocabulary progress by series]

These numbers exclude words that appear only once or twice, to reduce the impact of OCR errors. (I should drop that logic for Yoru Cafe and Final Fantasy VI, as it doesn’t apply there.)

I feel the numbers for Sailormoon properly reflect how time-consuming it was for me to get through reading the series…

Yotsuba’s “overall” number is probably low due to a lot of words that appear without kanji that I haven’t added to my “known words” list yet.

Final Fantasy VI probably has a low “overall” due to some names I haven’t removed from the spreadsheet yet. Likely there are also words that appear without kanji that I haven’t added to my known words list. But overall this game just uses some different vocabulary from the manga I read.
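A sketch of the two kinds of numbers, under my reading of them: "unique" as the share of distinct words that are known, and "overall" as the share of all word occurrences that are known, both counted only over words at or above a minimum count. The word counts below are invented:

```python
def known_percentages(counts, known, min_count):
    """(unique %, overall %) over items that appear at least `min_count` times."""
    kept = {item: n for item, n in counts.items() if n >= min_count}
    if not kept:
        return 0.0, 0.0
    unique = sum(1 for item in kept if item in known) / len(kept)
    overall = sum(n for item, n in kept.items() if item in known) / sum(kept.values())
    return round(unique * 100, 1), round(overall * 100, 1)


# Invented counts; the 2-count item stands in for a rare word or OCR misread.
counts = {"する": 60, "ちょっと": 30, "牛乳": 8, "誤字": 2}
known = {"する", "ちょっと"}

print(known_percentages(counts, known, min_count=3))
print(known_percentages(counts, known, min_count=1))
```

Raising the minimum drops rare items, which mostly lifts the "unique" figure.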

Kanji Progress:

[Screenshot: kanji progress by series]

These numbers also exclude kanji that appear below a threshold (the “min” column), for the same reason.

Missing from this list: Final Fantasy VI, which has the lowest percentages (61.1% and 80.0%).

IMHO, setting up all of these systems and analytics is real work that can benefit many. You are sacrificing study time for a greater good. ありがとうございます

TBH, the Yotsuba numbers made me laugh and give a vindicated “Aha!”, because I was dying trying to read it since I was better at recognizing kanji as words than strings of kana (particularly with alterations due to casual speech and childish mispronunciations).

Today I was curious: if someone wanted to pre-learn x% of the kanji found in よつばと! volumes 1 through 15 before starting the series, how many kanji would they need to learn?

It’s a silly question, of course. The series has furigana. Start reading it as soon as possible, regardless of acquired kanji, if you’re interested.

But I was still curious.

The following are the statistics I compiled. They show how many kanji you need to learn in order to recognize x% of all kanji occurrences in the series.

This is considering only kanji and not vocabulary.

I did not exclude kanji used in character names, although I think it’s worthwhile to do so when learning from a frequency list. You’ll likely recognize the names in context (and this series has furigana), so no need to try learning them when you’re not encountering vocabulary words that use those kanji.

よつばとWaniKani!

  • To reach about 60%, complete level 9. This covers 310 kanji, 290 from the series.
  • To reach about 70%, complete level 12. This covers 425 kanji, 393 from the series.
  • To reach about 80%, complete level 18. This covers 628 kanji, 555 from the series.
  • To reach about 90%, complete level 30. This covers 1,023 kanji, 816 from the series.

よつばとRTK6!

If you follow the 6th edition of Remembering the Kanji, learning the kanji in the book’s order without skipping any, it goes as follows:

  • To reach about 60%, complete 1,075 kanji, including 599 from the series.
  • To reach about 70%, complete 1,240 kanji, including 686 from the series.
  • To reach about 80%, complete 1,582 kanji, including 853 from the series.
  • To reach about 90%, complete 1,882 kanji, including 989 from the series.

If you followed RTK in order, but only learned the kanji that appear in よつばと!, you would need the latter number of kanji on each line (599 kanji for 60%, etc).
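The fixed-order figures (WaniKani levels, RTK order) can be computed with a walk like the one below. It's a sketch with invented data; for the WaniKani numbers above, the result would then be rounded up to the end of the level containing the stopping kanji.

```python
def learn_in_order(order, counts, target):
    """Walk a fixed study order until `target` share of kanji occurrences is covered.

    Returns (kanji learned in total, kanji learned that appear in the series),
    or None if the order never reaches the target.
    """
    total = sum(counts.values())
    covered = 0
    in_series = 0
    for learned, kanji in enumerate(order, start=1):
        if kanji in counts:
            in_series += 1
            covered += counts[kanji]
        if covered / total >= target:
            return learned, in_series
    return None


# Invented study order and series counts.
order = ["一", "日", "二", "人", "大"]
counts = {"日": 50, "人": 30, "大": 20}

print(learn_in_order(order, counts, 0.6))
print(learn_in_order(order, counts, 0.9))
```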

よつばとFrequency!

The advantage of WaniKani and RTK is that each kanji you learn builds on what you’ve previously learned. Some more common kanji contain elements matching less common kanji.

If you are learning kanji based on frequency alone, you often miss out on those less common kanji.

For example, 違 is very common in the series (tied with 姉 for 35th on the list), but you won’t see ⻌ on its own, and 韋 is unsurprisingly not to be found in the series.

That said, if you were targeting the kanji you learn based on what appears most frequently in よつばと!:

  • To reach about 60%, you need to learn 130 kanji.
  • To reach about 70%, you need to learn 196 kanji.
  • To reach about 80%, you need to learn 304 kanji.
  • To reach about 90%, you need to learn 492 kanji.
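The frequency-only numbers come from a cumulative sum over the sorted counts. A minimal sketch with invented counts (it assumes at least one kanji with a nonzero count):

```python
def kanji_needed_by_frequency(counts, target):
    """How many kanji, taken most frequent first, cover `target` of all occurrences."""
    total = sum(counts.values())
    covered, needed = 0, 0
    for n in sorted(counts.values(), reverse=True):
        if covered / total >= target:
            break
        covered += n
        needed += 1
    return needed


# Invented counts.
counts = {"日": 50, "人": 30, "大": 15, "小": 5}

for target in (0.6, 0.9):
    print(target, kanji_needed_by_frequency(counts, target))
```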

よつばとWaniKaniとFrequency!

Next I wondered, what if you started with WaniKani to help get a foundation of learning kanji, then switched over to learning kanji by frequency of use in the series?

If you complete WaniKani through level 10, you learn 348 kanji, of which 323 appear in よつばと!. If you started learning by frequency at this point, how many kanji would be left to learn?

  • To reach about 60%, you’re already there.
  • To reach about 70%, you need to learn another 13 kanji.
    • 俺 違 好 香 供 丈 夫 恵 那 帰 変 緒 待
      • 香, 恵, and 那 appear in names.
  • To reach about 80%, you need to learn another 66 kanji.
  • To reach about 90%, you need to learn another 206 kanji.

If you instead complete WaniKani through level 20, you learn 694 kanji, of which 600 appear in the series. How many kanji would be left to learn if you went by frequency?

  • To reach about 80%, you’re already there.
  • To reach about 90%, you need to learn another 40 kanji.
  • To reach about 95%, you need to learn another 139 kanji.
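The hybrid approach only changes the starting point: begin with the kanji already covered by WaniKani, then add unknown kanji in frequency order until the target is met. A sketch with invented data, returning the list of kanji still to learn:

```python
def frequency_top_up(counts, known, target):
    """Kanji to add, most frequent first, starting from an already-known set."""
    total = sum(counts.values())
    covered = sum(n for k, n in counts.items() if k in known)
    to_learn = []
    for kanji, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= target:
            break
        if kanji not in known:
            to_learn.append(kanji)
            covered += n
    return to_learn


# Invented counts; `known` stands in for kanji learned through WaniKani.
counts = {"日": 50, "人": 30, "大": 15, "小": 5}
known = {"日", "大"}

print(frequency_top_up(counts, known, 0.6))  # already there
print(frequency_top_up(counts, known, 0.9))
```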

With a manga (with furigana!) up next in the Intermediate Book Club, I’m planning on giving it a try.

I’d never heard of Spy x Family before, so I don’t really know anything about it. The sample pages seemed simple enough, but I figure that if it was bumped up from the Beginner Book Club, there may be some difficulty I haven’t seen yet.

I wonder if it uses any more complex grammar for me to learn, or if it’s more text density and reading speed that’ll have it as an intermediate read. Maybe it’s sort of between beginner and intermediate in difficulty.

I picked up the first volume and, of course, I had to see how it looks compared with my learned kanji and vocabulary.

First up is kanji:

[Screenshot: Spy x Family kanji statistics compared with other series]

I recently adjusted my percentages to factor in only kanji that show up an average of once per volume (or twice for larger volumes).

As a series runs longer, the variety of words (and thus kanji) used increases. For a series such as My Love Story at 13 volumes, those rarely used words (and kanji) need to show up at least 13 times to be counted in my percentages. But when I have only one volume, as is the case for Spy x Family, rarely used kanji that appear in volume one and then hardly ever again in the series drag the unique percentage down. If I like the volume and keep reading, the unique percentage will steadily increase as I raise that threshold.

Side-note: I don’t use a threshold of 1 for OCR’d manga because there may be some mis-read kanji, and this helps weed those out.
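The scaling threshold might be expressed like this. The one-per-volume rate, the doubled rate for larger volumes, and the floor of 2 (standing in for "no threshold of 1 for OCR'd manga") come from the text above; the function itself is only an illustration:

```python
def min_count_for(volumes, per_volume=1, floor=2):
    """Occurrences an item needs before it counts toward the percentages."""
    return max(floor, volumes * per_volume)


print(min_count_for(1))                # a single volume, e.g. Spy x Family
print(min_count_for(13))               # My Love Story
print(min_count_for(15))               # よつばと! volumes 1-15
print(min_count_for(5, per_volume=2))  # a hypothetical run of larger volumes
```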

Among manga I’ve read or am reading, Spy x Family has the lowest percentage of known kanji for me. Looks like I’ll be taking this opportunity to learn a few more kanji!

Second up is vocabulary:

[Screenshot: Spy x Family vocabulary statistics compared with other series]

Here, the overall percentage is similarly lower than other manga I’ve read or am reading.

This utilizes the same threshold system, where the longer a series runs the more times a word needs to appear to be counted. Words that appear a few times in volume one, but not in later volumes, are included less in the percentages over time. (That’s also why my Yotsuba percentages are much higher now, as the threshold in my prior post’s screenshot was 3 and now it’s 15 for that series.)

One of the main characters tends to use very complex words and sentences; I posted about it here in the read every day thread.
