There are plenty of new words that I’ve added to my Anki from かがみの孤城 already - but what really gets me is when I see a word I don’t know, use the Kindle dictionary, and can just read the definition in Japanese without any further lookups needed. That feels good. The only real problem is that, at least on PC, the Kindle program doesn’t seem to support deconjugation - so I can’t check verbs very easily unless they’re in dictionary form. It’s annoying, but I can live with it since I’m able to copy the text into Yomichan to look up the words.
It’s kinda funny, after reading your post yesterday, I saw 途切れる show up haha
I’m hoping to get there eventually!
I don’t have any easily accessible electronic dictionary when reading (aside from dropping a word into a search engine with 意味 next to it in the search box), which limits my interaction with Japanese definitions. But when I do look things up, the definition tends to lead me to unknown words. And sometimes when I look up the meaning of a word from a definition, it gets defined using the first word I was looking up…
Come to think of it, I did install Kobo’s Japanese dictionary on my e-reader. But I only tried that on a book that has furigana, and the parser for the dictionary doesn’t know how to cope with that. If I were, for example, looking up 途切れる by pressing down on the word for a second, it would bring up a “no results for 途切” screen.
Oh yeah, circular definitions can be a bit of a pain - in those cases I just take the English definition of the one I’m interested in and leave it at that. I’m still mixing English and Japanese definitions fairly often, but one thing I try to do sometimes is to read the Japanese definition even when it has unknown words, and look up the words in the definition. Usually I don’t go more than one or two layers deep, depending on how easy those extra definition layers are.
What I tend to do is keep Yomichan’s search window open while I’m reading, with what I’m reading on half of the screen and the dictionary on the other. That lets me switch between the two windows easily and keeps what I was reading on-screen. I have a couple of JP-JP dictionaries imported into Yomichan, and I’ve got those at a higher priority than the JP-EN stuff to encourage me to try the JP definitions first.
かがみの孤城 Day 3
Reading
Goal: 70.5% (cumulative)
Actual: 76.2%
Even though it’s only on its second alpha release, I’ve been mostly enjoying using the Migaku Reader for this:
I would have read more today (I’ve surpassed the portion I had pre-read!) except I still have manga reading to get to for the day.
SRS
An add-on issue in Anki prevented me from adding cards this morning, so I only added a couple this afternoon.
New kanji cards: 1
踏
New vocabulary cards: 2
踏み込む、逃げ惑う
I feel like my card-adding is going to start ramping up after this. After all, I can’t have an issue with adding cards a third day in a row, right?
かがみの孤城 Day 4
Reading
Goal: 78.0% (cumulative)
Actual: 85.7%
Scheduling my reading in the morning is working well, especially since the recent time change gives me an extra hour then. So far I’ve been able to get in my WaniKani and Anki reviews, complete my reading quota, and create Anki cards, all within an hour. That means I have enough buffer to increase my daily reading if for any reason I’m not able to keep up a “31.5%” pace each day of the weekend.
Reading is starting to feel a little easier now. I think it’s mostly that before I was re-reading material I’d already pre-read, and now I’m in material I hadn’t read yet. Daily reading has transitioned to “I want to keep reading,” where I’m limited by time available, rather than pushing myself to keep reading.
Part of what is catching my interest right now is filling out the details that I couldn’t get listening to this portion in the audio book. When you’re listening to audio where you understand anywhere from as much as 50% to as little as 5%, there are a lot of holes waiting to be filled in.
It’s also been interesting seeing what the manga has left out, and what it’s added.
SRS
New kanji cards: 4
仰、知、容、易
New vocabulary cards: 7
言語、仰天、口にする、表紙、ああなる、容易、閉じこもる
I haven’t been paying attention to frequency when adding words. Instead, I’ve focused on a mix of:
- Do I know all the kanji in this unknown word?
- Is it a word without kanji?
- Does the unknown kanji look simple enough to learn?
I think Migaku has the ability for me to create my own frequency list to use, but I haven’t looked into it much. It’d be nice to have frequency information for this specific book directly in Migaku Reader.
かがみの孤城 Day 5
Reading
My “amount to read each day” was off again, because I didn’t account for Migaku Reader having an “end of book” page included in the page count. (I imported week one by itself, so at the end of the week’s content came an “end of book” page in the reader.)
This means that I have actually completed week one’s reading ahead of schedule.
On one hand, I’m glad and amazed I was able to complete week one’s material within the week, let alone a little early.
On the other hand, I know what happens next from reading the first chapter of the manga and listening to the start of the audio book, and the anticipation will make it difficult to wait for week two to start.
I may get a start on week two early just so I can reduce the minimum amount I have to read over the weekend.
SRS
Thus far my daily review count in Anki seems to be going okay. (On the other hand, in WaniKani I haven’t done lessons in forever, and my Apprentice+Guru has been steadily increasing just a little over time.)
WaniKani review time: about 10 to 15 minutes. Anki review time: about 10 minutes.
New kanji cards: 0
N/A
New vocabulary cards: 2
思い切る、口止め
New card counts are a bit low because I didn’t realize how close I was to the end of week one’s reading.
Reading
Today was something of a milestone day for me.
I did a lot of reading (throughout much of Saturday), with rapidly decreasing amounts of furigana available to me.
Currently feeling like:
Morning Reading: Furigana for Unknown Kanji Only
First up was 「かがみの孤城」, which I’ve totally not been keeping up a daily log for (and haven’t been adding as many words to Anki as I should be).
I did have Migaku Reader adding furigana to unknown words, and I could rapidly look up words I’d forgotten the kanji for. But there were still many known kanji, which therefore didn’t have furigana. Using Migaku Reader has really been helping me get used to reading kanji without going directly to the furigana (since it’s not there).
(I do know 太, but was too focused on reading to mark it as known.)
Mid-day Reading: Furigana Upon Lookup Only
Since that reading finally surpassed the end of chapter one of the manga, second up was reading chapter two of the manga adaptation:
No furigana here, but I did have Copyfish around to OCR the text, which I then dropped into ichi.moe to look up the meaning of the unknown words. I was surprised that I wasn’t having to look up nearly as many as I expected I would be. It did help that some of the unknown kanji were ones I’d just encountered in the book earlier.
I figured I’d read half today, half Sunday, but I ended up reading all 46 pages in one go…
Evening Reading: No Furigana
Finally, I started playing 「ポケットモンスター ブリリアントダイヤモンド」 in Japanese.
I wish Pokémon games had a furigana option, rather than a choice between kanji or kana-only. However, somehow or other, it seems I was able to mostly get by.
It did take seeing 旅 a few times before I remembered it, and I was stuck on 楽しい after seeing it a few times, until I remembered which word that kanji is for. I took screenshots of some unknown kanji to add to Anki, as well.
Overall, I’m quite satisfied with the reading I got in with two hours of gameplay. I remember trying out the “Let’s Go Eevee” title in Japanese in 2020, and it was a much more difficult experience, with a seemingly endless flow of unknown kanji.
Flashcards
I’ve gotten my WaniKani combined Apprentice+Guru down to 299 cards. That’s down from 330 a month ago, with no new lessons during that time. (Maybe there were a few lessons a while back that I forgot were in this time frame.)
Every day it’s the same cards over and over and over again, as the number of leeches is…well, unknown. WaniKani userscripts stopped working for me a while back, and I still haven’t gotten them working again. Thus, no leech numbers. But I feel I must be at over 90% leeches for Apprentice+Guru right now.
Mid-December Update
WaniKani
I’ve been looking forward to my daily WaniKani reviews going down, as I haven’t been doing any lessons aside from the lower-level items recently added. And yet, in the past couple of weeks, my number of daily reviews has doubled, and my Apprentice + Guru leeches are up to 266. They’re going in the wrong direction. But on the bright side, if they stick around long enough, they’ll be eligible for hibernation if WaniKani ever implements such a feature.
Anki
I haven’t been adding Anki cards as I’d planned. I’ve just been spending too much time on WaniKani reviews to have time to add many Anki cards. If I ever lose my WaniKani review streak (1,082 days in a row doing reviews!), I’ll probably put WaniKani on vacation mode and focus on learning kanji only from material I’m reading for a bit, and see how that goes.
The good news is that 「かがみの孤城」 is getting easier to read. I’m able to get through about three or four pages in the amount of time that I used to spend on one page. This has freed up time for me to start adding to Anki this week.
I’ve been using Migaku’s kanji add-on for Anki to take a page of text from the book, paste it into the add-on, and let it generate kanji cards for any unknown kanji in that text. Then, as I do the lessons for these cards, I treat each kanji card in one of three ways:
- If I know the kanji already, I mark it as known, which removes the card from reviews.
- If I don’t know the kanji:
- If it’s a simple kanji, I also add a vocabulary card or two.
- If it’s a complex kanji, I delete it (for now) unless it appears on an existing vocabulary card.
My daily reviews are getting low enough that I should add more, but most of the recent words I’ve added I’ve been failing at.
I think the main reason for failing is that I’m adding new kanji and vocabulary from 「かがみの孤城」, and a lot of those are narrative words. I’m sure I’ll see them more over time in reading the book, but they won’t be showing up in manga or video games any time soon. Adding them to Anki and reviewing them in advance of seeing them when reading should help recognition, but so far that isn’t working for me.
I may have to shift back to using manga as a primary source for new kanji/words. That’ll give me a better chance of seeing them again, given how many manga I read and how much of them I read.
Reading
Reading 「かがみの孤城」 is going well. I fell a little behind when there was one of the two long weeks on the reading schedule, but I’m back on track and hope to continue to keep up with the schedule. I originally thought I might start finishing ahead of time but decided to instead start making kanji and vocabulary cards as I go, for one page per day. (I’m reading about three pages per day, based on how Migaku’s reader splits the week’s reading into pages.)
Since I track my manga volume reading completely based on the year I read the manga, I like to not be midway through any when December ends (aside from book clubs). Thus, I’m looking to finish up the following in the next three and a half weeks:
Should be doable if I stop slacking off weekends.
Video Games
I’m getting good at starting games. Still gotta work on finishing them.
Eventually I’ll get to those final parts in Skyward Sword in Japanese. And I haven’t had much time to continue Pokémon in Japanese. Well, I might have except I’ve also been playing a bit of Chrono Trigger in Japanese since I can run it on my new computer.
And somehow it seems I’m able to run Little Busters as well, although I don’t plan to play it any time soon due to all the kanji, as there’s no text hooking application I could use on Linux.
I might have to write something that screenshots part of the screen, runs it through Tesseract, and then copies the output to the clipboard for Migaku’s clipboard monitor to parse and let me know which words I don’t know and give me definitions. (Actually, that’d be useful for Chrono Trigger as well.)
I don’t know if this is helpful to you (to me it is):
I’ve recently discovered a dictionary app that works well with manga. I read manga in Bookwalker on my iPad, so I can take screenshots, but this will work with photos from paper books, too.
The app is called Nihongo, this is what the icon looks like:
You can import the picture of your manga page and it will recognize the words:
Then you click on a word and get a translation:
And you can get the full entry for that word, including example sentences and info on how common it is (top left: unüblich = uncommon). You can create an SRS deck in the app if you want to, and add the picture of your manga to your card (see top right corner) as a personalized example:
Sorry for hijacking your study log. Let me know if you’d prefer I delete it. I just thought this might be helpful to you or readers of your log.
Christopher, I know how you feel about the Anki cards from reading. ALL of this year, I haven’t looked at ANY of the Anki that I’ve generated from my reading! Waah! Your focused approach makes sense. 40,000 words in a vocabulary (to be literarily fluent) is just a danged lot of words!!!
I would LOVE to play Skyward Sword in Japanese!! I have barely translated any of BOTW. My days are blowing by too quickly! LOL
Maharetina, please don’t delete that note!! I have been a little bit frustrated reading manga in Bookwalker, because the kanji resolution was too low for me to even hand-write it into Google Translate… Thank you.
Although this won’t work for me for various reasons, it looks like a great application for those who can use it.
I do occasionally use the Copyfish browser extension, which only does OCR for image to text (then I have to copy and paste into ichi.moe or another resource).
Seeing how far various software has come along with OCR of Japanese, I’m starting to think Migaku’s plan to add OCR into their reader extension might actually be feasible. (Not that I have any idea how far along they are with that, or if it’s just a plan right now.)
Back when I tried playing Breath of the Wild in Japanese, I could barely read anything due to all the unknown words for me. Maybe I could do a little better if I tried again these days?
After three weeks without a post, I’ve ended up with two in one day…
This one’s technical, and likely has zero interested people. (Or, if anyone’s interested, zero people in a position to actually benefit from this.)
Fake Text-Hooking on Linux
I was thinking today, “I want to set up a process where I can take screenshots as I play Chrono Trigger, and use Tesseract to convert dialogue to text files, then have Migaku parse them for me.”
Taking Screenshots
But as it turns out, once you get past the SquareEnix logo screen and opening video on Chrono Trigger, Steam can’t take screenshots anymore.
Well, that ends that plan.
Okay, not quite.
I can still use Spectacle, a screenshot application. Press Print Screen to open it, set it to screenshot the Active Window, no window title bar/borders, and in the configuration window set it to auto-save screenshots. The result is the same as if I could save screenshots directly from Steam, except I have one extra (very lightweight) program open. From there, I can simply press Print Screen for screenshots.
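Spectacle also has a command-line mode, so the capture could presumably be scripted or bound to a hotkey instead of going through the GUI. Something like the line below should grab the active window silently into the watched folder, though I’m going from memory on the exact flags (check spectacle --help before trusting it), and the output filename is just a placeholder:
spectacle --background --activewindow --nonotify --output /home/chris/Images/screenshot.png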
Clarifying Text
The first issue that comes up is that the text has a pattern behind it.
Converting an image of letters to text works best with a solid background.
This game has a setting for changing the background of the menus/dialogue boxes, and you can also change the individual colors used in the background. The latter means it may be possible to have a single solid background color for dialogue boxes.
Wait, it seems the menu background options aren’t in this port of the game…
My best solution is to feed the screenshot through ImageMagick’s command-line “convert” tool. I needed to do this anyway to invert the image’s colors (so the text is black rather than white), as well as for cropping, so adding a threshold option into this works out.
convert input.png -channel RGB -negate -threshold 35% output.png
Since the textbox can be placed at the top of the screen or the bottom, I settled for having convert run twice, once cropped to the upper textbox and once to the lower, resulting in two images. A better solution would be to check for pixel colors in the two potential textbox locations and crop only the one with dialogue, but what I have is fine for now.
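If I do get around to that smarter version, a rough sketch of the idea would be to measure the mean brightness of each candidate textbox region with ImageMagick’s fx expressions and only OCR the one that actually contains bright dialogue text. The cutoff value below (50 out of 255) is a guess that would need tuning against real screenshots:
# Only process a textbox region if its mean brightness suggests white dialogue
# text is present. The cutoff (50) is a placeholder value, not a measured one.
for offset in 8 468; do
  mean=$(convert input.png -crop 760x124+20+$offset +repage -format "%[fx:int(255*mean)]" info:)
  if [ "$mean" -gt 50 ]; then
    convert input.png -crop 760x124+20+$offset +repage -channel RGB -negate -threshold 35% "box_$offset.png"
  fi
done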
Image to Text
There are two Japanese trained datasets for Tesseract: “jpn” and “Japanese”. I find that “jpn” sometimes double-reads a character, adding rendaku, such as seeing か and parsing it as かが. But “Japanese” puts spaces between most characters. I decided I’d rather have the more accurate output and just strip the spaces from it in a later step.
tesseract output.png output -l Japanese
Both “jpn” and “Japanese” misparsed the に, and I haven’t looked into how to train my own dataset (I have to retain my lazy image somehow). Other than that, the result is pretty good!
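For spot-checking a single screenshot, Tesseract can also write straight to stdout, which makes it easy to preview the space-stripping without intermediate files (the same idea the script further down does with files, just condensed into one line):
tesseract output.png stdout -l Japanese | tr -d "[:space:]"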
Parsing Text
Next is Migaku.
By stripping spaces from the text and then copying it to the clipboard, it shows up on Migaku Clipboard. As it auto-parses, I can see at a glance furigana on unknown kanji, as well as my learn status for each word:
Looking up meanings of words is also quick and easy here:
Adding to Anki
From there, I can add to Anki with a few clicks. This includes having it auto-create kanji cards, although I don’t have that set up yet.
Streamlining
I put the following into a text file:
# Watch the screenshot directory; run the loop body once per saved screenshot.
inotifywait -m /home/chris/Images/ -e close_write |
while read dir action file; do
  mv "$dir$file" .
  # Crop each textbox region, invert the colors, and threshold for OCR.
  convert "$file" -crop 760x124+20+8 -channel RGB -negate -threshold 35% "$file.top.png"
  tesseract "$file.top.png" "$file.top" -l Japanese
  convert "$file" -crop 760x124+20+468 -channel RGB -negate -threshold 35% "$file.bottom.png"
  tesseract "$file.bottom.png" "$file.bottom" -l Japanese
  # Combine both OCR results, strip whitespace, and copy to the clipboard.
  cat "$file.top.txt" "$file.bottom.txt" | tr -d "[:space:]" | xclip -sel c
done
When run as a Bash shell script, it:
- Watches for when a new screenshot is saved (thanks to inotifywait).
- Extracts the dialogue areas of the screenshot and modifies them for OCR.
- Runs Tesseract on the dialogue portions.
- Strips spaces from the OCR output, and copies to clipboard for Migaku Clipboard to see.
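For anyone wanting to replicate this: the script assumes inotify-tools (for inotifywait), ImageMagick, Tesseract with the “Japanese” traineddata, and xclip are installed (package names vary by distro), and it dumps its cropped images and OCR text into whatever directory it’s run from, so it’s worth running from a scratch directory. The script filename below is arbitrary:
mkdir -p ~/chrono-ocr && cd ~/chrono-ocr
bash ~/scripts/chrono-ocr.sh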
Next up: Looking into font mods for Chrono Trigger, so I get something like what shows up in おんぷ先生’s Let’s Play of the game. That should give even better OCR results.
Edit: Looks like it’s the same font in the stream as I have in the game, just looking smoothed a bit in the video. No alternatives for Japanese font.
Ok, you’re right that I can’t benefit, but I’m very regularly amazed with the tools and solutions you come up with, so that makes it still worth the read. Even if it just makes me feel like a caveman when I’m doing stuff like manually banging radicals into an online search.
Hello from fellow intrigued cave person
Maybe you should wait until next year. I played through the first section of a new game, and I had to look up a bunch of stuff. But I looked around on the internet and didn’t see any decks. So I decided to make a sentence/vocabulary spreadsheet. I have taken loads of photos, but haven’t captured that into machine-readable sentences, yet.
Anyway… Once I get that started (I’ll target February), then you can look through it and decide how you feel about it. OK? Because I’m gonna do it, anyway. I play on a Switch through my TV and then take pictures on my phone (so I can zoom and read). It’s a kludgy way to do it that uses a lot of time and electrons (aka bytes of storage)… But all of the temples say essentially the same thing, all of the mushrooms, etc. I have at least a hundred photos… But my Google Drive is full right now, so…
P.S. (slightly insane laughter) I, too, have been approaching things 100% caveman style. I guess we are the clan 氏 of the Wani鰐 Kani蟹 (BWA ha ha笑)
Not screenshots taken through the Switch?
Since I don’t run Windows, it used to be a nightmare to get screenshots from the Switch to my computer, but since their USB update a while back it’s as easy as plugging the Switch into the computer and I can access the screenshots.
Looking at the game’s dialogue, I see OCR would be out of the question, but I imagine extracting the game’s text could be doable with the right tools at hand.
Hmmm, yeah, I forgot about those Switch screenshots… That might fill up the Switch memory card fast. I’m thinking that somehow people are making Twitch and YouTube videos of these games (perhaps through WiiU ??) so that might be more amenable to digital capture… I just haven’t got the bandwidth to figure it out… (Even though it might save time in the long run). sigh
Wii U or Switch with HDMI cable split into a video capture card.
My favorite YouTube Let’s Play videos by Japanese players are the ones where:
- The player reads all in-game dialogue (not counting already-voiced dialogue).
- Auto-generated subtitles by YouTube are enabled and fairly good.
- Migaku’s MPV plugin (or browser extension) can be used.
This allows for creating an Anki flashcard with one or two clicks that includes a screenshot from the video on YouTube, as well as the audio of the player reading the line, a dictionary definition of the word you want to learn, and optionally an image that represents the word.