ChristopherFritz's Study Log

It’s a few months out, but I’m thinking of implementing auto-populating English words for the ちいさな森のオオカミちゃん book club.

Here’s my current proof of concept:

The idea is that as you enter Japanese words, it automatically populates English words:

image

The expectation is that you then type the most appropriate word in place of the given English words:

image

(So many projects, so little time.)

6 Likes

lots of time…oh wait you are doing projects and not catching up on reading :wink:
cuz I’m one to talk

3 Likes

Thankfully, I’m caught up on book clubs not counting this upcoming week’s reading! (But for that it’s just Shadows House waiting to be read.)

3 Likes

I h8 you :stuck_out_tongue: hahaha
I’m behind in Shadows House…but I am caught up with everything else :smiley:

I should have been reading today, but I got distracted and decided to work on sorting all the photos from the Japan trip last year (I have over 8000 of them), and it’s a lot of work… though at least they are now sorted by date for each camera… so now I can start working on them in some logical order… I should probably work on reading SH a bit though… cuz you’re gonna be in vol 5 and I’m not gonna be done with vol 4!

4 Likes

Once you get past the multi-chapter storyline, the pace picks up a bit. Or at least, it feels to me like chapters after that go a bit faster.

3 Likes

I’m just envious of how fast you can catch up with a book at the difficulty level of SH :slight_smile:

It’s not like you’ve done any reading before or anything :smiley:

4 Likes

It helps when I have a system in place where I can “cheat” by quickly and effortlessly looking up every unknown word I encounter.

image

3 Likes

Why doesn’t it auto populate the hiragana? :eyes:

2 Likes

There is a limit to how lazy I’m willing to help readers become.

5 Likes
7 Likes

That’s cool. Is that your own code or a feature?

3 Likes

you can do this but WK still can’t come up with a proper built in leech manager …

3 Likes

I found a dictionary file extracted from Wiktionary, I cleaned it up for my own use, and I put it into a spreadsheet:

Then I added another sheet:

image

In cell B2, I have the formula:

=IFNA(VLOOKUP(A2,Wiktionary!A:D,2,false), "")

That populates the kana word based on the first match for the kanji word that’s found in the Wiktionary sheet.

Next, in C2 I have:

=IFNA(VLOOKUP(A2,Wiktionary!A:D,4,false), IFNA(VLOOKUP(B2,Wiktionary!A:D,4,false), ""))

This populates the English word based on the first kanji match found. If there’s no match, then it looks for the first kana match found.
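For anyone who wants to see the same lookup logic outside a spreadsheet, here is a minimal Python sketch of the two formulas. The sample rows, the function names, and the empty third column are all assumptions made for illustration; only the "first match on column A, kanji first, then kana" behavior comes from the formulas above.

```python
# Hypothetical sample rows standing in for the Wiktionary sheet (columns A..D).
# Column A is the lookup key, B the kana, D the English. Column C is not
# described in the post, so it is left empty here.
WIKTIONARY_ROWS = [
    ("森", "もり", "", "1. forest"),
    ("小さな", "ちいさな", "", "1. small 2. little"),
    ("おおかみ", "おおかみ", "", "1. wolf"),
]


def vlookup(key, rows, column):
    """Return the 1-based `column` of the first row whose first cell equals `key`.

    Returns None when there is no exact match, which plays the role of #N/A.
    """
    for row in rows:
        if row[0] == key:
            return row[column - 1]
    return None


def kana_for(word, rows=WIKTIONARY_ROWS):
    """Mirror of the B2 formula: kana of the first match, or an empty string."""
    found = vlookup(word, rows, 2)
    return found if found is not None else ""


def english_for(word, kana, rows=WIKTIONARY_ROWS):
    """Mirror of the C2 formula: try the kanji word first, then fall back to the kana."""
    found = vlookup(word, rows, 4)
    if found is None:
        found = vlookup(kana, rows, 4)
    return found if found is not None else ""
```

With these sample rows, a word that is missing from column A can still pick up an English entry through its kana, as long as the kana itself appears in column A.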

However, a match may contain multiple definitions, and only one applies to the word in context, so we want to make it clear to the person filling in the word that they need to actually type in the English word. To that end, I make the English word gray if the cell contains a formula; once someone types over it, the text turns black. This is done via conditional formatting:

image

Another option would be to make it gray if it’s a formula and contains "2. ", so if there’s only one result then you don’t need to type the English in. Hm…
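The two graying rules can be written out as plain decision logic. This is only a sketch in Python, not the actual conditional-formatting rule (that was in the screenshot); the function name and the `only_if_multiple` flag are made up for illustration.

```python
def should_gray(cell_is_formula, cell_text, only_if_multiple=False):
    """Decide whether the English cell should be shown in gray.

    cell_is_formula: True while the cell still holds the lookup formula.
    only_if_multiple: when True, gray only if the text contains "2. ",
    i.e. the lookup returned more than one numbered definition.
    """
    if not cell_is_formula:
        # Someone typed a value over the formula, so show it in black.
        return False
    if only_if_multiple:
        return "2. " in cell_text
    return True
```

Under the second rule, a single-definition result stays black even though it is still a formula, so nobody is nudged to retype it.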

6 Likes

Why doesn’t it get all the kanji and vocab from the pages automatically? :eyes:

3 Likes

Don’t think I haven’t been experimenting with that!

Edit:

(Missing page numbers, though.)

(But I’m working on that, too.)

7 Likes

Migaku’s SRS (alpha release) launched this weekend while I wasn’t looking (even though I knew late last week that it was about to be released).

That means I’m about ready to return to the SRS grind.

Although Migaku’s kanji extension for Anki isn’t built into their SRS application yet, I’ll give it a go just doing vocabulary cards and see how it works out.

My plan of action is as follows:

Manga Selection

The first step is picking something I plan to read. Since I’m nearing the end of 夢みる太陽 volume 4, I think volume 5 would be a good candidate to start off with.

Vocabulary Selection

My extracted vocabulary list shows 創立 as my top unknown word from the volume. This is one I should expose myself to via SRS before I start reading the volume, so that I will (hopefully!) recognize it when I encounter it.
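One way such a list could be produced is sketched below in Python. It assumes "top" means most frequent, that the volume's text has already been split into words by some other tool, and that known words are kept in a simple set; none of those details are stated in the post, and the function name is hypothetical.

```python
from collections import Counter


def top_unknown_words(volume_words, known_words, limit=5):
    """Return (word, count) pairs for the most frequent words not yet known.

    volume_words: list of already-tokenized words from the volume (assumed input).
    known_words: set of words the reader already knows (assumed input).
    """
    counts = Counter(word for word in volume_words if word not in known_words)
    return counts.most_common(limit)


# Made-up sample data, for illustration only.
sample_words = ["創立", "学校", "創立", "太陽", "創立", "学校"]
sample_known = {"太陽"}
top_words = top_unknown_words(sample_words, sample_known)
```

The first entry of the result is then the natural candidate for the first SRS card.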

Old Material Search

I want to have a sentence and a screenshot to include on the card, but I don’t want to expose myself to a potential spoiler with the upcoming volume. This is where I have a tool that lets me look up a word in manga I’ve read.

Thankfully, I have a couple of results to work with:

image
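A lookup like this can be sketched in a few lines of Python. The page records, their keys, and the `read_volumes` set are hypothetical stand-ins for whatever the real tool stores; the only idea taken from the post is that the search is limited to volumes already read, so upcoming volumes can't leak spoilers.

```python
def find_in_read_manga(word, pages, read_volumes):
    """Return (series, volume, page, line) for each OCR line containing `word`.

    pages: iterable of dicts with keys "series", "volume", "page", "lines"
           (a made-up layout for this sketch).
    read_volumes: set of (series, volume) pairs that have already been read.
    """
    return [
        (page["series"], page["volume"], page["page"], line)
        for page in pages
        if (page["series"], page["volume"]) in read_volumes
        for line in page["lines"]
        if word in line
    ]


# Made-up sample data, for illustration only.
sample_pages = [
    {"series": "A", "volume": 1, "page": 12, "lines": ["創立記念日だよ", "そうなの"]},
    {"series": "A", "volume": 2, "page": 3, "lines": ["学校の創立者"]},
    {"series": "B", "volume": 1, "page": 40, "lines": ["創立百周年"]},
]
sample_read = {("A", 1), ("B", 1)}
hits = find_in_read_manga("創立", sample_pages, sample_read)
```

Each hit carries the page reference, which is what you need to go grab the sentence and a screenshot for the card.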

Card Front

The main issue for me here is that there isn’t an option to have both the vocabulary word and the sentence on the front of the card. It’s one or the other. (That’s an option I’ll have to put in a request for sometime. It’s something I set up as a customization of the Migaku card type in Anki.)

Card Back

7 Likes

Background of this post, from ABBC:

image


So, once upon a time, I wrote code to download images from BookWalker.

Consider this item from BookWalker:

image


Why did I write this?

I wanted to partake in all those fancy BookWalker sales, but I also didn’t want to leave Kobo, as I can remove DRM from my Kobo purchases.

Removing DRM is essential for me as:

  1. It lets me view my purchase on any device of my choosing (Kobo doesn’t have a Linux application).
  2. It lets me use Mokuro for OCR.
  3. I can use Mokuro’s output for all kinds of things that have helped with my learning.

Why did I never use this?

Manga images are stored in JPG format, which is lossy: the file size is reduced by discarding some of the image data, and a decoder approximates what was there when the image is viewed.

However, BookWalker doesn’t show the JPG image to the user.

Instead, they load it into an HTML canvas, where the original JPG data is inaccessible.

This means that when you save the image (which BookWalker tries to block), you have a choice:

  • Save the image as JPG for a smaller file size, but lose even more of the original information.
  • Save the image as PNG to retain the full image information, but at a larger file size.

At that point, I’d rather continue buying from Kobo and receive the “original” JPG images for my purchases.
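The trade-off between the two save options can be illustrated with a toy model in Python. This is not how JPG actually works internally (real JPG quantizes frequency coefficients of image blocks); it only models "lossy" as rounding numbers to a step size, and the values and step sizes are invented for the demonstration.

```python
import math


def quantize(values, step):
    """Toy lossy encoding: round each value to the nearest multiple of `step`."""
    return [int(math.floor(value / step + 0.5)) * step for value in values]


def total_error(original, decoded):
    """Sum of absolute differences between the original and decoded values."""
    return sum(abs(o - d) for o, d in zip(original, decoded))


original = [52, 37, 18, 9, 4]               # data before any lossy encoding
shown_on_canvas = quantize(original, 10)    # what the first lossy save left behind
saved_lossless = list(shown_on_canvas)      # PNG-style: keep exactly what is shown
saved_lossy_again = quantize(shown_on_canvas, 16)  # second lossy save, different settings

error_after_first_save = total_error(original, shown_on_canvas)
error_lossless_copy = total_error(original, saved_lossless)
error_second_lossy = total_error(original, saved_lossy_again)
```

In this toy, the lossless copy is exactly as far from the original as what the canvas shows, while the second lossy pass drifts further away. The lossless copy just has to store every displayed value as-is, which is where the larger file comes from.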

There’s also the issue that images are saved at the canvas size, which means there is whitespace to be cropped (although this can be automated post-download):

image

(My download method requires viewing pages with vertical scrolling rather than horizontal, but the canvas is still oriented for a wider dimension.)
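The automated cropping could look roughly like the Python sketch below. It works on a plain grid of grayscale values so that it stays self-contained, and it assumes the padding is a single uniform value (255 for white); a real script would more likely use an image library, and the canvas padding may not be pure white.

```python
def content_bounds(pixels, background=255):
    """Return (left, top, right, bottom) of the non-background area, or None.

    pixels: list of rows, each row a list of grayscale values.
    right and bottom are exclusive, like Python slice ends.
    """
    if not pixels or not pixels[0]:
        return None
    rows = [y for y, row in enumerate(pixels) if any(v != background for v in row)]
    if not rows:
        return None  # the whole canvas is background
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] != background for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop_whitespace(pixels, background=255):
    """Return a copy of the grid with the uniform border removed."""
    bounds = content_bounds(pixels, background)
    if bounds is None:
        return []
    left, top, right, bottom = bounds
    return [row[left:right] for row in pixels[top:bottom]]


# Made-up 4x4 canvas with a 2x2 page in the middle, for illustration only.
canvas = [
    [255, 255, 255, 255],
    [255, 0, 10, 255],
    [255, 20, 0, 255],
    [255, 255, 255, 255],
]
page = crop_whitespace(canvas)
```

A page with a white margin of its own would lose that margin too, so a real version might want to crop only down to the known page aspect ratio.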


Anonymous poll time!

Do I release my code for others to use for study purposes, such as if they want to create a PDF they can mark on?

  • Do not release it. It’s too dangerous.
  • Release it. Make it available for those who will use it for good. There are other, easier ways to remove DRM if someone wanted to do something bad with it.
  • Doesn’t matter. The Crabigator probably won’t like it here.
5 Likes

The vast majority of a small number of people said go for it, so here it is:

7 Likes

I’m completely envious of this system:

I wonder how good their OCR is.

7 Likes

For no particular reason, huh?

I’d think that you don’t need perfect OCR as long as the AI that translates is trained well enough. And their ordering is all messed up, not very cool

Found their introduction (ご紹介) video: マンガの高速な多言語展開を可能にする『Mantra Engine』 - YouTube
It’s pretty interesting

3 Likes