ChristopherFritz's Study Log

ChristopherFritz · January 14, 2023, 2:41am

It’s a few months out, but I’m thinking of implementing auto-populating English words for the ちいさな森のオオカミちゃん book club.

Here’s my current proof of concept:

The idea is that as you enter Japanese words, it automatically populates English words:

The expectation is that you then type the most appropriate word in place of the given English words:

(So many projects, so little time.)

shuly · January 14, 2023, 2:44am

lots of time…oh wait you are doing projects and not catching up on reading
cuz I’m one to talk

ChristopherFritz · January 14, 2023, 2:51am

Thankfully, I’m caught up on book clubs not counting this upcoming week’s reading! (But for that it’s just Shadows House waiting to be read.)

shuly · January 14, 2023, 2:57am

I h8 you hahaha
I’m behind in Shadows House…but I am caught up with everything else

I should have been reading today, but I got distracted and decided to work on sorting all the photos from the Japan trip last year (have over 8000 of them) and it’s a lot of work… though at least they are now sorted by date for each camera… so now I can start working on them in some logical order… probably should work on reading SH a bit though… cuz you’re gonna be in vol 5 and I’m not gonna be done with vol 4!

ChristopherFritz · January 14, 2023, 3:05am

Once you get past the multi-chapter storyline, the pace picks up a bit. Or at least, it feels to me like chapters after that go a bit faster.

shuly · January 14, 2023, 3:08am

I’m just envious of how fast you can catch up with a book at the difficulty level of SH

It’s not like you’ve done any reading before or anything

ChristopherFritz · January 14, 2023, 3:12am

It helps when I have a system in place where I can “cheat” by quickly and effortlessly looking up every unknown word I encounter.

Kazzeon · January 14, 2023, 4:10am

Why doesn’t it auto populate the hiragana?

ChristopherFritz · January 14, 2023, 5:13am

There is a limit to how lazy I can allow readers to become.

ChristopherFritz · January 14, 2023, 5:20am

DIO-Berry · January 14, 2023, 5:21am

That’s cool. Is that your own code or a feature?

shuly · January 14, 2023, 5:22am

you can do this but WK still can’t come up with a proper built in leech manager …

ChristopherFritz · January 14, 2023, 5:29am

I found a dictionary file extracted from Wiktionary, I cleaned it up for my own use, and I put it into a spreadsheet:

Then I added another sheet:

In cell B2, I have the formula:

=IFNA(VLOOKUP(A2,Wiktionary!A:D,2,false), "")

That populates the kana word based on the first match for the kanji word that’s found in the Wiktionary sheet.

Next, in C2 I have:

=IFNA(VLOOKUP(A2,Wiktionary!A:D,4,false), IFNA(VLOOKUP(B2,Wiktionary!A:D,4,false), ""))

This populates the English word based on the first kanji match found. If there’s no match, then it looks for the first kana match found.
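
For anyone who reads code more easily than spreadsheet formulas, here is a rough Python sketch of what the two formulas do. It is not the spreadsheet itself: the sample rows are made up, and the third column is a placeholder, since the formulas only use columns 1, 2, and 4 of the Wiktionary sheet.

```python
from typing import Optional, Sequence

Row = Sequence[str]  # one row of the Wiktionary sheet: columns A to D

# Made-up sample rows for illustration only. Column C is a placeholder,
# because the formulas above never read it.
WIKTIONARY = [
    ("創立", "そうりつ", "(unused)", "1. establishment 2. founding"),
    ("創立", "そうりつ", "(unused)", "to establish"),
    ("おおかみ", "おおかみ", "(unused)", "wolf"),
]


def vlookup(key: str, rows: Sequence[Row], col: int) -> Optional[str]:
    """Exact-match lookup: return the 1-based column `col` of the first
    row whose column A equals `key`, or None if nothing matches."""
    for row in rows:
        if row[0] == key:
            return row[col - 1]
    return None


def kana_for(a2: str, rows: Sequence[Row]) -> str:
    """Mirror of the B2 formula: kana of the first match, else empty."""
    found = vlookup(a2, rows, 2)
    return found if found is not None else ""


def english_for(a2: str, b2: str, rows: Sequence[Row]) -> str:
    """Mirror of the C2 formula: try the kanji cell first, then the kana cell."""
    for key in (a2, b2):
        found = vlookup(key, rows, 4)
        if found is not None:
            return found
    return ""
```

With the sample rows, `english_for("創立", "そうりつ", WIKTIONARY)` stops at the first 創立 row and never reaches the second one, which is the "first match" behavior described above.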

However, there may be multiple definitions in the match and only one applies to the word in context, so we want to make it clear to the person filling in the word that they need to actually type in the English word. For that, I make the English word gray while the cell still holds a formula; once someone types over it, the cell is no longer a formula and the text is black. This is done via conditional formatting:

Another option would be to make it gray if it’s a formula and contains "2. ", so if there’s only one result then you don’t need to type the English in. Hm…
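
The difference between the two rules can be sketched in Python (not as an actual conditional-formatting rule). The function names and the `is_formula` flag are hypothetical; the only detail taken from the post is the check for "2. " in the cell text.

```python
def gray_if_formula(is_formula: bool) -> bool:
    """Current rule: gray out any English cell still holding the formula."""
    return is_formula


def gray_if_ambiguous(is_formula: bool, text: str) -> bool:
    """Alternative rule: gray out only formula cells that list a second
    numbered definition, i.e. whose text contains "2. "."""
    return is_formula and "2. " in text
```

One caveat with the substring check: a definition that happens to contain text like "12. " would also count as a match, so a stricter pattern might be needed if that ever shows up in the data.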

Kazzeon · January 14, 2023, 5:30am

Why doesn’t it get all the kanji and vocab from the pages automatically?

ChristopherFritz · January 14, 2023, 5:31am

Don’t think I haven’t been experimenting with that!

Edit:

(Missing page numbers, though.)

(But I’m working on that, too.)

ChristopherFritz · January 17, 2023, 12:25am

Migaku’s SRS (alpha release) launched this weekend while I wasn’t looking (even though I knew late last week that it was about to be released).

That means I’m about ready to return to the SRS grind.

Although Migaku’s kanji extension for Anki isn’t built into their SRS application yet, I’ll give it a go just doing vocabulary cards and see how it works out.

My plan of action is as follows:

Manga Selection

The first step is picking something I plan to read. Since I’m nearing the end of 夢みる太陽 volume 4, I think volume 5 would be a good candidate to start off with.

Vocabulary Selection

My extracted vocabulary list shows 創立 as my top unknown word from the volume. This is one I should expose myself to via SRS before I start reading the volume, as I will (hopefully!) recognize it when I encounter it.

Old Material Search

I want to have a sentence and a screenshot to include on the card, but I don’t want to expose myself to a potential spoiler with the upcoming volume. This is where I have a tool that lets me look up a word in manga I’ve read.

Thankfully, I have a couple of results to work with:

Card Front

The main issue for me here is that there isn’t an option to have both the vocabulary word and the sentence on the front of the card. It’s one or the other. (That’s an option I’ll have to put in a request for sometime. It’s something I set up as a customization of the Migaku card type in Anki.)

Card Back

ChristopherFritz · February 19, 2023, 8:50pm

Background of this post, from ABBC:


So, once upon a time, I wrote code to download images from BookWalker.

Consider this item from BookWalker:

Why did I write this?

I wanted to partake in all those fancy BookWalker sales, but I also didn’t want to leave Kobo, as I can remove DRM from my Kobo purchases.

Removing DRM is essential for me as:

It lets me view my purchase on any device of my choosing (Kobo doesn’t have a Linux application).
It lets me use Mokuro for OCR.
I can use Mokuro’s output for all kinds of things that have helped with my learning.

Why did I never use this?

Manga images are stored in JPG format, which is lossy. The file size is reduced by discarding some of the image data, and an algorithm approximates what was there when the image is viewed.

However, BookWalker doesn’t show the JPG image to the user.

Instead, they load it into an HTML canvas, where the original JPG data is inaccessible.

This means that when you save the image (something BookWalker blocks), you have a choice:

Save the image as JPG for a smaller file size, but lose even more of the original information.
Save the image as PNG to retain the full image information, but at a larger file size.

At that point, I’d rather continue buying from Kobo and receive the “original” JPG images for my purchases.

There’s also the issue that images are saved at the canvas size, which means there is whitespace to be cropped (although this can be automated post-download):

(My download method requires viewing pages with vertical scrolling rather than horizontal, but the canvas is still oriented for a wider dimension.)
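
As a rough illustration of how that cropping could be automated, here is a minimal Python sketch that finds the bounding box of everything that is not near-white and cuts the canvas down to it. It works on a plain grid of grayscale values rather than a real image file, and the `white` threshold of 250 is an assumption; this is not the code from the download tool.

```python
from typing import List, Optional, Tuple

Pixels = List[List[int]]  # rows of grayscale values, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def content_box(pixels: Pixels, white: int = 250) -> Optional[Box]:
    """Return the smallest box containing every pixel darker than `white`,
    or None if the whole canvas is blank."""
    rows = [y for y, row in enumerate(pixels) if any(v < white for v in row)]
    if not rows:
        return None
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] < white for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(pixels: Pixels, box: Box) -> Pixels:
    """Cut the canvas down to the given box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

Note that this trims any white margin that is part of the page itself as well as the extra canvas whitespace, so a real version might crop to known page dimensions instead.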

Anonymous poll time!

Do I release my code for others to use for study purposes, such as if they want to create a PDF they can mark on?

Do not release it. It’s too dangerous.
Release it. Make it available for those who will use it for good. There are other, easier ways to remove DRM if someone wanted to do something bad with it.
Doesn’t matter. The Crabigator probably won’t like it here.


ChristopherFritz · February 21, 2023, 7:22pm

The vast majority of a small number of people said go for it, so here it is:

ChristopherFritz · February 23, 2023, 2:52am

I’m completely envious of this system:

I wonder how good their OCR is.

Gorbit99 · February 23, 2023, 3:10am

For no particular reason, huh?

I’d think that you don’t need perfect OCR as long as the AI that translates is trained well enough. And their ordering is all messed up, not very cool

Found the ご紹介 video of theirs: マンガの高速な多言語展開を可能にする『Mantra Engine』 - YouTube
It’s pretty interesting
