ChristopherFritz's Study Log

AI translation is downstream of OCR, so there’s a lot of room for error there. (That’s just one aspect where a human will be required, alongside the reasons the platform’s creators give.)

Ordering only impacts the human doing the translation, as the reader won’t see it, so there is no harm there.

I wonder if this software will increase the need for translators and typesetters (as the software’s goal is to allow more works to be translated).

3 Likes

I was wondering how they removed the text, and here I see it’s already been pre-removed before loading into the system.

There are open source tools available now that could be used to make it take a few seconds tops to remove dialogue text from a page (just need to add a UI for it), so I wonder if they have something similar.

3 Likes

Sure, but I’d assume you yourself can tell when a given OCR result makes no sense, even without the context of the original. And I’d hope they give it the context of that page to make that even better.

Their site explicitly says that it’s for publishers. So I’d assume their idea is that you don’t need special tools to get cleaned manga, you just ask the mangaka to organize their layers properly in photoshop.

They also have a different product with a weird mission statement:

Langaku helps learners improve their English by reading Manga. With features to make learning easier such as the ability to dynamically adjust difficulty, hear text read aloud, and read in multiple languages, Langaku brings the joy of Manga to English language learners.

Gotta love learning English from translations; feedback loops are awesome.

Edit: Oh, I scrolled down, this is interesting:


So the answer to your question “how accurate is the OCR” is “yes”.

4 Likes

Ah, yes, I wasn’t considering that. Lucky publishers.

It’d be great if they could also take in the script somehow and match that up to the balloons to ensure there are no OCR errors.

But even then, they probably aren’t OCR’ing tiny print like what those of us buying digital manga often have to suffer through…
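To make the script-matching idea concrete, here is a minimal sketch of how an OCR'd balloon could be checked against script lines using fuzzy string matching. This is only an illustration under my own assumptions, not how their platform works: the function name `best_script_match`, the `threshold` value, and the sample lines are all made up for the example.

```python
from difflib import SequenceMatcher


def best_script_match(ocr_text, script_lines, threshold=0.6):
    """Return (script_line, score) for the closest script line.

    Hypothetical helper: if no line reaches `threshold`, the first
    element is None so the balloon can be flagged for a human.
    """
    best_line, best_score = None, 0.0
    for line in script_lines:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = SequenceMatcher(None, ocr_text, line).ratio()
        if score > best_score:
            best_line, best_score = line, score
    if best_score < threshold:
        return None, best_score
    return best_line, best_score


# Made-up script lines; the OCR result below misreads 天 as 大.
script = ["おはよう", "今日はいい天気ですね", "行ってきます"]
match, score = best_script_match("今日はいい大気ですね", script)
print(match, round(score, 2))
```

A close but imperfect match like this one is exactly the case where the script could be used to overwrite the OCR result.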

3 Likes

I think that would just be a waste of time. Someone will have to go through either way and fix up the English translation (just have a look at the English from the ご紹介 video, it’s very spotty). Going through the manga again just so you can get fractionally less s*** translations is probably really not worth it.

1 Like

I’m still confused about why the AI changed ? to !?, but the Line translator does that too…

1 Like

Maybe it’s some kind of cultural difference? Like Japanese don’t usually use question marks, so when they appear, they have more of a surprise effect than questioning effect?

1 Like

I think I’d need to see the next page, but they don’t look particularly surprised to me tbh

1 Like

It’s thankfully free (and I’ve found myself a nice old dataset): open-mantra-dataset/images/balloon_dream/ja at main · mantra-inc/open-mantra-dataset · GitHub

So here are your two pages: one two

More interesting observations:

1.

Good tools do not a good translation make. This is expertly shown in one of their Twitter posts:

Here be dragons



Besides just translation issues (flys, random あ left over), if you actually try to read the text, well, it’s a bit all over the place.

2.

There is actually a paper written about this as I hoped. !Beware, instant download link ahead! here

This covers a few interesting ideas. First of all, it seems I wasn’t the first one to suggest extracting frames first and trying to guess an order from that.

frame stuff

I’m sure this is a heavily cherry-picked, very clean example, but it seems to work quite nicely. They estimate that about 92% of the pages can be ordered properly using this technique, which is close to my 95% estimate from the other day.
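For anyone curious what "guess an order from the frames" could look like, here is a rough sketch. It is my own simplification and not the paper's method: it assumes frames are already detected as axis-aligned `(x, y, w, h)` boxes, groups them into rows by vertical overlap, and reads each row right to left. The name `reading_order` and the `row_tolerance` parameter are hypothetical.

```python
def reading_order(frames, row_tolerance=0.5):
    """Sort (x, y, w, h) frame boxes into an estimated manga reading order.

    Frames whose vertical overlap exceeds `row_tolerance` times the
    shorter frame's height are treated as belonging to the same row.
    """
    rows = []
    # Visit frames from top to bottom so rows are created in page order.
    for frame in sorted(frames, key=lambda f: f[1]):
        x, y, w, h = frame
        for row in rows:
            _, ry, _, rh = row[0]  # compare against the row's first frame
            overlap = min(y + h, ry + rh) - max(y, ry)
            if overlap > row_tolerance * min(h, rh):
                row.append(frame)
                break
        else:
            rows.append([frame])

    ordered = []
    for row in rows:
        # Manga reads right to left, so the largest x comes first.
        ordered.extend(sorted(row, key=lambda f: f[0], reverse=True))
    return ordered


# Two frames on top (right one is read first), one wide frame below.
top_right = (300, 0, 200, 200)
top_left = (0, 0, 280, 200)
bottom = (0, 220, 500, 300)
print(reading_order([bottom, top_left, top_right]))
```

Pages with frames that span several rows or overlap each other are where a simple rule like this breaks down, which is presumably part of the pages that can't be ordered this way.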

4 Likes

It’d help me so much if Mokuro included frame detection (even if not always perfect). Still, it’s unnecessary for Mokuro’s goal, so I can understand it not being implemented.

Although, I do have my own fork of Mokuro that I could work on…

But things like implementing frame detection are probably way out of my league.

2 Likes

There are plenty of articles and tools out there btw, I’m doing research right now.

This isn’t directly manga, but the idea is similar.

The algorithm itself seems pretty simple, but it’s probably summed up by a meme I saw yesterday:

The meme I've seen yesterday

Another Python tutorial (it looks a lot like Python has been abused for image detection purposes, eh?) that does a similar thing, but with different ideas:

This repo contains a simple main file with the full program in it as well.

Mokuro is written in Python as well, so while I don’t think it would be effortless, I do think one could modify it relatively easily.

Out of these, I do think the second option would work better, simply because angled panels are more common in manga than in comics.
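To show why angled panels matter, here is a toy sketch of about the simplest frame detection I can think of: recursively splitting the page along blank horizontal and vertical gutters (often called a recursive XY cut). This is not necessarily what either linked project does, and the names `xy_cut`, `_first_gap`, and `min_gap` are my own. The page is assumed to be a binarized grid where 1 means ink and 0 means blank.

```python
def _first_gap(indices, min_gap):
    """Find the first blank run of at least min_gap between inked indices."""
    for a, b in zip(indices, indices[1:]):
        if b - a - 1 >= min_gap:
            return a + 1, b  # blank run covers [a + 1, b)
    return None


def xy_cut(page, top=0, left=0, bottom=None, right=None, min_gap=2):
    """Split a binary page into (left, top, right, bottom) boxes.

    Box ends are exclusive. Only straight, axis-aligned gutters are found.
    """
    if bottom is None:
        bottom, right = len(page), len(page[0])
    # Trim blank margins around the inked area.
    rows = [y for y in range(top, bottom) if any(page[y][left:right])]
    if not rows:
        return []
    top, bottom = rows[0], rows[-1] + 1
    cols = [x for x in range(left, right)
            if any(page[y][x] for y in range(top, bottom))]
    left, right = cols[0], cols[-1] + 1

    gap = _first_gap(rows, min_gap)  # try a horizontal gutter first
    if gap:
        return (xy_cut(page, top, left, gap[0], right, min_gap)
                + xy_cut(page, gap[1], left, bottom, right, min_gap))
    gap = _first_gap(cols, min_gap)  # then a vertical gutter
    if gap:
        return (xy_cut(page, top, left, bottom, gap[0], min_gap)
                + xy_cut(page, top, gap[1], bottom, right, min_gap))
    return [(left, top, right, bottom)]


# Toy 10x10 page: one wide panel on top, two panels below.
page = [[0] * 10 for _ in range(10)]
for y0, y1, x0, x1 in [(0, 4, 0, 10), (6, 10, 0, 4), (6, 10, 6, 10)]:
    for y in range(y0, y1):
        for x in range(x0, x1):
            page[y][x] = 1
print(xy_cut(page))
```

Since this only ever cuts along fully blank rows or columns, a diagonal gutter never produces one, so two panels separated by an angled border would come out as a single box.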

3 Likes

There are two kinds of manga


btw, sorry for flooding your study log

3 Likes

From the second project:

This is a simple Python program using OpenCV to detect frames in a Manga page.

I’ve never once gotten OpenCV installed and working in any fashion… =(

Well, I may have to try again sometime.

I probably won’t give it a go, though. Too many other projects.

Still, I did run the idea by my computer, of re-running all my digital manga through an updated Mokuro, to which my computer responded:

image

4 Likes

Might fork Mokuro myself and give it a try, though. I’ll admit it: strange as it sounds in 2023, I don’t speak snek.

2 Likes

I only started learning Python in the past few…months, has it been by now? Probably a bit longer. (I still land on StackOverflow practically whenever I need to do something in Python.)

3 Likes

Oh, I stand corrected. I’ve had it working and used it pre-Mokuro to extract text from manga pages to run OCR on.

Example output from using OpenCV

Well bother, I may have another project to work on.

6 Likes

I don’t know if it is a thing here but… Happy Cake Day! :cherry_blossom:
Thanks for all the explanations, guiding the beginners, and managing the book clubs.

9 Likes

Happy Cake Day! :cake: :tada:

Your manga snippets are the best. :star_struck:

7 Likes

Happy cake day!

6 Likes

It’s weird having the cake show up on this day because I signed up for WaniKani then didn’t use it for a few years. Thus it feels like my “start date” is when I actually started using it which was at the end of a December.

11 Likes