AI translation is downstream of OCR, so there’s a lot of room for error there. (That’s just one aspect where a human will be required, alongside the reasons the platform’s creators give.)
Ordering only impacts the human doing the translation, as the reader won’t see it, so there is no harm there.
I wonder if this software will increase the need for translators and typesetters (as the software’s goal is to allow more works to be translated).
I was wondering how they removed the text, and here I see it’s already been pre-removed before loading into the system.
There are open source tools available now that could be used to make it take a few seconds tops to remove dialogue text from a page (just need to add a UI for it), so I wonder if they have something similar.
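To give a sense of how little code the simple case needs: given text bounding boxes from an OCR pass (the function name, box format, and fill threshold here are my own assumptions, not from any particular tool), blanking text inside flat white speech bubbles is just array slicing. Panels with drawn backgrounds would need real inpainting (e.g. OpenCV's `cv2.inpaint`) instead.

```python
import numpy as np

def remove_bubble_text(page, boxes, fill=255):
    """Blank out dialogue text on a grayscale page image.

    `boxes` are (x, y, w, h) rectangles, e.g. from an OCR pass.
    Speech bubbles are usually flat white, so filling the text
    region with white is often enough; text over artwork needs
    proper inpainting instead of a flat fill.
    """
    cleaned = page.copy()  # leave the original page untouched
    for x, y, w, h in boxes:
        cleaned[y:y + h, x:x + w] = fill
    return cleaned
```

With a UI that lets you nudge the boxes, this really would take seconds per page for typical dialogue.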
Sure, but I’d assume you yourself can tell if a given OCR result makes no sense at all, even without the context of the original. And I’d hope they give the model the context of the whole page to make that even better.
Their site explicitly says that it’s for publishers. So I’d assume their idea is that you don’t need special tools to get cleaned manga; you just ask the mangaka to organize their layers properly in Photoshop.
They also have a different product with a weird mission statement:
Langaku helps learners improve their English by reading Manga. With features to make learning easier such as the ability to dynamically adjust difficulty, hear text read aloud, and read in multiple languages, Langaku brings the joy of Manga to English language learners.
Gotta love learning English from translations; feedback loops are awesome.
I think that would just be a waste of time. Someone will have to go through and fix up the English translation either way (just have a look at the English in the ご紹介 (introduction) video; it’s very spotty). Going through the manga again just to get fractionally less s*** translations is probably really not worth it.
Maybe it’s some kind of cultural difference? Like Japanese don’t usually use question marks, so when they appear, they have more of a surprise effect than questioning effect?
Besides the outright translation issues (“flys”, a random あ left over), if you actually try to read the text, well, it’s a bit all over the place.
There is actually a paper written about this, as I hoped: here (beware, instant download link ahead!).
This covers a few interesting ideas. First of all, it seems I wasn’t the first one to suggest extracting frames first and trying to guess an order from that.
I’m sure this is a heavily cherry-picked, very clean example, but it seems to work quite nicely. They estimate that about 92% of the pages can be ordered properly using this technique, which is close to my 95% estimate from the other day.
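For reference, the naive baseline I had in mind (my own sketch, not the paper’s method) just bins panel boxes into rows by their top edges and reads each row right to left; the pages it gets wrong are the irregular layouts the paper’s technique is meant to handle.

```python
def order_panels(panels, row_tolerance=30):
    """Order panel boxes (x, y, w, h) in manga reading order:
    top-to-bottom, then right-to-left within each row.

    Panels whose top edges are within `row_tolerance` pixels of the
    row's first panel are treated as one row. Overlapping or heavily
    angled panels will confuse this simple binning.
    """
    if not panels:
        return []
    panels = sorted(panels, key=lambda b: b[1])  # by top edge
    rows, current = [], [panels[0]]
    for box in panels[1:]:
        if abs(box[1] - current[0][1]) <= row_tolerance:
            current.append(box)
        else:
            rows.append(current)
            current = [box]
    rows.append(current)
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: -b[0]))  # right-to-left
    return ordered
```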
It’d help me so much if Mokuro included frame detection (even if not always perfect). Still, it’s unnecessary for Mokuro’s goal, so I can understand it not being implemented.
Although, I do have my own fork of Mokuro that I could work on…
But things like implementing frame detection are probably way out of my league.
Another Python tutorial (it seems like Python gets used for every image-detection purpose, eh?) that does a similar thing, but with different ideas:
This repo contains a simple main file with the full program in it as well.
Mokuro is written in Python as well, so while I don’t think it would be effortless, I do think one could modify it relatively easily.
Out of these, I do think the second option would work better, simply because angled panels are more common in manga than in comics.
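To illustrate why angled panels matter: the simplest detection idea is cutting the page along fully-white gutter rows and columns, which only works when panels form a rectangular grid. Here’s a minimal sketch of that gutter-splitting approach (my own, not taken from either tutorial); an angled panel border leaves no fully-white cut line, so this silently merges those panels, which is exactly where a contour-based detector earns its keep.

```python
import numpy as np

def _runs(mask):
    """(start, end) pairs of consecutive True values in a 1-D bool array."""
    out, start = [], None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(mask)))
    return out

def split_panels(page, white=250):
    """Split a grayscale page into panel boxes (x, y, w, h) by
    recursively cutting along fully-white gutter rows/columns.

    Handles rectangular grid layouts only; angled or overlapping
    panels have no all-white cut line and won't be separated.
    """
    boxes = []

    def recurse(y0, y1, x0, x1):
        region = page[y0:y1, x0:x1]
        rows = _runs(~(region >= white).all(axis=1))  # content row spans
        if len(rows) > 1:
            for a, b in rows:
                recurse(y0 + a, y0 + b, x0, x1)
            return
        cols = _runs(~(region >= white).all(axis=0))  # content column spans
        if len(cols) > 1:
            for a, b in cols:
                recurse(y0, y1, x0 + a, x0 + b)
            return
        if rows and cols:  # no more gutters: trim whitespace, emit one box
            (a, b), (c, d) = rows[0], cols[0]
            boxes.append((x0 + c, y0 + a, d - c, b - a))

    recurse(0, page.shape[0], 0, page.shape[1])
    return boxes
```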
I only started learning Python in the past few… months, has it been by now? Probably a bit longer. (I still land on Stack Overflow practically whenever I need to do something in Python.)
It’s weird having the cake show up on this day, because I signed up for WaniKani and then didn’t use it for a few years. Thus it feels like my “start date” is when I actually started using it, which was at the end of a December.