ChristopherFritz's Study Log

ChristopherFritz · February 23, 2023, 3:16am

AI translation is downstream of OCR, so there’s a lot of room for error there. (That’s just one aspect where a human will be required, alongside the reasons the platform’s creators give.)

Ordering only impacts the human doing the translation, as the reader won’t see it, so there is no harm there.

I wonder if this software will increase the need for translators and typesetters (as the software’s goal is to allow more works to be translated).

ChristopherFritz · February 23, 2023, 3:18am

I was wondering how they removed the text, and here I see it’s already been pre-removed before loading into the system.

There are open source tools available now that could be used to make it take a few seconds tops to remove dialogue text from a page (just need to add a UI for it), so I wonder if they have something similar.

Gorbit99 · February 23, 2023, 3:21am

Sure, but I’d assume you can tell when a given OCR result makes no sense at all, even without the original for context. And I’d hope they give it the context of the page to make that even better.

Their site explicitly says that it’s for publishers. So I’d assume their idea is that you don’t need special tools to get cleaned manga, you just ask the mangaka to organize their layers properly in photoshop.

They also have a different product with a weird mission statement:

Langaku helps learners improve their English by reading Manga. With features to make learning easier such as the ability to dynamically adjust difficulty, hear text read aloud, and read in multiple languages, Langaku brings the joy of Manga to English language learners.

Gotta love learning English from translations. Feedback loops are awesome.

Edit: Oh, I scrolled down, this is interesting:

So the answer to your question, “how accurate is the OCR”, is “yes”.

ChristopherFritz · February 23, 2023, 3:25am

Ah, yes, I wasn’t considering that. Lucky publishers.

It’d be great if they could also take in the script somehow and match that up to the balloons to ensure there are no OCR errors.

But even then, they probably aren’t OCR’ing tiny print like what those of us buying digital manga often have to suffer through…
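A rough sketch of the script-matching idea, assuming the script is available as a plain list of lines. The function name `best_script_match`, the 0.6 threshold, and the sample text are all made up for illustration. It only uses `difflib` from the standard library and has nothing to do with how the platform actually works.

```python
from difflib import SequenceMatcher


def best_script_match(ocr_text, script_lines, threshold=0.6):
    """Return (line, score) for the script line most similar to ocr_text.

    Returns (None, score) when nothing reaches the threshold, so the
    balloon can be flagged for a human instead of silently "corrected".
    """
    best_line, best_score = None, 0.0
    for line in script_lines:
        score = SequenceMatcher(None, ocr_text, line).ratio()
        if score > best_score:
            best_line, best_score = line, score
    if best_score < threshold:
        return None, best_score
    return best_line, best_score


# Hypothetical script lines and OCR output with two typical misreads
# (ざ read as さ, small っ read as つ).
script = ["おはようございます", "今日はいい天気ですね", "行ってきます"]
ocr_balloons = ["おはようごさいます", "今日はいい天気ですね", "行つてきます"]

# Prefer the script line when a close match exists, else keep the OCR text.
corrected = [best_script_match(text, script)[0] or text for text in ocr_balloons]
```

A character-level similarity ratio like this would likely struggle with very short balloons or repeated lines, so a real version would probably also need page or balloon order as a hint.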

Gorbit99 · February 23, 2023, 3:27am

I think that would just be a waste of time. Someone will have to go through either way and fix up the English translation (just have a look at the English from the ご紹介 video, it’s very spotty). Going through the manga again just so you can get fractionally less s*** translations is probably really not worth it.

DIO-Berry · February 23, 2023, 4:46am

I’m still confused about why the AI changed ? to !? but the Line translator does that too…

Gorbit99 · February 23, 2023, 4:57am

Maybe it’s some kind of cultural difference? Like, Japanese doesn’t usually use question marks, so when they appear, they have more of a surprise effect than a questioning effect?

DIO-Berry · February 23, 2023, 5:01am

I think I’d need to see the next page, but they don’t look particularly surprised to me tbh

Gorbit99 · February 23, 2023, 5:22am

It’s thankfully free (and I’ve found myself a nice old dataset): open-mantra-dataset/images/balloon_dream/ja at main · mantra-inc/open-mantra-dataset · GitHub

So here are your two pages: one two

More interesting observations:

1. Good tools do not a good translation make. This is expertly shown in one of their Twitter posts:

Here be dragons

Besides just translation issues (flys, random あ left over), if you actually try to read the text, well, it’s a bit all over the place.

2. There is actually a paper written about this as I hoped. !Beware, instant download link ahead! here

This covers a few interesting ideas. First of all, seems like I wasn’t the first one to suggest extracting frames first and trying to guess an order from that.

frame stuff

I’m sure this is a heavily cherry picked, very clean example, but it seems to work quite nicely. They estimate that about 92% of the pages can be ordered properly using this technique, which is close to my 95% estimate from the other day.
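For the curious, a naive sketch of what "guess an order from the frames" could look like. This is not the paper's method, just a simple heuristic under my own assumptions: frames are given as hypothetical `(x, y, width, height)` boxes, frames that overlap vertically count as one row, rows are read top to bottom, and each row is read right to left.

```python
def reading_order(frames, row_tolerance=0.5):
    """Order (x, y, w, h) frame boxes in an assumed manga reading order.

    Frames are grouped into rows when their vertical overlap exceeds
    row_tolerance times the shorter frame's height. Rows go top to
    bottom; frames inside a row go right to left.
    """
    rows = []
    for frame in sorted(frames, key=lambda f: f[1]):  # topmost first
        _, y, _, h = frame
        for row in rows:
            row_y, row_h = row[0][1], row[0][3]
            overlap = min(y + h, row_y + row_h) - max(y, row_y)
            if overlap > row_tolerance * min(h, row_h):
                row.append(frame)
                break
        else:
            rows.append([frame])  # no existing row fits, start a new one

    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda f: -f[0]))  # rightmost first
    return ordered


# Hypothetical page: two frames on top, one wide frame below.
top_right = (110, 0, 90, 100)
top_left = (0, 0, 100, 100)
bottom_wide = (0, 110, 200, 80)

ordered = reading_order([bottom_wide, top_left, top_right])
```

Something this simple would presumably fall over on stacked or overlapping panels, which is probably where the remaining few percent of pages come from.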

ChristopherFritz · February 23, 2023, 5:44am

It’d help me so much if Mokuro included frame detection (even if not always perfect). Still, it’s unnecessary for Mokuro’s goal, so I can understand it not being implemented.

Although, I do have my own fork of Mokuro that I could work on…

But things like implementing frame detection are probably way out of my league.

Gorbit99 · February 23, 2023, 5:49am

There are plenty of articles and tools out there btw, I’m doing research right now.

This isn’t directly manga, but the idea is similar.

The algorithm itself seems pretty simple, but it’s probably summed up by a meme I saw yesterday:

The meme I've seen yesterday

Another Python tutorial (it’s almost as if Python has been abused for image detection purposes, eh?) that does a similar thing, but with different ideas:

github.com

huytd/manga-frame-detect-opencv/blob/master/Manga.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Manga frame detect with OpenCV\n",
    "\n",
    "In this tutorial, we will use two features of OpenCV: Contours Finding and Convex Hull to recognize the frames of a manga page.\n",
    "\n",
    "First, we need to import the neccessary libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],


This repo contains a simple main file with the full program in it as well.

Mokuro is written in Python as well, so while I don’t think it would be effortless, I do think one could modify it relatively easily.

Out of these, I do think the second option would work better, simply because angled panels are more common in manga than in comics.
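To show why the convex hull step helps, here is a small pure-Python sketch. The notebook itself uses OpenCV for both contour finding and the hull, so this is only a stand-in: the `convex_hull` function below is a textbook monotone chain implementation, and the contour points are a made-up example of a panel whose outline has a notch where a balloon overlaps the border.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2D points via Andrew's monotone chain.

    Returns the hull vertices starting from the lowest (x, y) point,
    with collinear points dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last point while it would make a non-left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical contour: a 100x100 panel with a notch cut into one edge.
notched_panel = [
    (0, 0), (100, 0), (100, 100), (60, 100),
    (60, 70), (40, 70), (40, 100), (0, 100),
]

hull = convex_hull(notched_panel)
```

The hull ignores the notch and returns just the four corners, and since it works on arbitrary points it should cope with angled panels just as well as rectangular ones.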

Gorbit99 · February 23, 2023, 5:55am

There are two kinds of manga

btw, sorry for flooding your study log

ChristopherFritz · February 23, 2023, 5:59am

From the second project:

This is a simple Python program using OpenCV to detect frames in a Manga page.

I’ve never once gotten OpenCV installed and working in any fashion… =(

Well, I may have to try again sometime.

I probably won’t give it a go, though. Too many other projects.

Still, I did run the idea by my computer, of re-running all my digital manga through an updated Mokuro, to which my computer responded:

Gorbit99 · February 23, 2023, 6:01am

Might fork Mokuro myself and give it a try, though. I’ll admit it: strange as it sounds in 2023, I don’t speak snek.

ChristopherFritz · February 23, 2023, 6:03am

I only started learning Python in the past few…months, has it been by now? Probably a bit longer. (I still land on StackOverflow practically whenever I need to do something in Python.)

ChristopherFritz · February 23, 2023, 6:12am

Oh, I stand corrected. I’ve had it working and used it pre-Mokuro to extract text from manga pages to run OCR on.

Example output from using OpenCV

Well bother, I may have another project to work on.

heikimi · March 8, 2023, 11:17am

I don’t know if it is a thing here but… Happy Cake Day!
Thanks for all the explanations, guiding the beginners, and managing the book clubs.

MissDagger · March 8, 2023, 11:26am

Happy Cake Day!

Your manga snippets are the best.

DIO-Berry · March 8, 2023, 12:16pm

Happy cake day!

ChristopherFritz · March 8, 2023, 7:16pm

It’s weird having the cake show up on this day, because I signed up for WaniKani and then didn’t use it for a few years. Thus it feels like my “start date” is when I actually started using it, which was at the end of a December.
