Mokuro: Read Japanese manga with selectable text inside a browser

@ChristopherFritz so I did a single test page of running mokuro on a shosetsu and while it worked


it sort of makes a mess of things, so horizontal text seems like a nonstarter, JS script or not
can’t really mouse over if all the text shifts to the right :rofl:

but mokuro seems to be working, so I have lots of manga to convert over (and need to install yomitan on my tablet too)

Yeah, I didn’t have time to mention it this morning, but Mokuro is absolutely manga-centric. It’s possible to use it for something like an image of novel text, but it requires a lot of extra steps, like slicing the image into columns, running each slice through Mokuro, then putting it all back together… tons of work unless one writes code to automate it.
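For anyone curious what the slicing step might look like, here is a rough sketch that only works out where to cut. It assumes evenly spaced columns, which real novel pages won't strictly have, and the function name `column_boxes` is made up for illustration; nothing here is part of Mokuro. The boxes are `(left, upper, right, lower)` tuples, the format an imaging library such as Pillow expects for cropping, and they are ordered right to left to match the reading order of vertical Japanese text.

```python
def column_boxes(width, height, n_columns):
    """Return crop boxes (left, upper, right, lower) for n_columns vertical
    strips of an image, ordered right to left (reading order for vertical
    Japanese text). Assumes evenly spaced columns, which is a simplification."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Integer edges so the strips cover the full width with no gaps or overlap.
    edges = [i * width // n_columns for i in range(n_columns + 1)]
    boxes = [(edges[i], 0, edges[i + 1], height) for i in range(n_columns)]
    return boxes[::-1]  # rightmost column first


# Example: a 100 x 50 page cut into 4 columns.
for box in column_boxes(100, 50, 4):
    print(box)
```

Each box would still have to be cropped, run through Mokuro, and the results stitched back together, which is the part that makes this a real side project.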

yeah - if you want a side project :wink:
for now it’s alright
have plenty to read anyway
need to stop ordering books

the screen shot sim with bookwalker and brave browser
so it worked fine until it didn’t

Wanted to share this info in case anyone else has this issue. Normally it wouldn’t have been an issue, but this is a new laptop and I haven’t set up the system-wide backups yet; only the key docs are being backed up. That’s on my to-do list for another week or so, when the last of the stuff is installed and everything is set up how I want.

In the meantime.....

But in any case, the script worked really well, and then the whole browser shut down all its windows while it was running. Logs weren’t much help. After much searching, renaming files, etc… no matter what I tried, I’d reopen the browser, restore the windows, and it would almost immediately crash. Apparently this happened back in 2022 with some Windows update, but right now updates are all paused so that wasn’t it.

The only solution seemed to be the nuclear one (I had already tried reinstalling and everything else). Was able to restore the bookmarks, but I had basically just installed yomitan and the dictionaries, and all of that had to be blitzed.

If in Windows, renaming the C:\Users\(your user folder)\AppData\Local\BraveSoftware\Brave-Browser\User Data folder and letting Brave create a new one fixed the problem, but of course it’s back to square one with all the browser settings, plug-ins, tabs, ugh!!! Hadn’t even been 24 hours since installing yomitan and dictionaries :face_with_spiral_eyes:

So before doing any more screen caps, I reset everything, got all the tabs where I wanted them, and essentially force closed Brave to make sure it “remembered” everything. Reopened it and it worked.

Then kill the browser completely and create a copy of the “User Data” folder; leaving the copy in the same location is fine. Didn’t think I was going to need it, but after 10 or so volumes, it shut down again. Same issue: wouldn’t stay open, instant crash. The script itself seems innocuous, but it’s doing something unintentional (not breaking/trapping out of the crash?) and/or not getting along with something else. Since the computer is newer and not a lot of stuff is set up, my guess is the browser does something to the profile unintentionally and then it’s dead. Happened a total of 5x. The other 4x I was happy to have the backup.

So before running this script in Brave (or maybe another browser), back up your User Data folder for a fast recovery.
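If copying the folder by hand gets old, the backup step could be scripted. A minimal sketch, assuming the browser is fully closed first (copying a profile that is in use may fail or give an inconsistent copy). The helper name `backup_profile` and the backup folder naming are invented for illustration; only the Brave path comes from the post above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_profile(profile_dir, backup_root=None, stamp=None):
    """Copy a browser profile folder to a timestamped sibling folder.

    profile_dir: the folder to back up (for example Brave's "User Data").
    backup_root: where to put the copy; defaults to the same parent folder.
    stamp:       label appended to the copy's name; defaults to the current time.
    Returns the path of the new backup folder.
    """
    profile_dir = Path(profile_dir)
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"No such folder: {profile_dir}")
    backup_root = Path(backup_root) if backup_root else profile_dir.parent
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{profile_dir.name} backup {stamp}"
    shutil.copytree(profile_dir, target)  # fails if target already exists
    return target


# Usage on Windows, with Brave closed (path as given in the post above):
# profile = Path.home() / "AppData/Local/BraveSoftware/Brave-Browser/User Data"
# print(backup_profile(profile))
```

Restoring is then the reverse: close the browser, rename the broken "User Data" folder out of the way, and copy the backup back under the original name.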

@ChristopherFritz if you wanted to tinker around with this or try to troubleshoot I haven’t deleted anything, ping me on discord.

Otherwise just sharing so hopefully it will help someone else out if they have this issue. Thankfully I finished all the image grabs (didn’t realize I had purchased duplicate digital copies of so much manga: 51 vol, not counting the shosetsu)

and so you know, a chunk of these series/vols are your fault

And yes I have the paperback of these as well - I’ll be sending you a bill c/o Takagi san :wink:


EDIT: Hey, has anyone set up and used Mokuro2PDF?
Seems like I need to install more stuff and uhhhhh :roll_eyes:

This sounds completely crazy to be happening. Is there any chance of bad sectors on the hard drive?

Side comment, I had issues with a hard drive going bad and losing my browser profile. I had a complex JavaScript script I had written to use with BookWalker to hide various volumes/series from view, which I had updated off and on over maybe a year or so, and as it turns out I never saved a copy of the script outside of the Tampermonkey extension. Gotta write it from scratch, and I also lost my browser’s local storage that stored the list of series/volumes being hidden, all my WaniKani book club data, and all my reading progress tracking for the year (alongside some other things). It’s going to take a while to get everything back up and running!

We’ll just add it to the pile.

nothing obvious, brand new machine and SSDs, no idea :man_shrugging:

I have this running on the 2nd screen, so it could be that I just shouldn’t be using the computer while it does its thing. (did make sure the energy saver/background memory things were off in the settings before starting)

I suspect the reason the browser crashes is something else (memory issue, conflict or something), but when it crashes it all goes to you know what :smiley: I’m pretty sure the script itself is fine, it just doesn’t have any error trapping when reopening the browser (or if there is a way to make it stop running, I don’t know where to change/rename it)

ouch and I was kicking myself for not having all the setup done yet to get the system wide backups going

yeah uhhh too many books :books:

and then thanks to pixiv I found some of my artists, and manga (doujin) that are older or out of print that I wanted and couldn’t find anywhere else. With pixiv/booth you can buy them directly from the mangaka and get legit PDF versions. If you haven’t gone down that rabbit hole, avoid it; good way to open the wallet. Anyhow, I have to image all those PDFs (there are definitely more than 51 :slight_smile: but they are shorter and don’t require a script to make the images, thankfully).

Did you ever use/run Mokuro2PDF?
started installing ruby and things this evening
fingers crossed it doesn’t all go sideways

That one I haven’t used.

so I’ll update on the Mokuro2PDF

ran some of the tests (the ones magick asks for after installing) and whatnot to make sure magick “works”
the tests didn’t quite work, but it did seem to be working (generating images), so I did a test for proof of concept cuz I’m stubborn

It did work


but honestly the resolution is kind of eh compared with the screen saves
(probably could play with the settings and improve it) but it did work.
The downside - yomitan doesn’t work with PDFs

[unless you know a way to make it work - cuz then my files could be portable and I could use them on the tablet or phone and not be restricted to the notebook]

but the text is now copyable and one can cut and paste
vertical text and deepL (well still better than not being able to cut/paste at all I suppose)

guess I should start actually running to convert files and give up on the 2PDF part for now but at least seems like there is a workflow that should work now

No promises here, but have you tried using pdf.js to open the PDF and see if Yomitan works there?

so I did tinker a bit, and as I suspected from yesterday’s cut/pasting, the problem is the way Mokuro2PDF works,

it won't work.

pdf.js does work, just not with a PDF created by Mokuro2PDF.

Easy to see if you do a cut/paste from the mokuro file (it properly handles the vertical text).
When you do the cut/paste from the 2PDF file, it takes each char as its own line.
Not really usable. Suppose it could also be one of those pesky ID-10-T errors, but I don’t think so at this point.

Also did notice the 2PDF screws up the page numbering.
Probably easily fixable by renaming the first 10 pages from
0, 1, 2, … to 00, 01, 02,…
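The renaming idea could be sketched like this. It assumes the page images are named with bare numbers (`0.jpg`, `1.jpg`, …, `10.jpg`), and whether zero-padding actually fixes the Mokuro2PDF page order is the untested guess from above, not something confirmed. The function name `pad_page_names` is made up for illustration. Run it on a copy of the folder first, since any files that reference the old image names would need updating too.

```python
from pathlib import Path


def pad_page_names(folder, dry_run=False):
    """Zero-pad purely numeric file names so they sort correctly as text,
    e.g. 0.jpg, 1.jpg, 10.jpg -> 00.jpg, 01.jpg, 10.jpg.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    is renamed, so the plan can be inspected first.
    """
    pages = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.stem.isascii() and p.stem.isdigit()
    ]
    if not pages:
        return []
    width = max(len(p.stem) for p in pages)  # pad to the longest number
    renames = []
    for page in sorted(pages, key=lambda p: int(p.stem)):
        new_path = page.with_name(page.stem.zfill(width) + page.suffix)
        if new_path == page:
            continue  # already padded
        if new_path.exists():
            raise FileExistsError(f"Refusing to overwrite {new_path}")
        renames.append((page.name, new_path.name))
        if not dry_run:
            page.rename(new_path)
    return renames
```

Calling `pad_page_names("some_volume_folder", dry_run=True)` first shows what would change without touching anything.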

For now probably a nonstarter. Will have to see if there’s a better way to copy these to my tablet or phone (still have to test that with yomitan anyway), but I’m assuming it will work.

—EDIT—
Question: Wanted to confirm something.
When mokuro creates the _ocr files they are used to make the html file but then it seems they are no longer needed. Don’t want to throw them away, but…

If I copy the html and the image files to my tablet, seems like no need to maintain the ocr and/or relative directory to that folder. Correct?

Correct. The _ocr folder is essentially a cache, and is not needed for anything beyond building the HTML file. (Unless you’re me using the JSON files to create frequency lists.)
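A small sketch of that copy step, for anyone who wants to script it: it copies a manga folder while skipping anything named `_ocr`. The function name `export_for_tablet` and the folder names in the usage comment are placeholders, and it assumes the HTML files and image folders live together under one root.

```python
import shutil
from pathlib import Path


def export_for_tablet(manga_root, destination):
    """Copy manga_root to destination, leaving out any folder named "_ocr".

    The HTML files and image folders keep their relative layout.
    destination must not exist yet (shutil.copytree creates it).
    """
    copied = shutil.copytree(
        manga_root,
        destination,
        ignore=shutil.ignore_patterns("_ocr"),  # skip the cache at any depth
    )
    return Path(copied)


# Usage (placeholder paths):
# export_for_tablet("D:/manga", "E:/tablet_transfer/manga")
```

The original `_ocr` folder stays where it is, so nothing gets thrown away.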

feels like a trap to get me to take over a bookclub or something :wink:

Actually, for that I use the HTML file only, so you may already be trapped.

Hello there. Does anyone know what happened to mokuro.moe? And will it come back?

Yes, it is hosted in Africa on a Raspberry Pi at someone’s parents’ house. They frequently experience blackouts and load shedding, and the site goes down for long periods of time because of it (the author has stated they sometimes have no electricity for multiple weeks).

The author does not live in Africa anymore and has said that usually they tell their parents “flick a switch and the Pi will come back online” but more major issues require them to visit home :frowning:

They have not messaged about any issues, so likely just load shedding.

That site falls squarely into the category of copyright infringement (putting entire volumes of manga online for anyone to read without buying them from the publisher or a licensed seller), so you may be limited on the information you can get on the site from the forums here.

The author does not care because it’s hosted in a small town in Africa; they have talked about how the police literally do not care at all about most crimes, let alone digital crimes like copyright infringement :sweat_smile:


Does anyone know why after a few mins my Mokuro gets visibly blurry unless I restart? :frowning:

I don’t know, but what are all those colors? :o

pitch accent colouring from https://migaku.com/ !

Hey folks, I’ve started dabbling in using mokuro to help me do some sentence mining from manga. Really impressive software overall.

Out of curiosity, does anyone know of a simple way to edit the text boxes that mokuro outputs? I know that you can edit the text within them, but I’m wondering if there’s any built-in functionality to move/delete/resize the text boxes themselves. I did a little bit of sleuthing and it seems that the answer is no. And even if there were a way, it looks like the contents of the _ocr directory are baked directly into the html file, so if for some reason I needed to regenerate the html file, all edits would be lost anyway. But you all seem quite knowledgeable so I figured I’d ask here.

Also for future readers of this thread: if anyone runs into issues installing mokuro where pyenv is not installing a python version with a functioning lzma module, I can help as I was able to fix this for myself.
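For anyone wondering whether they have this problem, here is a quick check to run with the Python that pyenv built. It only detects a missing or broken `lzma` module; it is not the fix mentioned above, which isn't described in this thread. The function name `has_working_lzma` is made up for illustration.

```python
import importlib


def has_working_lzma():
    """Return True if this Python can import lzma and round-trip some data."""
    try:
        lzma = importlib.import_module("lzma")
    except ImportError:
        # Typical when Python was compiled without the liblzma headers present.
        return False
    sample = b"mokuro"
    return lzma.decompress(lzma.compress(sample)) == sample


print("lzma OK" if has_working_lzma() else "lzma missing or broken")
```

If it reports the module as missing, a common cause (an assumption here, not confirmed by the poster) is that the liblzma development files weren't installed when pyenv compiled that Python version, in which case installing them and rebuilding the version is the usual route.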

Nope, not possible, you would at the very least need to have some tools made for this.

Nice :slight_smile: I would suggest that maybe you write about it now? If someone comes along in a few months, not sure if it’ll still be easy for you to remember the issues and fixes