Mokuro: Read Japanese manga with selectable text inside a browser

Due to my low familiarity with the format, my recommendation would be:

  1. Unzip the CBZ file. (May require changing the extension to “.zip”.)
  2. Run Mokuro on the unzipped folder that contains the images.
  3. Open the Mokuro-generated HTML in a web browser.
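
Step 1 can also be done in code. Here is a minimal Java sketch, assuming the CBZ really is a plain zip archive; `CbzExtractor` and `extract` are hypothetical names, not part of Mokuro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of Mokuro): unpacks a CBZ so Mokuro can read the images.
final class CbzExtractor {

    // Extracts every entry of the archive into targetDir and returns how many files were written.
    static int extract(Path cbzFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int written = 0;
        try (InputStream in = Files.newInputStream(cbzFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x.png" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe entry name: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the current entry only; the zip stream itself stays open.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java CbzExtractor <file.cbz> <output folder>");
            return;
        }
        System.out.println(extract(Paths.get(args[0]), Paths.get(args[1])) + " files extracted");
    }
}
```

No renaming to “.zip” is needed this way, since the code reads the file contents and ignores the extension. Entry names that are not stored as UTF-8 may need the `ZipInputStream` constructor that takes a charset.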

This exactly. I’m not kidding when I say that cbz files are literally just renamed zips. It’s a pretty common technique in software development. Take a zip file, since most things can read and write zips easily, and then just rename it. Even I used it once.

Similarly, cbz is actually a family of file formats:

  • cbz → zip
  • cbr → rar
  • cbt → tar
  • cb7 → 7z
  • cba → ace
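
To see the “renamed zip” point for yourself, here is a small Java sketch. It maps the extensions in the list above to their archive formats (with tar under its usual .cbt extension) and checks whether a file begins with the zip signature bytes. `ComicArchiveKind` and its methods are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: identifies which archive format hides behind a comic-book extension.
final class ComicArchiveKind {

    // Comic-archive extension -> underlying archive format.
    private static final Map<String, String> FORMATS = Map.of(
            "cbz", "zip",
            "cbr", "rar",
            "cbt", "tar",
            "cb7", "7z",
            "cba", "ace");

    // Returns the underlying format for a file name, or null if the extension is unknown.
    static String underlyingFormat(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return FORMATS.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // True if the file starts with "PK" 0x03 0x04, the signature of a zip local file header.
    // (An archive with no entries at all starts differently and is not detected here.)
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(head, 0, 4);
            return read == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        }
    }
}
```

If `looksLikeZip` returns true for a .cbz, any zip tool should be able to open it, whatever the extension says.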

I see, but I’m still not sure which steps I have to take. I went to the GitHub page and got lost. Do I have to use

this GitHub - dmMaze/comic-text-detector: Manga&Comic text detection

and this GitHub - kha-white/manga-ocr: Optical character recognition for Japanese text, with the main focus being Japanese manga

with something else? Or is there one single script I have to run?

You just need Mokuro, which in turn calls manga ocr for you.
After installing mokuro itself, you only need to do

mokuro name/of/folder

And there should be an HTML file named after that folder, sitting next to it (in the parent directory of the folder).

Question 1: Do you have Python installed? (I believe version 3.6 is recommended for Mokuro.)

If not, you can install it from here: Download Python | Python.org

If yes, then question 2: Do you have Pip installed? (If you have Python installed, you probably have Pip installed.)

Question 3: Do you have Mokuro installed? (If not, you can install it by putting the command pip3 install mokuro into a command terminal. If you’re unfamiliar with a command terminal, let us know your operating system, and we can help you get to it.)

Question 4: If you run the command mokuro in a terminal, what output does it give? (It may take several seconds to finish running and give output.)

Thanks again for sharing, it basically just worked out of the box!

I’m more comfortable using Java, so that’s what I did. And I have a Windows PC not Linux.
Also, I’ve only bought mangas on Bookwalker so far so it’s not the same method to get the jpeg (or png in my case).
I will describe my whole setup here and post my code in case it can help someone else.

Steps to make the Manga Text Search work - Windows / Java / Bookwalker

The exact steps I did:

  • Installed Python 3.10 (not newer) and Mokuro
  • Prepare my file structure:
C:
└── Users
    └── Mokuro
        ├── Ruri
        │   └── 01
        └── results
            ├── script.js
            └── style.css

The files script.js and style.css are the ones @ChristopherFritz posted above.

  • Get the image files for the manga to convert

I just opened Bookwalker on my phone and repeated: take a screenshot, turn the page. A bit tedious, but I had everything after 5 minutes. Then I moved all the images to the folder “01” in the tree structure above (since this was volume 01 of Ruri).

  • Rename the files

The files are called Screenshot_2023_18_03_11_30_02.png and so on, so I run the first part of the code, that renames everything to 001.png, 002.png, etc.

So I update the variables in the main in the code below, and run the first part.

Mokuro.renameFiles(baseFolder + title + "\\" + volume);

  • Mokuro

Now I let Mokuro do its job. I open a cmd terminal and just run

mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

It takes around 40min.
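
If you would rather not switch to a terminal for this step, the same command can be started from Java. This is only a sketch: it assumes the `mokuro` command is on the PATH, and `MokuroRunner` is a hypothetical class name, not something Mokuro provides.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: launches the Mokuro command-line tool as a child process.
final class MokuroRunner {

    // Builds the same command as typed in the terminal: mokuro "<volume folder>".
    static List<String> command(String volumeFolder) {
        return Arrays.asList("mokuro", volumeFolder);
    }

    // Runs Mokuro and returns its exit code. inheritIO() shares this program's
    // console, so Mokuro's output and any prompt it shows stay visible and answerable.
    static int run(String volumeFolder) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(volumeFolder)).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java MokuroRunner <volume folder>");
            return;
        }
        System.out.println("mokuro exited with code " + run(args[0]));
    }
}
```

Passing the folder as its own list element means paths with spaces need no extra quoting.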

  • Extract searchable text

This is the same function that Christopher wrote in Ruby, rewritten in Java. In my main function I run:

Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
                baseFolder + title + "\\_ocr\\" + volume);

  • Search for any string

Now I can search for any string. I update the SEARCH constant and just run:

Mokuro.search(baseFolder, resultFile);

And then I can just open the result file, in C:\Users\Akashelia\Japanese\Mokuto\results\results.html

and voilà ! :slight_smile:

  • The whole code:

Only one external dependency:

        <!-- https://mvnrepository.com/artifact/org.json/json -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20230227</version>
        </dependency>

package com.gnd;

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Mokuro {

    // This needs to be manually changed to whatever I want to search for.
    private static final String SEARCH = "間";

    public static void main(String[] args) throws IOException {

        String title = "DBZ";
        String volume = "01";

        String baseFolder = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\";
        String resultFile = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\results\\results.html";

        // 1) Renames the files, starting at 001
        // Mokuro.renameFiles(baseFolder + title + "\\" + volume);

        // 2) update the path + title + volume and copy in cmd
        // mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

        // 3) Extract the texts from the json files
        //Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
        //        baseFolder + title + "\\_ocr\\" + volume);

        // Search for any string
        Mokuro.search(baseFolder, resultFile);

    }

    public static void renameFiles(String folderPath) {
        File folder = new File(folderPath);
        if (!folder.isDirectory()) {
            System.out.println(folderPath + " is not a directory.");
            return;
        }
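        File[] files = folder.listFiles();
        if (files == null) {
            System.out.println("Could not list files in " + folderPath);
            return;
        }
        // listFiles() guarantees no particular order; sort by name so the pages
        // keep the order in which the screenshots were taken.
        java.util.Arrays.sort(files);
        int count = 1;
        for (File file : files) {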
            if (file.isFile()) {
                String extension = getFileExtension(file.getName());
                String newFileName = String.format("%03d.%s", count++, extension);
                File newFile = new File(folderPath, newFileName);
                if (!file.renameTo(newFile)) {
                    System.out.println("Failed to rename " + file.getName());
                }
            }
        }
    }

    private static String getFileExtension(String fileName) {
        int lastIndex = fileName.lastIndexOf(".");
        if (lastIndex == -1) {
            return "";
        }
        return fileName.substring(lastIndex + 1);
    }

    public static void extractTextWithImageNames(String mangaPath, String path) {
        System.out.println("Extract start. mangaPath=[" + mangaPath + "], json files path=[" + path + "]");
        File folder = new File(path);
        ArrayList<String> fileList = new ArrayList<String>();

        for (File file : folder.listFiles()) {
            if (file.isFile()) {
                fileList.add(file.getAbsolutePath());
            }
        }

        String[] files = new String[fileList.size()];
        String[] jsonFiles = fileList.toArray(files);
        String outputFilePath = mangaPath + File.separator + new File(mangaPath).getName() + " Searchable Text.txt";
        File outputFile = new File(outputFilePath);
        if (outputFile.exists()) {
            System.out.println("file already exists " + outputFile.getName());
            return;
        }

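        // Read and write as UTF-8 explicitly so the Japanese text survives
        // whatever the platform default charset happens to be.
        try (PrintWriter writer = new PrintWriter(outputFile, "UTF-8")) {
            for (String file : jsonFiles) {
                String imageFileName = new File(file).getName().replaceFirst("[.][^.]+$", "");
                String jsonContent = new String(Files.readAllBytes(Paths.get(file)),
                        java.nio.charset.StandardCharsets.UTF_8);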
                JSONObject parsed = new JSONObject(jsonContent);
                System.out.println("File " + imageFileName + ", content " + parsed);
                JSONArray blocks = parsed.getJSONArray("blocks");
                for (int i = 0; i < blocks.length(); i++) {
                    JSONArray lines = blocks.getJSONObject(i).getJSONArray("lines");
                    StringBuilder builder = new StringBuilder();
                    for (int j = 0; j < lines.length(); j++) {
                        builder.append(lines.getString(j));
                    }
                    writer.println(imageFileName + "\t" + builder.toString());
                }
            }
        } catch (IOException | JSONException e) {
            e.printStackTrace();
        }
    }


    public static void search(String baseFolder, String resultFile) throws IOException {
        System.out.println("search started");
        String output = "";

        output += "<link rel=\"stylesheet\" href=\"styles.css\">";
        output += "<script src=\"script.js\"></script>";

        output += "<div id=\"parent\">";
        output += "<div id=\"matches\">";

        String currentSeries = "";

        List<Path> filesToCheck = new ArrayList<>();
        Files.walk(Paths.get(baseFolder))
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".txt"))
                .forEach(filesToCheck::add);
        Collections.sort(filesToCheck);

        System.out.println("Will check " + filesToCheck.size() + " files");

        for (Path fileToCheck : filesToCheck) {
            List<String> matches = new ArrayList<>();
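            // Plain substring match, so regex metacharacters in SEARCH are taken literally.
            // try-with-resources closes the file once all lines have been read.
            try (java.util.stream.Stream<String> lines = Files.lines(fileToCheck)) {
                lines.filter(line -> line.contains(SEARCH)).forEach(matches::add);
            }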
            if (matches.isEmpty()) {
                continue;
            }

            String relativePath = fileToCheck.toString()
                    .replace(baseFolder, "")
                    .replace(" Searchable Text.txt", "");
            String[] parts = relativePath.split("\\\\");
            String series = parts[0];
            String volume = parts[1];

            String mangaFolder = baseFolder + series + "\\" + volume;
            if (!Files.isDirectory(Paths.get(mangaFolder))) {
                System.out.println("Cannot find manga folder: " + mangaFolder);
                System.out.println("Implement checking other known locations.");
                continue;
            }

            if (!series.equals(currentSeries)) {
                output += "<h2>" + series + "</h2>";
                currentSeries = series;
            }
            output += "<h3>" + volume + "</h3>";

            output += "<ul>\n";
            for (String match : matches) {
                output += outputMatch(match, mangaFolder, SEARCH);
            }
            output += "</ul>\n";
        }
        output += "</div>";

        output += "<div id=\"page\">";
        output += "<a id=\"link\" target=\"_blank\"><img id=\"image\" /></a>";
        output += "</div>";

        output += "</div>";

        Files.write(Paths.get(resultFile), output.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        System.out.println("search done");
    }

    private static String outputMatch(String match, String mangaFolder, String search) {
        String[] parts = match.split("\t");
        String imageFilename = parts[0];
        String lineText = parts[1];
        // Try the common image extensions in turn (the screenshots here are .png).
        String imageFile = null;
        for (String ext : new String[] {".png", ".jpg", ".jpeg"}) {
            String candidate = mangaFolder + "\\" + imageFilename + ext;
            if (Files.isRegularFile(Paths.get(candidate))) {
                imageFile = candidate;
                break;
            }
        }
        if (imageFile == null) {
            System.out.println("Cannot find image file for: " + mangaFolder + "\\" + imageFilename);
            System.out.println("Maybe its extension is not .png, .jpg, or .jpeg?");
            return "";
        }
        // Literal replacement, so the search term is not interpreted as a regex.
        String lineTextWithHtml = lineText.replace(search, "<strong>" + search + "</strong>").trim();
        return "<li tabindex='0' onfocus='showImage(this, \"" + imageFile.replace("\\", "\\\\") + "\")'>" + imageFilename + ": " + lineTextWithHtml;
    }
}

There’s this handy thing @ChristopherFritz made GitHub - ChristopherFritz/BookWalker-Screenshot-Simulator that might help with Bookwalker. Though it’s better to just buy through a site that gives out DRM copies and remove those.

Thanks, will have a look at that!
Also, don’t really want to buy from Amazon if I can help it.

Just tried it, wow that worked perfectly, way better and easier than my method :smiley: thanks to you both!

Been playing around with Mokuro today. Does anybody else find the navigation a bit odd? I would’ve expected the arrow keys to flip the pages instead of panning the page. I guess I could make some changes in these sections:

document.addEventListener("keydown", function onEvent(e) {
    switch (e.key) {
        case "PageUp":
            prevPage();
            break;

        case "PageDown":
            nextPage();
            break;

        case "Home":
            firstPage();
            break;

        case "End":
            lastPage();
            break;

        case " ":
            nextPage();
            break;

        case "0":
            zoomDefault();
            break;
    }
});

and

function onKeyDown(e){var x=0,y=0,z=0;if(e.keyCode===38){y=1}else if(e.keyCode===40){y=-1}else if(e.keyCode===37){x=1}else if(e.keyCode===39){x=-1}else if(e.keyCode===189||e.keyCode===109){z=1}else if(e.keyCode===187||e.keyCode===107){z=-1}

That’d be an easy way to set page flipping to the arrow keys and panning to WASD (for example).

I find the zooming a bit clunky as well, and the canvas is huge! There is this part:

    pz = panzoom(pc, {
        bounds: true,
        boundsPadding: 0.05,
        maxZoom: 10,
        minZoom: 0.1,
        zoomDoubleClickSpeed: 1,
        enableTextSelection: true,

I’ve been playing around with these settings, but the way the bounds work is a bit confusing. The image doesn’t seem to be centered within the bounds, so whenever I try to set boundsPadding to a somewhat reasonable number (so that you can’t accidentally pan most of the image off of the screen), there is always some part of the image that gets cut off :thinking: I’ve tried looking in the panzoom documentation but I haven’t had much luck yet.

Yes, it’s definitely tripping me up a lot! But I feel like I’m getting used to it, and I like reading pretty zoomed in, so I guess it’ll work fine.

I’ve seen others comment on this as well.

I read zoomed in and use the mouse to zoom in/out, pan, and do look-ups, so I don’t use the keyboard for navigation myself.

Although I don’t recall when it happens, I often end up with the image 90% off-screen. Not in my normal reading, though, so I can’t recall what condition it occurs in.

Thanks for sharing the bookshelf code, it’s really neat! I love the ‘progress colour effect’ :heart_eyes:

I’ve been playing around with adding novels to your bookshelf :grin:

I buy them from Amazon, then load them up in Calibre and extract the text.

The green line is a bookmark. It helps me keep my place (as you’d expect lol) and it tracks the current ‘page’ for the progress bar.

The code is probably pretty gnarly cause there was a lot of trial and error involved, but I look forward to polishing it as I learn about coding more properly :grin:

I tried GitHub - ChristopherFritz/BookWalker-Screenshot-Simulator on a book I’ve purchased but it didn’t work (no error, just prints Downloading complete! but didn’t do anything).
Is it possible that it only works on the free previews?

It works for me! Did you make sure that “Page Moving Direction” was set to “Vertical” before running it?

Aah!! Indeed it wasn’t on Vertical, I hadn’t checked and just assumed it still was from using it that way last time. Thanks a lot :smiley:

How far did you get? I got it running with Docker. It doesn’t like it when I feed an entire novel into it though :stuck_out_tongue_closed_eyes:

It’s been a while since my last couple of attempts, so I don’t recall.

I’ve never used Docker, so maybe that’s a route for me to try out one day.

For what it’s worth, I used the instructions here.
