Mokuro: Read Japanese manga with selectable text inside a browser

Far as I know, websql is dead in the water, and indexeddb is the more modern standard. It’s pretty much just like localstorage on steroids.

2 Likes

Got it working :smiley: :smiley:
Thank you so much for sharing!

4 Likes

Do you have any plans to publish the vocabulary lists from your website somewhere else, like on Anki? If not, I will probably do it myself for jpdb.io.

Anyway, thanks for your work, your vocabulary lists are really useful and your projects using Mokuro are really inspiring.

I wonder if the Calibre extension API could be used to integrate Mokuro directly in the Calibre reader for cbz files.

1 Like

I previously tried using them to create a deck for Kitsun, but there wasn’t enough functionality in working with decks for that to be viable.

From my site’s frequency lists page, every linked series should have a section at the top with a link to an ODS format spreadsheet with the frequency list. Anyone can do anything with these.

I think someone did a fork of Mokuro that specifically handles CBZ files, so maybe. I haven’t looked into it since none of the manga I buy comes in that format.

1 Like

TBH, cbz files are literally just a zip file of a folder of images renamed. You only need to re-rename them and unzip it.
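
The "just a renamed zip" claim is easy to check: a non-empty zip archive starts with the bytes 'P', 'K', 3, 4. A minimal sketch, assuming Java 11 or newer; the class name ZipSignatureCheck is hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Mokuro or any comic reader.
final class ZipSignatureCheck {

    /** Returns true if the file starts with the zip local-file-header signature ('P', 'K', 3, 4). */
    static boolean looksLikeZip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(4); // may return fewer than 4 bytes for tiny files
            return head.length == 4
                    && head[0] == 'P' && head[1] == 'K'
                    && head[2] == 3 && head[3] == 4;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java ZipSignatureCheck.java <file.cbz>");
            return;
        }
        System.out.println(looksLikeZip(Paths.get(args[0]))
                ? "Starts with the zip signature"
                : "Does not start with the zip signature");
    }
}
```

If this prints that the signature is present for a .cbz file, any zip tool should be able to open it, with or without renaming.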

1 Like

From my site’s frequency lists page, every linked series should have a section at the top with a link to an ODS format spreadsheet with the frequency list. Anyone can do anything with these.

Yeah, that’s probably what I will use to generate the decks from your vocabulary lists. Again, nice work :slightly_smiling_face:

I think someone did a fork of Mokuro that specifically handles CBZ files, so maybe. I haven’t looked into it since none of the manga I buy comes in that format.

This should be the easy part, since Python handles zip files really well. The part I’m not sure about is whether an extension can add features to the calibre reader, but you probably don’t know either.

1 Like

Is there a tutorial for cbz files? So it would open in a web browser and not cdisplayex?

You mean with mokuro? Or just like a comic viewer in general?

mokuro,

because cdisplayex doesn’t have an option to display pages as html, which I need so I can use yomichan to look up the sentences in the dialogue

Due to my low familiarity with the format, my recommendation would be:

  1. Unzip the CBZ file. (May require changing the extension to “.zip”.)
  2. Run Mokuro on the unzipped folder that contains the images.
  3. Open the Mokuro-generated HTML in a web browser.
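
Step 1 can also be scripted instead of renaming and unzipping by hand. A minimal sketch using only the Java standard library, assuming the CBZ is a plain zip; the class name CbzUnpacker and the example paths are hypothetical. Steps 2 and 3 stay the same.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of Mokuro.
final class CbzUnpacker {

    /** Extracts every file of the CBZ (a plain zip) into targetDir and returns the file count. */
    static int unpack(Path cbzFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int extracted = 0;
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(cbzFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x.png" that would land outside the target folder.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the current entry only; the zip stream itself stays open.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    extracted++;
                }
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java CbzUnpacker.java <volume.cbz> <output-folder>");
            return;
        }
        int count = unpack(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + count + " files; now run mokuro on the output folder.");
    }
}
```

If the archive keeps its images in a subfolder, point Mokuro at that subfolder rather than at the output folder itself.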
1 Like

This exactly. I’m not kidding when I say that cbz files are literally just renamed zips. It’s a pretty common technique in software development. Take a zip file, since most things can read and write zips easily, and then just rename it. Even I used it once.

Similarly, cbz is actually a family of file formats:

  • cbz → zip
  • cbr → rar
  • cbt → tar
  • cb7 → 7z
  • cba → ace
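
The same mapping as a small lookup, in case a script needs to decide which archive tool to use. This is only a sketch assuming Java 9 or newer; the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup for illustration; the names are not from any library.
final class ComicArchiveTypes {

    // Comic book extension -> extension of the underlying archive format.
    private static final Map<String, String> ARCHIVE_BY_COMIC_EXT = Map.of(
            "cbz", "zip",
            "cbr", "rar",
            "cbt", "tar",
            "cb7", "7z",
            "cba", "ace");

    /** Returns the underlying archive extension, or empty if the file is not a comic archive. */
    static Optional<String> archiveExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(ARCHIVE_BY_COMIC_EXT.get(ext));
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + archiveExtension(name).orElse("not a comic archive"));
        }
    }
}
```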
4 Likes

I see. But I’m still not sure what steps I have to take. I went to the GitHub page and got lost. Do I have to use

this GitHub - dmMaze/comic-text-detector: Manga&Comic text detection

and this GitHub - kha-white/manga-ocr: Optical character recognition for Japanese text, with the main focus being Japanese manga

with something else? Or is there one single script I have to run?

1 Like

You just need Mokuro, which in turn calls manga ocr for you.
After installing mokuro itself, you only need to do

mokuro name/of/folder

And there should be an html file in your current directory

1 Like

Question 1: Do you have Python installed? (I believe version 3.6 is recommended for Mokuro.)

If not, you can install it from here: Download Python | Python.org

If yes, then question 2: Do you have Pip installed? (If you have Python installed, you probably have Pip installed.)

Question 3: Do you have Mokuro installed? (If not, you can install it by putting the command pip3 install mokuro into a command terminal. If you’re unfamiliar with a command terminal, let us know your operating system, and we can help you get to it.)

Question 4: If you run the command mokuro in a terminal, what output does it give? (It may take several seconds to finish running and give output.)

Thanks again for sharing, it basically just worked out of the box!

I’m more comfortable using Java, so that’s what I did. And I have a Windows PC not Linux.
Also, I’ve only bought mangas on Bookwalker so far so it’s not the same method to get the jpeg (or png in my case).
I will describe my whole setup here and post my code in case it can help someone else.

Steps to make the Manga Text Search work - Windows / Java / Bookwalker

The exact steps I did:

  • Installed Python 3.10 (not newer) and Mokuro
  • Prepare my file structure:
C:
└── Users
....└── Mokuro
.......├── Ruri
.......|.....└── 01
.......└── results
.............├── script.js
.............└── style.css

The files script.js and style.css are those from @ChristopherFritz posted above.

  • Getting the images files for the Manga to convert

I just opened Bookwalker on my phone, took a screenshot, turned the page, took a screenshot, turned the page, etc… A bit tedious, but I had everything after 5 minutes. Then I move all the images to the folder “01” in the tree structure above (as I did it with volume 01 of Ruri)

  • Rename the files

The files are called Screenshot_2023_18_03_11_30_02.png and so on, so I run the first part of the code, that renames everything to 001.png, 002.png, etc.

So I update the variables in the main in the code below, and run the first part.

Mokuro.renameFiles(baseFolder + title + "\\" + volume);

  • Mokuro

Now I let Mokuro do its job, I open a cmd terminal and just run

mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

It takes around 40min.

  • Extract searchable text

The same function that Christopher wrote in Ruby but I rewrote it in Java. I run in my main function:

Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
                baseFolder + title + "\\_ocr\\" + volume);

  • Search for any string

Now I can search for any string, I update the string and just run:

Mokuro.search(baseFolder, resultFile);

And then I can just open the result file, in C:\Users\Akashelia\Japanese\Mokuto\results\results.html

and voilà ! :slight_smile:

  • The whole code:

Only one external dependency:

        <!-- https://mvnrepository.com/artifact/org.json/json -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20230227</version>
        </dependency>

package com.gnd;

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Mokuro {

    // This needs to be manually changed to whatever I want to search for.
    private static final String SEARCH = "間";

    public static void main(String[] args) throws IOException {

        String title = "DBZ";
        String volume = "01";

        String baseFolder = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\";
        String resultFile = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\results\\results.html";

        // 1) Renames the files, starting at 001
        // Mokuro.renameFiles(baseFolder + title + "\\" + volume);

        // 2) update the path + title + volume and copy in cmd
        // mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

        // 3) Extract the texts from the json files
        //Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
        //        baseFolder + title + "\\_ocr\\" + volume);

        // Search for any string
        Mokuro.search(baseFolder, resultFile);

    }

    public static void renameFiles(String folderPath) {
        File folder = new File(folderPath);
        if (!folder.isDirectory()) {
            System.out.println(folderPath + " is not a directory.");
            return;
        }
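        File[] files = folder.listFiles();
        if (files == null) {
            System.out.println("Could not list files in " + folderPath);
            return;
        }
        // listFiles() returns entries in no guaranteed order; sort by name so the
        // timestamped screenshots keep their page order.
        java.util.Arrays.sort(files);
        int count = 1;
        for (File file : files) {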
            if (file.isFile()) {
                String extension = getFileExtension(file.getName());
                String newFileName = String.format("%03d.%s", count++, extension);
                File newFile = new File(folderPath, newFileName);
                if (!file.renameTo(newFile)) {
                    System.out.println("Failed to rename " + file.getName());
                }
            }
        }
    }

    private static String getFileExtension(String fileName) {
        int lastIndex = fileName.lastIndexOf(".");
        if (lastIndex == -1) {
            return "";
        }
        return fileName.substring(lastIndex + 1);
    }

    public static void extractTextWithImageNames(String mangaPath, String path) {
        System.out.println("Extract start. mangaPath=[" + mangaPath + "], json files path=[" + path + "]");
        File folder = new File(path);
        ArrayList<String> fileList = new ArrayList<String>();

        for (File file : folder.listFiles()) {
            if (file.isFile()) {
                fileList.add(file.getAbsolutePath());
            }
        }

        // Sort so pages are written in order; listFiles() gives no ordering guarantee.
        Collections.sort(fileList);
        String[] jsonFiles = fileList.toArray(new String[0]);
        String outputFilePath = mangaPath + File.separator + new File(mangaPath).getName() + " Searchable Text.txt";
        File outputFile = new File(outputFilePath);
        if (outputFile.exists()) {
            System.out.println("file already exists " + outputFile.getName());
            return;
        }

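        // Read and write as UTF-8 explicitly: before Java 18 the platform default
        // charset on Windows is usually not UTF-8, which garbles Japanese text.
        try (PrintWriter writer = new PrintWriter(outputFile, "UTF-8")) {
            for (String file : jsonFiles) {
                String imageFileName = new File(file).getName().replaceFirst("[.][^.]+$", "");
                String jsonContent = new String(Files.readAllBytes(Paths.get(file)),
                        java.nio.charset.StandardCharsets.UTF_8);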
                JSONObject parsed = new JSONObject(jsonContent);
                System.out.println("File " + imageFileName + ", content " + parsed);
                JSONArray blocks = parsed.getJSONArray("blocks");
                for (int i = 0; i < blocks.length(); i++) {
                    JSONArray lines = blocks.getJSONObject(i).getJSONArray("lines");
                    StringBuilder builder = new StringBuilder();
                    for (int j = 0; j < lines.length(); j++) {
                        builder.append(lines.getString(j));
                    }
                    writer.println(imageFileName + "\t" + builder.toString());
                }
            }
        } catch (IOException | JSONException e) {
            e.printStackTrace();
        }
    }


    public static void search(String baseFolder, String resultFile) throws IOException {
        System.out.println("search started");
        String output = "";

        output += "<link rel=\"stylesheet\" href=\"styles.css\">";
        output += "<script src=\"script.js\"></script>";

        output += "<div id=\"parent\">";
        output += "<div id=\"matches\">";

        String currentSeries = "";

        List<Path> filesToCheck = new ArrayList<>();
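        // Files.walk keeps directory handles open, so close the stream when done.
        try (java.util.stream.Stream<Path> paths = Files.walk(Paths.get(baseFolder))) {
            paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".txt"))
                    .forEach(filesToCheck::add);
        }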
        Collections.sort(filesToCheck);

        System.out.println("Will check " + filesToCheck.size() + " files");

        for (Path fileToCheck : filesToCheck) {
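            List<String> matches = new ArrayList<>();
            // Files.lines keeps the file open, so close the stream after reading.
            try (java.util.stream.Stream<String> lines = Files.lines(fileToCheck)) {
                lines.filter(line -> line.matches(".*" + SEARCH + ".*"))
                        .forEach(matches::add);
            }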
            if (matches.isEmpty()) {
                continue;
            }

            String relativePath = fileToCheck.toString()
                    .replace(baseFolder, "")
                    .replace(" Searchable Text.txt", "");
            String[] parts = relativePath.split("\\\\");
            String series = parts[0];
            String volume = parts[1];

            String mangaFolder = baseFolder + series + "\\" + volume;
            if (!Files.isDirectory(Paths.get(mangaFolder))) {
                System.out.println("Cannot find manga folder: " + mangaFolder);
                System.out.println("Implement checking other known locations.");
                continue;
            }

            if (!series.equals(currentSeries)) {
                output += "<h2>" + series + "</h2>";
                currentSeries = series;
            }
            output += "<h3>" + volume + "</h3>";

            output += "<ul>\n";
            for (String match : matches) {
                output += outputMatch(match, mangaFolder, SEARCH);
            }
            output += "</ul>\n";
        }
        output += "</div>";

        output += "<div id=\"page\">";
        output += "<a id=\"link\" target=\"_blank\"><img id=\"image\" /></a>";
        output += "</div>";

        output += "</div>";

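        // Write the HTML as UTF-8 rather than the platform default charset.
        Files.write(Paths.get(resultFile), output.getBytes(java.nio.charset.StandardCharsets.UTF_8));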
        System.out.println("search done");
    }

    private static String outputMatch(String match, String mangaFolder, String search) {
        String[] parts = match.split("\t");
        String imageFilename = parts[0];
        String lineText = parts[1];
        String imageFile = mangaFolder + "\\" + imageFilename + ".png";
        if (!Files.isRegularFile(Paths.get(imageFile))) {
            // Fall back to .jpg when there is no .png with this name.
            imageFile = mangaFolder + "\\" + imageFilename + ".jpg";
            if (!Files.isRegularFile(Paths.get(imageFile))) {
                System.out.println("Cannot find image file: " + imageFile);
                System.out.println("Maybe its extension is not .png or .jpg?");
                return "";
            }
        }
        String lineTextWithHtml = lineText.replaceAll("(" + search + ")", "<strong>$1</strong>").trim();
        return "<li tabindex='0' onfocus='showImage(this, \"" + imageFile.replace("\\", "\\\\") + "\")'>" + imageFilename + ": " + lineTextWithHtml;
    }
}
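
One caveat with the search code above: SEARCH is passed to line.matches and replaceAll as a regular expression, so a term containing characters such as ( or . may match unexpectedly or throw an exception. A minimal sketch of a literal alternative, assuming the hypothetical class and method names below; it also escapes HTML so that OCR text containing < or & cannot break the results page.

```java
// Hypothetical helper for illustration; the names are not from the original post.
final class LiteralHighlighter {

    /** Escapes the characters that are special in HTML text content. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** True when the line contains the search term as plain text (no regex involved). */
    static boolean matches(String line, String search) {
        return !search.isEmpty() && line.contains(search);
    }

    /** Returns the line as HTML with every literal occurrence of search wrapped in strong tags. */
    static String highlight(String lineText, String search) {
        if (search.isEmpty()) {
            return escapeHtml(lineText);
        }
        StringBuilder html = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = lineText.indexOf(search, from)) >= 0) {
            // Text before the hit is escaped, the hit itself is escaped and wrapped.
            html.append(escapeHtml(lineText.substring(from, hit)));
            html.append("<strong>").append(escapeHtml(search)).append("</strong>");
            from = hit + search.length();
        }
        html.append(escapeHtml(lineText.substring(from)));
        return html.toString();
    }

    public static void main(String[] args) {
        // A term that would be risky as a regex is handled as plain text here.
        System.out.println(highlight("Is this (really) it?", "(really)"));
    }
}
```

In the class above, matches could stand in for the line.matches call in search, and highlight for the replaceAll call in outputMatch.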
4 Likes

There’s this handy thing @ChristopherFritz made GitHub - ChristopherFritz/BookWalker-Screenshot-Simulator that might help with Bookwalker. Though it’s better to just buy through a site that gives out DRM copies and remove those.

3 Likes

Thanks, will have a look at that!
Also, don’t really want to buy from Amazon if I can help it.

2 Likes

Just tried it, wow that worked perfectly, way better and easier than my method :smiley: thanks to you both!

2 Likes

Been playing around with Mokuro today. Does anybody else find the navigation a bit odd? I would’ve expected the arrow keys to flip the pages instead of panning the page. I guess I could make some changes in these sections:

document.addEventListener("keydown", function onEvent(e) {
    switch (e.key) {
        case "PageUp":
            prevPage();
            break;

        case "PageDown":
            nextPage();
            break;

        case "Home":
            firstPage();
            break;

        case "End":
            lastPage();
            break;

        case " ":
            nextPage();
            break;

        case "0":
            zoomDefault();
            break;
    }
});

and

function onKeyDown(e) {
    var x = 0, y = 0, z = 0;
    if (e.keyCode === 38) { y = 1 }
    else if (e.keyCode === 40) { y = -1 }
    else if (e.keyCode === 37) { x = 1 }
    else if (e.keyCode === 39) { x = -1 }
    else if (e.keyCode === 189 || e.keyCode === 109) { z = 1 }
    else if (e.keyCode === 187 || e.keyCode === 107) { z = -1 }

That’d be an easy way to set page flipping to the arrow keys and panning to WASD (for example).

I find the zooming a bit clunky as well, and the canvas is huge! There is this part:

    pz = panzoom(pc, {
        bounds: true,
        boundsPadding: 0.05,
        maxZoom: 10,
        minZoom: 0.1,
        zoomDoubleClickSpeed: 1,
        enableTextSelection: true,

I’ve been playing around with these settings, but the way the bounds work is a bit confusing. The image doesn’t seem to be centered within the bounds, so whenever I try to set boundsPadding to a somewhat reasonable number (so that you can’t accidentally pan most of the image off of the screen), there is always some part of the image that gets cut off :thinking: I’ve tried looking in the panzoom documentation but I haven’t had much luck yet.

1 Like

Yes, it’s definitely tripping me up a lot! But I feel like I’m getting used to it, and I like reading pretty zoomed in, so I guess it’ll work fine

1 Like