Mokuro: Read Japanese manga with selectable text inside a browser

Akashelia · March 18, 2023, 10:40am

Thanks again for sharing, it basically just worked out of the box!

I’m more comfortable with Java, so that’s what I used, and I’m on a Windows PC rather than Linux.
Also, I’ve only bought manga on Bookwalker so far, so the method for getting the JPEG files (PNG in my case) is different.
I’ll describe my whole setup here and post my code in case it helps someone else.

Steps to make the Manga Text Search work - Windows / Java / Bookwalker

The exact steps I followed:

Installed Python 3.10 (not newer) and Mokuro
Prepared my file structure:

C:
└── Users
....└── Mokuro
.......├── Ruri
.......|.....└── 01
.......└── results
.............├── script.js
.............└── style.css

The files script.js and style.css are the ones @ChristopherFritz posted above.

Getting the image files for the manga to convert

I just opened Bookwalker on my phone, took a screenshot, turned the page, took a screenshot, turned the page, etc. A bit tedious, but I had everything after 5 minutes. Then I moved all the images to the folder “01” in the tree structure above (since this was volume 01 of Ruri).

Rename the files

The files are called Screenshot_2023_18_03_11_30_02.png and so on, so I update the variables in main in the code below and run the first part, which renames everything to 001.png, 002.png, etc.

Mokuro.renameFiles(baseFolder + title + "\\" + volume);
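
To make the renaming step easier to check before touching any files, here is a minimal sketch that only computes the old-to-new name mapping. RenamePlanSketch and plan are hypothetical names, not part of the class posted below, and the sketch assumes the screenshot names differ only in their time part, so sorting by name gives page order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration: it computes the new names only
// and never renames anything on disk.
class RenamePlanSketch {

    // Returns entries of the form "oldName -> newName", numbering from 001
    // in name order and keeping each file's extension.
    static List<String> plan(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        // Assumption: names differ only in the time part, so name order is page order.
        Collections.sort(sorted);
        List<String> result = new ArrayList<>();
        int count = 1;
        for (String name : sorted) {
            int dot = name.lastIndexOf('.');
            String extension = dot == -1 ? "" : name.substring(dot + 1);
            result.add(name + " -> " + String.format("%03d.%s", count++, extension));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Screenshot_2023_18_03_11_30_05.png",
                "Screenshot_2023_18_03_11_30_02.png");
        for (String entry : plan(names)) {
            System.out.println(entry);
        }
    }
}
```

Printing the plan first is a cheap way to confirm the page order before running the real rename.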

Mokuro

Now I let Mokuro do its job: I open a cmd terminal and run

mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

It takes around 40 minutes.

Extract searchable text

This is the same function Christopher wrote in Ruby, rewritten in Java. In my main function I run:

Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
                baseFolder + title + "\\_ocr\\" + volume);
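
As a rough illustration of what this step produces, the sketch below mirrors the naming logic of the full code further down: one output line per text block, made of the image name, a tab, and the block's lines joined together, written to a file named after the volume folder. ExtractLayoutSketch, its method names, and the C:\Mokuro\ base folder are hypothetical and exist only for this example.

```java
import java.util.List;

// Hypothetical helper for illustration: pure string logic, no file access
// and no JSON parsing.
class ExtractLayoutSketch {

    // "001.json" -> "001": the JSON file name without its extension.
    static String imageName(String jsonFileName) {
        return jsonFileName.replaceFirst("[.][^.]+$", "");
    }

    // One line of the searchable text file: image name, tab, joined block text.
    static String outputLine(String jsonFileName, List<String> blockLines) {
        return imageName(jsonFileName) + "\t" + String.join("", blockLines);
    }

    // Location of the text file for one volume, using Windows separators.
    static String textFilePath(String baseFolder, String title, String volume) {
        return baseFolder + title + "\\" + volume + "\\" + volume + " Searchable Text.txt";
    }

    public static void main(String[] args) {
        // ASCII stand-ins for the OCR text of one block.
        System.out.println(outputLine("001.json", List.of("ABC", "DEF")));
        System.out.println(textFilePath("C:\\Mokuro\\", "Ruri", "01"));
    }
}
```

The search step later relies on this layout: it strips the base folder and the " Searchable Text.txt" suffix from the path to recover the series and volume names.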

Search for any string

Now I can search for any string: I update the SEARCH constant and run:

Mokuro.search(baseFolder, resultFile);

Then I open the result file, C:\Users\Akashelia\Japanese\Mokuto\results\results.html

and voilà!
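
For anyone who wants to try the matching and highlighting logic on its own, here is a small sketch that works on lines in the "image name, tab, text" format. SearchSketch and its methods are hypothetical names. Unlike the full code below, it matches only the text part of each line and treats the search string literally, which is a design choice of this sketch rather than what the posted code does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: searches in-memory lines instead of files.
class SearchSketch {

    // Keeps the lines whose text part (after the tab) contains the search string.
    static List<String> findMatches(List<String> lines, String search) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && parts[1].contains(search)) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Wraps every literal occurrence of the search string in <strong> tags.
    static String highlight(String text, String search) {
        return text.replace(search, "<strong>" + search + "</strong>");
    }

    public static void main(String[] args) {
        // ASCII stand-ins for the OCR text.
        List<String> lines = List.of("001\tfoo bar", "002\tbaz", "003\tqux");
        for (String match : findMatches(lines, "ba")) {
            String[] parts = match.split("\t", 2);
            System.out.println(parts[0] + ": " + highlight(parts[1], "ba"));
        }
    }
}
```

Matching only the text part means a search such as "001" does not hit every line from page 001.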

The whole code:

Only one external dependency:

        <!-- https://mvnrepository.com/artifact/org.json/json -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20230227</version>
        </dependency>

package com.gnd;

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Mokuro {

    // This needs to be manually changed to whatever I want to search for.
    private static final String SEARCH = "間";

    public static void main(String[] args) throws IOException {

        String title = "DBZ";
        String volume = "01";

        String baseFolder = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\";
        String resultFile = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\results\\results.html";

        // 1) Renames the files, starting at 001
        // Mokuro.renameFiles(baseFolder + title + "\\" + volume);

        // 2) update the path + title + volume and copy in cmd
        // mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"

        // 3) Extract the texts from the json files
        //Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
        //        baseFolder + title + "\\_ocr\\" + volume);

        // Search for any string
        Mokuro.search(baseFolder, resultFile);

    }

    public static void renameFiles(String folderPath) {
        File folder = new File(folderPath);
        if (!folder.isDirectory()) {
            System.out.println(folderPath + " is not a directory.");
            return;
        }
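        File[] files = folder.listFiles();
        // listFiles() returns entries in no guaranteed order, so sort by name
        // to keep the screenshots in page order.
        java.util.Arrays.sort(files);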
        int count = 1;
        for (File file : files) {
            if (file.isFile()) {
                String extension = getFileExtension(file.getName());
                String newFileName = String.format("%03d.%s", count++, extension);
                File newFile = new File(folderPath, newFileName);
                if (!file.renameTo(newFile)) {
                    System.out.println("Failed to rename " + file.getName());
                }
            }
        }
    }

    private static String getFileExtension(String fileName) {
        int lastIndex = fileName.lastIndexOf(".");
        if (lastIndex == -1) {
            return "";
        }
        return fileName.substring(lastIndex + 1);
    }

    public static void extractTextWithImageNames(String mangaPath, String path) {
        System.out.println("Extract start. mangaPath=[" + mangaPath + "], json files path=[" + path + "]");
        File folder = new File(path);
        ArrayList<String> fileList = new ArrayList<String>();

        for (File file : folder.listFiles()) {
            if (file.isFile()) {
                fileList.add(file.getAbsolutePath());
            }
        }

        String[] files = new String[fileList.size()];
        String[] jsonFiles = fileList.toArray(files);
        String outputFilePath = mangaPath + File.separator + new File(mangaPath).getName() + " Searchable Text.txt";
        File outputFile = new File(outputFilePath);
        if (outputFile.exists()) {
            System.out.println("file already exists " + outputFile.getName());
            return;
        }

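        // Read and write explicitly as UTF-8 so the Japanese text does not
        // depend on the Windows default charset.
        try (PrintWriter writer = new PrintWriter(outputFile, "UTF-8")) {
            for (String file : jsonFiles) {
                String imageFileName = new File(file).getName().replaceFirst("[.][^.]+$", "");
                String jsonContent = new String(Files.readAllBytes(Paths.get(file)), java.nio.charset.StandardCharsets.UTF_8);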
                JSONObject parsed = new JSONObject(jsonContent);
                System.out.println("File " + imageFileName + ", content " + parsed);
                JSONArray blocks = parsed.getJSONArray("blocks");
                for (int i = 0; i < blocks.length(); i++) {
                    JSONArray lines = blocks.getJSONObject(i).getJSONArray("lines");
                    StringBuilder builder = new StringBuilder();
                    for (int j = 0; j < lines.length(); j++) {
                        builder.append(lines.getString(j));
                    }
                    writer.println(imageFileName + "\t" + builder.toString());
                }
            }
        } catch (IOException | JSONException e) {
            e.printStackTrace();
        }
    }


    public static void search(String baseFolder, String resultFile) throws IOException {
        System.out.println("search started");
        String output = "";

        output += "<link rel=\"stylesheet\" href=\"style.css\">";
        output += "<script src=\"script.js\"></script>";

        output += "<div id=\"parent\">";
        output += "<div id=\"matches\">";

        String currentSeries = "";

        List<Path> filesToCheck = new ArrayList<>();
        Files.walk(Paths.get(baseFolder))
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".txt"))
                .forEach(filesToCheck::add);
        Collections.sort(filesToCheck);

        System.out.println("Will check " + filesToCheck.size() + " files");

        for (Path fileToCheck : filesToCheck) {
            List<String> matches = new ArrayList<>();
            Files.lines(fileToCheck).forEach(line -> {
                // Plain substring match, so regex characters in SEARCH are taken literally.
                if (line.contains(SEARCH)) {
                    matches.add(line);
                }
            });
            if (matches.isEmpty()) {
                continue;
            }

            String relativePath = fileToCheck.toString()
                    .replace(baseFolder, "")
                    .replace(" Searchable Text.txt", "");
            String[] parts = relativePath.split("\\\\");
            String series = parts[0];
            String volume = parts[1];

            String mangaFolder = baseFolder + series + "\\" + volume;
            if (!Files.isDirectory(Paths.get(mangaFolder))) {
                System.out.println("Cannot find manga folder: " + mangaFolder);
                System.out.println("Implement checking other known locations.");
                continue;
            }

            if (!series.equals(currentSeries)) {
                output += "<h2>" + series + "</h2>";
                currentSeries = series;
            }
            output += "<h3>" + volume + "</h3>";

            output += "<ul>\n";
            for (String match : matches) {
                output += outputMatch(match, mangaFolder, SEARCH);
            }
            output += "</ul>\n";
        }
        output += "</div>";

        output += "<div id=\"page\">";
        output += "<a id=\"link\" target=\"_blank\"><img id=\"image\" /></a>";
        output += "</div>";

        output += "</div>";

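        // Write as UTF-8 so the Japanese text does not depend on the default charset.
        Files.write(Paths.get(resultFile), output.getBytes(java.nio.charset.StandardCharsets.UTF_8));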
        System.out.println("search done");
    }

    private static String outputMatch(String match, String mangaFolder, String search) {
        String[] parts = match.split("\t");
        String imageFilename = parts[0];
        String lineText = parts[1];
        // My screenshots are .png; fall back to .jpg if there is no .png.
        String imageFile = mangaFolder + "\\" + imageFilename + ".png";
        if (!Files.isRegularFile(Paths.get(imageFile))) {
            imageFile = mangaFolder + "\\" + imageFilename + ".jpg";
            if (!Files.isRegularFile(Paths.get(imageFile))) {
                System.out.println("Cannot find image file: " + imageFile);
                System.out.println("Maybe its extension is not .png or .jpg?");
                return "";
            }
        }
        // Literal replacement: regex characters in the search string are not interpreted.
        String lineTextWithHtml = lineText.replace(search, "<strong>" + search + "</strong>").trim();
        return "<li tabindex='0' onfocus='showImage(this, \"" + imageFile.replace("\\", "\\\\") + "\")'>" + imageFilename + ": " + lineTextWithHtml;
    }
}
