Thanks again for sharing, it basically just worked out of the box!
I’m more comfortable with Java, so that’s what I used, and I’m on a Windows PC rather than Linux.
Also, I’ve only bought manga on Bookwalker so far, so my method for getting the JPEGs (or PNGs in my case) is different.
I’ll describe my whole setup here and post my code in case it helps someone else.
Steps to make the Manga Text Search work - Windows / Java / Bookwalker
The exact steps I followed:
- Install Python 3.10 (not newer) and Mokuro
- Prepare the file structure:
C:
└── Users
....└── Mokuro
.......├── Ruri
.......|.....└── 01
.......└── results
.............├── script.js
.............└── style.css
The files script.js and style.css are the ones @ChristopherFritz posted above.
- Get the image files for the manga to convert
I just opened Bookwalker on my phone, took a screenshot, turned the page, took another screenshot, and so on. A bit tedious, but I had everything after 5 minutes. Then I moved all the images into the folder “01” in the tree structure above (since this was volume 01 of Ruri).
- Rename the files
The files are named Screenshot_2023_18_03_11_30_02.png and so on, so I update the variables in main in the code below and run the first part, which renames everything to 001.png, 002.png, etc.:
Mokuro.renameFiles(baseFolder + title + "\\" + volume);
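One thing to keep in mind here: the numbering only matches the page order if the screenshots are processed in name order. Below is a minimal sketch of that idea. `RenamePlan` and `plan` are names made up for this illustration (they are not part of my code below), it only computes the new names without touching the disk, and it assumes the screenshot names sort chronologically, which holds when they were all taken in one session.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RenamePlan {

    // Hypothetical helper: maps each original file name to its numbered name
    // (001.png, 002.png, ...) in sorted order. Nothing is renamed on disk.
    public static Map<String, String> plan(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);
        Map<String, String> result = new LinkedHashMap<>();
        int count = 1;
        for (String name : sorted) {
            int dot = name.lastIndexOf('.');
            // Keep the original extension, including the dot, if there is one.
            String extension = dot == -1 ? "" : name.substring(dot);
            result.put(name, String.format("%03d%s", count++, extension));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Screenshot_2023_18_03_11_30_09.png",
                "Screenshot_2023_18_03_11_30_02.png");
        plan(names).forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

Printing the plan first is a cheap way to check the page order before actually renaming anything.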
- Mokuro
Now I let Mokuro do its job: I open a cmd terminal and just run
mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"
It takes around 40 minutes.
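If you would rather not switch to a terminal, the same command could probably be launched from Java as well. This is only a rough sketch: `RunMokuro` and `command` are names made up for the example, and it assumes the `mokuro` executable is on the PATH. It passes the folder as the single argument, exactly like the cmd line above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RunMokuro {

    // Builds the same command as the one typed in cmd: mokuro "<volume folder>"
    public static List<String> command(String volumeFolder) {
        return Arrays.asList("mokuro", volumeFolder);
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.out.println("Usage: java RunMokuro <volume folder>");
            return;
        }
        try {
            Process process = new ProcessBuilder(command(args[0]))
                    .inheritIO() // show Mokuro's output in this console and forward keyboard input
                    .start();
            System.out.println("mokuro exited with code " + process.waitFor());
        } catch (IOException e) {
            // Typically means the executable was not found on the PATH.
            System.out.println("Could not start mokuro: " + e.getMessage());
        }
    }
}
```

Passing the folder as its own list element means no manual quoting is needed, even if the path contains spaces.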
- Extract searchable text
This is the same function that Christopher wrote in Ruby, rewritten in Java. In my main function I run:
Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
baseFolder + title + "\\_ocr\\" + volume);
- Search for any string
Now I can search for any string: I update the SEARCH constant and just run:
Mokuro.search(baseFolder, resultFile);
Then I can just open the result file, C:\Users\Akashelia\Japanese\Mokuto\results\results.html, and voilà!
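A side note on the highlighting: the search string is best treated as plain text rather than as a regular expression, and the manga text should be HTML-escaped before it goes into the result page. Here is a minimal sketch of one way to do both. `Highlight`, `highlight` and `escapeHtml` are names invented for this example and are not part of the code below.

```java
public class Highlight {

    // Escapes the characters that have a special meaning in HTML text.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical helper: wraps every literal occurrence of search in <strong>,
    // escaping everything else so the OCR text cannot break the page.
    public static String highlight(String line, String search) {
        StringBuilder html = new StringBuilder();
        int from = 0;
        int at;
        while (!search.isEmpty() && (at = line.indexOf(search, from)) != -1) {
            html.append(escapeHtml(line.substring(from, at)));
            html.append("<strong>").append(escapeHtml(search)).append("</strong>");
            from = at + search.length();
        }
        // Append whatever is left after the last match (or the whole line if none).
        html.append(escapeHtml(line.substring(from)));
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("時間がない", "間"));
        System.out.println(highlight("え?(本当)", "?("));
    }
}
```

Because it relies on `indexOf` instead of a regex, characters such as `?`, `(` or `.` in the search string are matched literally.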
- The whole code:
Only one external dependency:
<!-- https://mvnrepository.com/artifact/org.json/json -->
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20230227</version>
</dependency>
package com.gnd;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Mokuro {
// This needs to be manually changed to whatever I want to search for.
private static final String SEARCH = "間";
public static void main(String[] args) throws IOException {
String title = "DBZ";
String volume = "01";
String baseFolder = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\";
String resultFile = "C:\\Users\\Akashelia\\Japanese\\Mokuto\\results\\results.html";
// 1) Renames the files, starting at 001
// Mokuro.renameFiles(baseFolder + title + "\\" + volume);
// 2) update the path + title + volume and copy in cmd
// mokuro "C:\Users\Akashelia\Japanese\Mokuto\Ruri\01"
// 3) Extract the texts from the json files
//Mokuro.extractTextWithImageNames(baseFolder + title + "\\" + volume,
// baseFolder + title + "\\_ocr\\" + volume);
// 4) Search every extracted text file for the SEARCH string
Mokuro.search(baseFolder, resultFile);
}
public static void renameFiles(String folderPath) {
File folder = new File(folderPath);
if (!folder.isDirectory()) {
System.out.println(folderPath + " is not a directory.");
return;
}
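File[] files = folder.listFiles();
if (files == null) {
System.out.println("Could not list the files in " + folderPath);
return;
}
// listFiles() returns entries in no guaranteed order, so sort by name to keep the page order.
java.util.Arrays.sort(files);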
int count = 1;
for (File file : files) {
if (file.isFile()) {
String extension = getFileExtension(file.getName());
String newFileName = String.format("%03d.%s", count++, extension);
File newFile = new File(folderPath, newFileName);
if (!file.renameTo(newFile)) {
System.out.println("Failed to rename " + file.getName());
}
}
}
}
private static String getFileExtension(String fileName) {
int lastIndex = fileName.lastIndexOf(".");
if (lastIndex == -1) {
return "";
}
return fileName.substring(lastIndex + 1);
}
public static void extractTextWithImageNames(String mangaPath, String path) {
System.out.println("Extract start. mangaPath=[" + mangaPath + "], json files path=[" + path + "]");
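File folder = new File(path);
File[] found = folder.listFiles();
if (found == null) {
System.out.println("Cannot read json folder: " + path);
return;
}
List<String> jsonFiles = new ArrayList<>();
for (File file : found) {
if (file.isFile() && file.getName().endsWith(".json")) {
jsonFiles.add(file.getAbsolutePath());
}
}
// Sort so the pages are written in reading order (001, 002, ...).
Collections.sort(jsonFiles);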
String outputFilePath = mangaPath + File.separator + new File(mangaPath).getName() + " Searchable Text.txt";
File outputFile = new File(outputFilePath);
if (outputFile.exists()) {
System.out.println("file already exists " + outputFile.getName());
return;
}
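// Read and write as UTF-8 explicitly: before Java 18 the platform default charset is used otherwise.
try (PrintWriter writer = new PrintWriter(outputFile, "UTF-8")) {
for (String file : jsonFiles) {
String imageFileName = new File(file).getName().replaceFirst("[.][^.]+$", "");
String jsonContent = new String(Files.readAllBytes(Paths.get(file)), java.nio.charset.StandardCharsets.UTF_8);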
JSONObject parsed = new JSONObject(jsonContent);
System.out.println("File " + imageFileName + ", content " + parsed);
JSONArray blocks = parsed.getJSONArray("blocks");
for (int i = 0; i < blocks.length(); i++) {
JSONArray lines = blocks.getJSONObject(i).getJSONArray("lines");
StringBuilder builder = new StringBuilder();
for (int j = 0; j < lines.length(); j++) {
builder.append(lines.getString(j));
}
writer.println(imageFileName + "\t" + builder.toString());
}
}
} catch (IOException | JSONException e) {
e.printStackTrace();
}
}
public static void search(String baseFolder, String resultFile) throws IOException {
System.out.println("search started");
String output = "";
output += "<link rel=\"stylesheet\" href=\"style.css\">";
output += "<script src=\"script.js\"></script>";
output += "<div id=\"parent\">";
output += "<div id=\"matches\">";
String currentSeries = "";
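List<Path> filesToCheck = new ArrayList<>();
// Only pick up the files written by extractTextWithImageNames, and close the directory stream when done.
try (java.util.stream.Stream<Path> paths = Files.walk(Paths.get(baseFolder))) {
paths.filter(Files::isRegularFile)
.filter(p -> p.toString().endsWith(" Searchable Text.txt"))
.forEach(filesToCheck::add);
}
Collections.sort(filesToCheck);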
System.out.println("Will check " + filesToCheck.size() + " files");
for (Path fileToCheck : filesToCheck) {
List<String> matches = new ArrayList<>();
// Plain substring match, so regex metacharacters in SEARCH are taken literally.
for (String line : Files.readAllLines(fileToCheck)) {
if (line.contains(SEARCH)) {
matches.add(line);
}
}
if (matches.isEmpty()) {
continue;
}
String relativePath = fileToCheck.toString()
.replace(baseFolder, "")
.replace(" Searchable Text.txt", "");
String[] parts = relativePath.split("\\\\");
String series = parts[0];
String volume = parts[1];
String mangaFolder = baseFolder + series + "\\" + volume;
if (!Files.isDirectory(Paths.get(mangaFolder))) {
System.out.println("Cannot find manga folder: " + mangaFolder);
System.out.println("Implement checking other known locations.");
continue;
}
if (!series.equals(currentSeries)) {
output += "<h2>" + series + "</h2>";
currentSeries = series;
}
output += "<h3>" + volume + "</h3>";
output += "<ul>\n";
for (String match : matches) {
output += outputMatch(match, mangaFolder, SEARCH);
}
output += "</ul>\n";
}
output += "</div>";
output += "<div id=\"page\">";
output += "<a id=\"link\" target=\"_blank\"><img id=\"image\" /></a>";
output += "</div>";
output += "</div>";
Files.write(Paths.get(resultFile), output.getBytes(java.nio.charset.StandardCharsets.UTF_8));
System.out.println("search done");
}
private static String outputMatch(String match, String mangaFolder, String search) {
String[] parts = match.split("\t");
String imageFilename = parts[0];
String lineText = parts[1];
// Try each known image extension in turn (.png first, since my screenshots are PNG files).
String imageFile = null;
for (String extension : new String[] {"png", "jpg", "jpeg"}) {
String candidate = mangaFolder + "\\" + imageFilename + "." + extension;
if (Files.isRegularFile(Paths.get(candidate))) {
imageFile = candidate;
break;
}
}
if (imageFile == null) {
System.out.println("Cannot find image file: " + mangaFolder + "\\" + imageFilename);
System.out.println("Maybe its extension is not .png, .jpg or .jpeg?");
return "";
}
// Literal replacement, so regex metacharacters in the search string cannot break the highlighting.
String lineTextWithHtml = lineText.replace(search, "<strong>" + search + "</strong>").trim();
return "<li tabindex='0' onfocus='showImage(this, \"" + imageFile.replace("\\", "\\\\") + "\")'>" + imageFilename + ": " + lineTextWithHtml;
}
}