Mokuro: Read Japanese manga with selectable text inside a browser

ChristopherFritz · March 14, 2023, 4:16am

Manga Text Search

Update: I recommend using the Python implementation rather than this Ruby version.

This one is not currently available, but I can look into making it usable if there is interest. For now, it is tied to my directory structure and spans multiple programming languages.

The first part is a Ruby script that reads the JSON files Mokuro created in a volume folder.

This is the main function:

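require 'json'

# Writes one tab-separated line per Mokuro text block: page image name, then the block's text.
def extract_text_with_image_names(manga_path, json_files)

    output_file = File.join(manga_path, "#{File.basename(manga_path)} Searchable Text.txt")
    # Skip volumes that have already been processed.
    return if File.file?(output_file)

    File.open(output_file, "w") do |f|
        json_files.each do |file|
            # The JSON file shares its base name with the page image.
            image_file_name = File.basename(file, File.extname(file))
            # Read and parse the Mokuro JSON for this page.
            parsed = JSON.parse(File.read(file), symbolize_names: true)
            parsed[:blocks].each do |block|
                # A block's text is stored as separate lines; join them into one string.
                f.puts("#{image_file_name}\t#{block[:lines].join}")
            end
        end
    end

end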

This generates a single text file per volume. Each line holds the file name of the page (without its extension), a tab, and the text of one block that Mokuro extracted from that page.
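To make the file format concrete, here is a minimal sketch of how one page's JSON becomes lines in the text file, and how a line can be split back apart. The sample JSON is made-up data that contains only the `blocks` and `lines` keys the function above reads (real Mokuro output has more fields), and `searchable_lines` is a hypothetical helper name.

```ruby
require 'json'

# Made-up sample: only the keys that the extraction function relies on.
sample_json = '{"blocks": [{"lines": ["何だ", "今のは"]}, {"lines": ["えっ"]}]}'

# Hypothetical helper: turns one page's parsed JSON into searchable lines,
# one per block, in the form "<image name>\t<block text>".
def searchable_lines(image_file_name, parsed)
  parsed[:blocks].map { |block| "#{image_file_name}\t#{block[:lines].join}" }
end

parsed = JSON.parse(sample_json, symbolize_names: true)
lines = searchable_lines('007', parsed)
puts lines

# Reading a line back: split on the first tab only, so the text stays whole.
page, text = lines.first.split("\t", 2)
puts "page=#{page} text=#{text}"
```

Because each line carries its own page name, a plain line-by-line regex search is enough to find both the text and the page it came from.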

The file is saved into the same folder as the JSON files, but I also have a separate process that copies it to a central location where all these text files are stored.
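The copy step itself is not shown in this post. A minimal sketch, assuming the central location holds one subfolder per series (the layout the search script expects when it splits each path into series and volume), might look like this. `copy_to_central` and its parameters are hypothetical names, not the actual process.

```ruby
require 'fileutils'

# Hypothetical helper: copies a volume's text file to
# <central_folder>/<series>/<volume> Searchable Text.txt
# and returns the path of the copy.
def copy_to_central(text_file, series, central_folder)
  target_folder = File.join(central_folder, series)
  FileUtils.mkdir_p(target_folder)       # create the series folder if needed
  FileUtils.cp(text_file, target_folder) # keep the original file name
  File.join(target_folder, File.basename(text_file))
end
```

Keeping the series name as a folder and the volume name in the file name means the search step can recover both from the path alone.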

The manga folders live in one specified location, organized as series folders with volume subfolders, and the text files are all copied under a single folder. A second Ruby file searches through those text files and generates an HTML file with my results:

search.rb

require 'naturally'

# This needs to be manually changed to whatever I want to search for.
search = /何.{0,3}今の/

# Builds one <li> for a matching line, or returns '' if the page image cannot be found.
def output_match(match, manga_folder, search)
  # Each line is "<image name>\t<text>"; the limit of 2 keeps any tab inside the text.
  image_filename, line_text = match.split("\t", 2)
  image_file = "#{manga_folder}/#{image_filename}.jpg"
  image_file = "#{manga_folder}/#{image_filename}.jpeg" unless File.file?(image_file)
  unless File.file?(image_file)
    puts "Cannot find image file: #{image_file}"
    puts 'Maybe its extension is not .jpg or .jpeg?'
    return ''
  end
  # Wrap every occurrence of the search pattern in <strong> so it stands out.
  line_text_with_html = line_text.gsub(/(#{search})/, '<strong>\1</strong>').chomp
  "<li tabindex='0' onfocus='showImage(this, \"#{image_file}\")'>#{image_filename}: #{line_text_with_html}</li>\n"
end

base_folder = '/home/chris/Documents/OCR/OCR Process/Outputs/OCR'

output = ''

output += '<link rel="stylesheet" href="styles.css">'
output += '<script src="script.js"></script>'

output += '<div id="parent">'
output += '<div id="matches">'

current_series = ''

files_to_check = Dir.glob("#{base_folder}/**/*.txt")
sorted_files_to_check = Naturally.sort(files_to_check)

sorted_files_to_check.each do |file_to_check|
  matches = []
  IO.foreach(file_to_check) do |line|
    matches.append(line) if line =~ search
  end
  next if matches.empty?

  # puts file_to_check
  series, volume = file_to_check.sub("#{base_folder}/", '').sub(' Searchable Text.txt', '').split('/')

  manga_folder = "/home/chris/Books/Comics/Japanese/#{series}/#{volume}"
  unless File.directory?(manga_folder)
    puts "Cannot find manga folder: #{manga_folder}"
    puts 'Implement checking other known locations.'
    next
  end

  if series != current_series
    output += "<h2>#{series}</h2>"
    current_series = series
  end
  output += "<h3>#{volume}</h3>"

  output += "<ul>\n"
  matches.each do |match|
    output += output_match(match, manga_folder, search)
  end
  output += "</ul>\n"
end
output += '</div>'

output += '<div id="page">'
output += '<a id="link" target="_blank"><img id="image" /></a>'
output += '</div>'

output += '</div>'

File.write('results.html', output)
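The script above only tries `.jpg` and `.jpeg` when looking for a page image. A minimal sketch of a more general lookup follows; `find_image_file` and `IMAGE_EXTENSIONS` are hypothetical names, and the `.png` and `.webp` entries are assumptions about which other formats a collection might contain.

```ruby
# Assumed list of candidate extensions, tried in order.
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .webp].freeze

# Hypothetical helper: returns the first existing image path for a page,
# or nil when none of the candidate extensions match a file on disk.
def find_image_file(manga_folder, image_filename, extensions = IMAGE_EXTENSIONS)
  extensions
    .map { |ext| File.join(manga_folder, "#{image_filename}#{ext}") }
    .find { |path| File.file?(path) }
end
```

`output_match` could then call this helper once and return `''` when it gets `nil`, instead of checking each extension by hand.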

Two files accompany the generated results.html:

script.js

var lastCopiedText = '';

// Shows the page image for the focused match and copies its path to the clipboard.
function showImage(element, imagePath) {
	console.log(`ShowImage: ${imagePath}`);
	document.getElementById("link").href = imagePath;
	document.getElementById("image").src = imagePath;
	// Only write to the clipboard when the path has changed.
	if (lastCopiedText !== imagePath) {
		navigator.clipboard.writeText(imagePath);
		lastCopiedText = imagePath;
	}
}

styles.css

a { text-decoration: none; color: black; }

h2 {
  position: sticky;
  background-color: white;
  top: 0;
}

#parent {
  display: flex;
  justify-content: center;
  height: 99%;
}

#matches {
  width: 600px;
  height: 99%;
  overflow: auto;
}

#page {
  width: 600px;
  height: 99%;
}

ul {
  list-style: none;
  padding-left: 0;
}

li {
  border: solid thin beige;
  padding: 1px 2px;
  cursor: pointer;
}
li:hover { background: lightblue; }
li:focus { background: pink; }

#image { width: inherit; position: sticky; max-height: 100%; }

strong { color: #ff003c; font-weight: normal; }

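With those in place, I change the search term in the Ruby file, re-run it, and open the generated HTML file in a web browser.

This gives a page where I can easily view the manga pages (from my collection) that contain what I’m looking for. The matches are listed on the left, grouped by series and volume, and focusing a match shows its page image on the right.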

(For anyone wondering how I can pull examples of any random vocabulary or grammar from several manga at a moment’s notice in book clubs…this is it.)
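Editing the search term inside the Ruby file by hand is the one manual step in this workflow. A minimal sketch of reading it from the command line instead follows; `search_pattern` is a hypothetical helper, and the fallback pattern is simply the example one from search.rb.

```ruby
# Hypothetical helper: builds the search pattern from command-line arguments,
# falling back to a default pattern when no argument is given.
def search_pattern(args, default = /何.{0,3}今の/)
  return default if args.empty?

  Regexp.new(args.first)
rescue RegexpError => e
  # Stop with a readable message if the argument is not a valid regex.
  abort "Invalid search pattern: #{e.message}"
end

if __FILE__ == $PROGRAM_NAME
  search = search_pattern(ARGV)
  puts "Searching for #{search.inspect}"
end
```

With this in place, a run such as `ruby search.rb '何.{0,3}今の'` would set `search` without touching the file.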
