OK. A bit of an update on the project.
In the end, I’m doing all the steps myself, and it hasn’t turned out to be such a terrible thing. For one, scanners are now much faster than I remember them. Using the flatbed scanner is OK, at least for scanning the book at a pace of 2 chapters every now and then (that’s about 30-35 pages), which takes 10-15 minutes (at 600 dpi, that is).
With the scanned PDF, I go through Adobe’s OCR, which is also much faster than I recalled OCR being.
OCR is not perfect, though. It’s very accurate, but for some reason it will recognize characters I thought would surely fail, given the number of strokes and how mushy they looked in the book (like 鬱, for example), and then consistently fail on others that seemed easier, like the 「興」 from 興奮. It also failed to recognize 「壜」, which was the author’s choice for “bottle” (instead of 瓶) and which, given that the protagonist is an alcoholic, appears quite often.
For the most part I can’t really complain about character recognition (except maybe for 「く」 and the “<” sign getting mixed up a couple of times). Honestly, it has in no way become the bottleneck I thought it could be in the process.
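For the 「く」 / “<” mix-ups, a small script can do a first pass over the pasted OCR text before the manual check. This is only a sketch under my own assumptions: that a “<” touching a Japanese character is always a misread く, and that the book’s text contains no half-width ASCII on purpose. The names `fix_ku` and `flag_ascii` are made up for illustration.

```python
import re

# Hiragana, katakana and the main CJK ideograph block (assumed to be enough here).
JP = r"[\u3040-\u30ff\u4e00-\u9fff]"

# A "<" (half- or full-width) touching a Japanese character is assumed
# to be a misread く.
MISREAD_KU = re.compile(rf"(?<={JP})[<＜]|[<＜](?={JP})")


def fix_ku(text: str) -> str:
    """Replace "<" with く when it sits next to Japanese text."""
    return MISREAD_KU.sub("く", text)


def flag_ascii(text: str) -> list[tuple[int, str]]:
    """List (line number, character) for every half-width ASCII character left."""
    return [
        (lineno, ch)
        for lineno, line in enumerate(text.splitlines(), start=1)
        for ch in line
        if ch.isascii() and not ch.isspace()
    ]


sample = "早<帰りたい。\n彼はwineの壜を開けた。"
cleaned = fix_ku(sample)
print(cleaned)
for lineno, ch in flag_ascii(cleaned):
    print(f"line {lineno}: {ch!r}")
```

Anything `flag_ascii` reports still needs a human look; it only points at places where the OCR may have slipped, it doesn’t know what the right character is.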
Also, Adobe has a feature that can replace all the images of characters with fonts, so it will make a clean document, clearly separating characters from images (anything other than text). Sadly, it seems that characters with a different orientation (the title of the book as a horizontal header, with the rest in the vertical Japanese format), Japanese commas and especially furigana play tricks on Acrobat’s feature, rendering it useless for Japanese.
仕方がないな。。。 (Oh well, it can’t be helped...)
The next step has been using this handy website
Here I can copy/paste the OCR lines (skipping the furigana and checking for characters that were not recognized correctly) into a notepad version and save it in txt format. Which is what then goes here
Where you can make that into an epub file with the correct right-to-left, vertical ebook orientation.
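For anyone curious about what that conversion involves, here is a rough sketch of building a minimal vertical, right-to-left epub from a txt file in Python. It is not what the website does internally (I don’t know that); it just relies on the two settings EPUB 3 provides for this layout: `writing-mode: vertical-rl` in the CSS and `page-progression-direction="rtl"` on the spine. The function name `build_epub` and the internal file names are my own choices, and the result has a single chapter with no cover or furigana.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from html import escape

# Vertical text, columns running right to left (prefixed forms for older readers).
CSS = """html {
  -epub-writing-mode: vertical-rl;
  -webkit-writing-mode: vertical-rl;
  writing-mode: vertical-rl;
}
"""

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The spine attribute makes pages turn right to left.
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{book_id}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>ja</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine page-progression-direction="rtl">
    <itemref idref="text"/>
  </spine>
</package>
"""

PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="ja" lang="ja">
<head><title>{title}</title><link rel="stylesheet" type="text/css" href="style.css"/></head>
<body>
{body}
</body>
</html>
"""


def build_epub(txt_path: str, epub_path: str, title: str) -> None:
    """Turn a UTF-8 txt file (one paragraph per line) into a one-chapter epub."""
    with open(txt_path, encoding="utf-8") as f:
        # Drop blank lines but keep leading full-width indentation.
        lines = [line.rstrip("\r\n") for line in f if line.strip()]
    safe_title = escape(title)
    body = "\n".join(f"<p>{escape(line)}</p>" for line in lines)
    nav = f'<nav epub:type="toc"><ol><li><a href="text.xhtml">{safe_title}</a></li></ol></nav>'
    opf = OPF.format(
        book_id=f"urn:uuid:{uuid.uuid4()}",
        title=safe_title,
        modified=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry has to come first and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER)
        z.writestr("OEBPS/content.opf", opf)
        z.writestr("OEBPS/style.css", CSS)
        z.writestr("OEBPS/nav.xhtml", PAGE.format(title=safe_title, body=nav))
        z.writestr("OEBPS/text.xhtml", PAGE.format(title=safe_title, body=body))
```

Usage would be something like `build_epub("book.txt", "book.epub", "Book title")`, with those file names being placeholders. I’d still run the result through an epub checker before trusting it.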
Epubs can easily be turned into Kindle format with the Kindle Previewer app (just open the file and export it as mobi).
And that’s it.
So, probably the best-case scenario for digitizing physical books into Kindle format is to live in Japan and use the service Koichi mentioned (though it’s the expensive version that provides ebooks as the outcome).
If you’re not in Japan, or you’re the type of person prone to “DIY projects”, I must say that, now that I have the key steps clear enough, I can see myself repeating the process with some of the books on my wishlist that don’t currently have an ebook version.