Physical books to e-book

Hi everyone.
After using e-books for most of this year to read in Japanese, I’m quite hooked on how much they have eased my reading experience.
Besides the Kindle, which is my go-to device, I’ve started scanning some of my Shodo books, which I realized I was avoiding because of the huge jump in difficulty they added to my reading (basically they are much more technical and contain some dated terms that tend to find their way into calligraphy).
I pass those scanned PDFs through OCR and basically read the text using Yomichan in the browser. This has resulted in a much more pleasurable experience. The Shodo books I use have something like 8-10 full-text pages in the introduction, then it’s mostly models to copy from, so I limit the process to the introductory chapter only.

So with that in mind, now that I’m looking for my next novel to read, I got really interested in きらきらひかる by 江國 香織. Sadly, no e-book is available. I bought the book anyway, but while waiting for it to arrive I’ve been thinking about how to make it into an e-book using a process similar to the one for my Shodo books.
I really think any time I spend doing so will be rewarded when reading with all the additions e-books have. So, does anyone have experience doing this process?
Any tips you could share?

8 Likes

Won’t you have to scan every single page, possibly removing the binding to facilitate?

1 Like

I second @seanblue. I have a bit of experience with scanning in Japanese media (manga/comic pages, no OCR), and unless there’s a faster way to do it, you end up pretty much just scanning in every single page.

I’ve heard Adobe has a pretty good [free] scanning app with Japanese OCR, but your alternatives to scanning with a phone are ripping the book apart (for cleaner page scans) or using those wand-type scanners that you slide over each page.

As a bonus, I did this for the first couple chapters of a required reading novel because some people hadn’t bought the book yet and needed the pages. My neck wasn’t happy with me.

2 Likes

Small novels will fit two pages at once (spread open, that is) in my scanner, which is like an office model.
I saw a video of a guy digitizing his library, and indeed his setup had a scanner that processes both sides of the page at the same time, but it requires single sheets, so he takes the book apart first (I’m not super against it frankly, but I don’t have that scanner :man_shrugging: ).

I’m using Acrobat Pro for the OCR, and it works like a charm with Japanese. For magazines it does the trick pretty well too, even with lots of graphics and different fonts.

Since most of the other books in my reading list are available as e-books, probably I will simply scan the book myself or look for a service. The novel will keep me busy for some time, so I don’t really mind.

Has anyone else gone through this process? What about the PDF to e-book step?

I wonder how it compares to Google Translate’s camera feature. Whenever I try to use that on less common (maybe non-jouyou?) kanji with lots of strokes, it can’t handle them. It always assumes it’s a simpler kanji.

Not sure about the comparison, but indeed it’s not perfect either, though it improves with more resolution and a larger font size. That was the case when I scanned the same page twice, at 300 dpi and then at 600 dpi: it failed the first time and then correctly recognized the 『旁』 from 偏旁. So it’s probably more related to that and not necessarily to whether the kanji is jouyou or not.
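To put rough numbers on the resolution point, here is a small back-of-the-envelope sketch. It assumes the two scans were made at 300 and 600 dpi and that the body text is around 9 pt; that font size is only a guess for illustration, not something measured from the book, and `glyph_height_px` is a made-up helper name.

```python
def glyph_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate height in pixels of one character box in a scan.

    One typographic point is 1/72 inch, so the height in inches is
    font_size_pt / 72, and multiplying by the scan dpi gives pixels.
    """
    return font_size_pt / 72 * dpi


# Hypothetical 9 pt body text at the two scan resolutions.
for dpi in (300, 600):
    print(f"{dpi} dpi: about {glyph_height_px(9, dpi)} px per character")
```

Doubling the resolution doubles the pixels available per character in each direction, which would fit with a dense kanji only being recognized on the second scan.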

Acrobat performs better with different fonts, though. It actually picks up some of the calligraphy samples in handwritten fonts, and it did a better job of picking up some text on a different background in a magazine.

That’s all the comparison I can make. :man_shrugging:

I did this before with 獣の奏者 before I realized there were e-books available.

I personally found the process tedious, but still preferred it to looking up dozens of words by hand. It greatly increased my reading flow.

Most OCR software on the market, however, is not very good. I haven’t had good experiences with free software at all, so I used the ABBYY 30-day free trial.

However, I recently found out that Google’s Lens app has pretty good OCR for Japanese. It also skips the scanning process. So if your phone has a decent camera, using Lens for OCR in a well-lit room may be the easiest way.

Keep in mind, though, that OCR generally has problems with furigana. Advanced OCR software may have options to handle furigana, but I haven’t found an app with a feature like that.

1 Like

I love physical books, so the idea of bending back the binding of a book that’s still intact is a crime in my head.

My flatbed scanner didn’t process both sides, so if I took a book apart, I’d place two adjacent pages side by side and try to get both at once (if they fit). I had a wand scanner to skip all that tedium (and to avoid buying that same book again if I really liked it), but the pages can sometimes come out looking warped.

1 Like

I didn’t think of that. I’ll wait for the book to arrive to check that out.
Shodo books have no furigana to speak of, so I never ran into that issue before. :sweat_smile:

I’ve used some Google OCR software, I think it was Google Translate. It worked fine, but I mostly read on the subway/trains now, and avoiding using my phone’s camera while reading is pretty much why I picked a Kindle in the first place :smile:

After moving twice this year from one country to another, I’m in a very Marie Kondo state of mind these days regarding accumulating anything that can be spent… so ultimately I will thank the book for the joy of the reading and dispose of it in some fashion. :speak_no_evil:

3 Likes

This does not inspire joy :cry:

9 Likes

I mean, if you just want to read in your browser, you can just copy the OCR’d text from Lens and paste it into a simple HTML document. I’m not sure if that’s more efficient than scan + OCR in the long run, but to be honest I don’t think there’s a really comfortable way unless you set up a bunch of automations (which is probably also not an easy task).
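For what it’s worth, the paste-into-HTML step could be scripted along these lines. This is only a minimal sketch, assuming the OCR’d text is already saved as plain UTF-8 text with blank lines between paragraphs. The name `text_to_html` is made up for illustration and isn’t part of Lens or any other tool mentioned here.

```python
import html


def text_to_html(text: str, title: str = "OCR text") -> str:
    """Wrap plain OCR text in a minimal HTML page, one <p> per paragraph."""
    paragraphs = []
    # Assumption: blank lines separate paragraphs in the OCR output.
    for block in text.replace("\r\n", "\n").split("\n\n"):
        # Join wrapped lines without a space, since Japanese text
        # does not put spaces between words.
        joined = "".join(line.strip() for line in block.split("\n"))
        if joined:
            # Escape &, < and > so stray symbols don't break the page.
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ja">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

Writing the returned string to a file with UTF-8 encoding and opening that file in the browser should give selectable text that a pop-up dictionary like Yomichan can work with.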

1 Like

Well, I’ll just post an update on the project once I have the book in hand, I guess. I was just curious about not having heard anyone mention this before. Considering that having my more technical books turned into selectable text on my computer has, for starters, allowed me to read such text at all :smile: , I thought someone might already have come up with a similar workflow for novels too.
It’s not something I expect to do often, but I’m guessing I might run into the same situation again in the future, so I’ll just describe how the workflow goes in case someone wants to repeat it or, better yet, provide some tips.

Then again, I might realize that I haven’t heard about it because others have already come to the conclusion that doing this is a pain in the ass. :rofl:

Probably after I’m done I’ll just look for a local Japanese institute and donate the book there, no tree pulp will go to waste :innocent: .

1 Like

This is a process that unless I was really short on money, I would pay a pro to do. Maybe it’s because I worked in a book bindery as a teenager. After seeing a machine that can trim off a binding and leave a clean edge on the stack that can then go in a high-volume scanner, trying to do that by hand would just make me angry. :stuck_out_tongue:

Before you do it yourself, call a copy shop and ask what it would cost to scan and OCR X number of pages. It might be surprisingly cheap and your time might be worth the money.

3 Likes

There are book scanning services dedicated to this as well. I’m not sure how popular these are around the world, but there are several choices in the US at least. I’ve used 1dollarscan.com several times for Japanese textbooks (so I could easily take them with me on my iPad when I travel), and the results have been pretty good. They do destroy the physical copy in the process — I believe they cut the binding off and then recycle everything afterwards — which is kind of a bummer, but it’s been a good trade off for me.

3 Likes

1dollarscan.com is a pretty good service for those without access to the correct office equipment. You need a guillotine cutter meant to cut large stacks of paper (rare/expensive compared to your standard single-sheet guillotine cutter) and you need a sheet-feeding scanner (like the Fujitsu ScanSnap, which runs about $400 or so).

I’m willing to bet that places like Staples have the equipment but have policies against this kind of scanning.

Unless it’s a particularly rare or old book, I don’t feel bad cutting the binding off and scanning it for personal use to cut down on bulk. Presumably there are plenty of copies circulating in libraries around the world.

It does run counter to the theory of “re-use, re-purpose, up-cycle” though…

3 Likes

Thanks. I was just looking for scanning services locally, and it’s way more expensive than that here. But since I’m shipping overseas from Amazon.jp anyway, depending on how tedious this project turns out to be, I can see myself shipping other books directly to these guys and only getting the end result, which frankly is what I care about.

2 Likes

I would do that, but I would just feel so sad destroying my manga collection. :sweat_smile:

2 Likes

If you have a library in your area, check whether they have scanners (not office scanners but book scanners). They make the whole process way faster (I scanned around 200 pages in 10 minutes once), and some of the new ones even have an OCR program included so you don’t need to run one afterwards.

Another :+1: for 1dollarscan. I send a lot of books there and they’re great. They’re a Japanese company, so they handle “backwards” Japanese books with no problem.

Speaking of Japanese books…

If you’re buying books off Amazon.co.jp, though, I use https://www.bookscan.co.jp/. They’re the same people as 1dollarscan, actually. Big wait if you don’t have a monthly subscription, unfortunately. But, if you wishlist a bunch of books and do it all at once, it’s not as bad. Some publishers have blocked them from scanning their books, though, so be sure to use their checker tool to see if they are allowed to scan it or not. A lot of manga series, for example…

Both do OCR, though you might need to pay more for it? I can’t remember if it’s included in the default or not… but, certainly very useful!

13 Likes

Do they do any sort of post-OCR editing or do you just get the raw output? For the rate they charge, I assume the latter rather than the former?