Any Japanese OCR lib/API recommendations?

Hello everyone!

I’m building an app (primarily to help my own studies) with a dictionary lookup.

Since I like reading paper books, it'd be convenient to take a photo of a page and select the kanji to look up.

The Google Translate app currently has this feature, but I don't like how it works, and I'd like to avoid juggling multiple apps. That's why I want to build my own.

Do you have any recommendations for libraries or APIs to extract Japanese text from images?

So far I found 3 paid APIs:

The ABBYY one doesn’t even list prices, which isn’t a good sign…


Hey there, I've done some Japanese OCR work before, building a web app that reads, sorts, splits and merges PDFs based on their content.

For that work, I did some investigation into available OCR APIs, and specifically I tried Google Cloud Vision, ABBYY Cloud OCR and Tesseract. I believe I tried some others, but I don’t remember what they were because I never got around to seriously considering them.

Google Cloud Vision performed the best overall for my purposes, but only just barely. Tesseract is an open source OCR project that was developed and maintained by Google from 2006 to 2018. It was hard to get working, but in the end it was only slightly less accurate overall than the paid Google Cloud Vision service. Tesseract is also not a hosted service, meaning you'll need to either add it as a dependency in your code or host it somewhere yourself. That can be a disadvantage or an advantage depending on your project requirements; for us it was a positive.

ABBYY Cloud, at least in our tests and for our purposes, was significantly less effective than Google Cloud Vision or Tesseract.

The main downside to Tesseract is that you've got to configure it (and potentially train it) yourself. Also, Google Cloud Vision has built-in features that process your photo to make it easier for their OCR to read, whereas with Tesseract any image preprocessing has to be done by you.
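As a rough illustration of the kind of preprocessing you'd be doing yourself with Tesseract, here is a minimal sketch of Otsu thresholding (a common binarization technique, not something mentioned in this thread) applied to a flat list of 8-bit grayscale values. It assumes pixels are ints in 0..255; in a real app you'd more likely load and convert the image with an imaging library first.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t that best separates dark (<= t) from light (> t) pixels.

    Otsu's method picks the t that maximizes the between-class variance of
    the two groups, using a 256-bin histogram of 8-bit grayscale values.
    Assumes every pixel is an int in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels with value <= t
    sum_bg = 0     # sum of pixel values <= t
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # no light pixels left, larger t can't help
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: List[int]) -> List[int]:
    """Map each pixel to black (0) or white (255) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

For example, `binarize([10, 12, 15, 200, 210, 220])` separates the three dark values from the three light ones. Tesseract does some thresholding of its own, so whether an explicit pass like this helps on phone photos of book pages is something you'd want to test.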

I’d recommend at least trying Tesseract. If it works for your purposes, then you don’t need to tie yourself down to a third party service.

I think the main Tesseract engine is written in C++, but there are bindings and ports for various languages that you can find if you search. There's even one for JavaScript (Tesseract.js), which runs a WebAssembly build of the engine.
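If you'd rather not deal with bindings, the simplest route from most languages is to shell out to the `tesseract` command-line tool. A minimal Python sketch, assuming the `tesseract` binary is on your PATH and the Japanese trained data (`jpn`, plus `jpn_vert` for vertical text) is installed; `build_tesseract_cmd` and `ocr_image` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_cmd(image_path: str, vertical: bool = False) -> List[str]:
    """Build a tesseract CLI invocation that prints recognized text to stdout.

    Passing "stdout" as the output base makes tesseract write to standard
    output instead of a .txt file. For vertical text (common in paper books)
    this sketch uses the jpn_vert model with --psm 5, the page segmentation
    mode for a single uniform block of vertically aligned text.
    """
    if vertical:
        return ["tesseract", image_path, "stdout", "-l", "jpn_vert", "--psm", "5"]
    return ["tesseract", image_path, "stdout", "-l", "jpn"]


def ocr_image(image_path: str, vertical: bool = False) -> str:
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract CLI not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, vertical),
        capture_output=True,
        encoding="utf-8",  # tesseract emits UTF-8; don't rely on the locale
        check=True,
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call makes it easy to tweak the language model or segmentation mode per photo without touching the rest of the app.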

For our project, we went with Tesseract. Again, at least for our purposes, Google Cloud Vision was something like 98% accurate and Tesseract, once configured, was something like 96% accurate. Google Cloud Vision worked a bit better, but it was more convenient for us not to rely on a third-party service, and we liked that Tesseract was free.


Thanks for sharing your experience!

Since it's just a personal project for me, I don't think I want to invest the time in setting up Tesseract, but I see the appeal of not relying on a third-party service.

Maybe @jprspereira can share some knowledge here? He ran some OCR experiments, but I have no idea how they went… 😅


Not sure if this is helpful, as it's not exactly an API, but I use Capture2Text as my main OCR tool while reading manga. It also has a command-line version, so you could potentially invoke it from an external application and then process the resulting text string. It's also open source, so you could embed part of the code into your own software.

A potential showstopper is that it’s Windows only.

http://capture2text.sourceforge.net/

I've been using it for many years, and at least for fairly standard fonts it does a very good job, and it even gets close with some fancier fonts.
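Whichever OCR tool produces the text string, the app still has to turn it into something you can tap to look up. A minimal sketch, assuming you only want the kanji, identified here by the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF); that covers the kanji in ordinary text but not every rare character. `is_kanji` and `extract_kanji` are hypothetical helper names.

```python
from typing import List


def is_kanji(ch: str) -> bool:
    """True if ch falls in CJK Unified Ideographs or Extension A."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def extract_kanji(ocr_text: str) -> List[str]:
    """Return the unique kanji in ocr_text, in order of first appearance."""
    seen = set()
    result = []
    for ch in ocr_text:
        if is_kanji(ch) and ch not in seen:
            seen.add(ch)
            result.append(ch)
    return result
```

For example, `extract_kanji("日本語の本を読む")` returns `["日", "本", "語", "読"]`, skipping the kana and the repeated 本; each entry could then become a button that triggers the dictionary lookup.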

Interesting, looks like it’s using Tesseract under the hood. Thanks for sharing.
