New GraphQL API with dictionary, kanji dictionary, SVG images and morphological analyser

rlemaigre · February 20, 2022, 10:26am

Hi everyone!

I released a website recently to help people read Japanese text: https://jpaste.me/. It didn’t attract much attention (I didn’t do much to popularize it either), so I decided to release all my data as a GraphQL API: https://rapidapi.com/rlemaigre/api/japanese-text-analysis/details. If you log in, you can try out the example queries.

GraphQL is awesome. You can learn about it here: https://graphql.org/.

With this API you can get word translations, kanji readings and meanings (including statistics about readings), kanji images (calligraphy and stroke order) and morphological analysis of text. It can break a Japanese text down into chunks of characters (tokens) and give you the dictionary form (lemma) associated with each chunk.

It is free.

I hope you enjoy it, and don’t hesitate to share!

mathewthe2 · February 20, 2022, 2:03pm

May I suggest using UniDic instead of IPADIC?

rlemaigre · February 20, 2022, 2:38pm

Could you please elaborate? I do not know Japanese myself…

The morphological analyser I use is Kuromoji (actually a JavaScript port of it), and I just used the default settings.

mathewthe2 · February 20, 2022, 3:09pm

Morphological analyzers like Kuromoji use pre-made dictionaries to get the readings of kanji.

IPADIC is the default one because it’s much smaller. However, this means it doesn’t contain a lot of common vocabulary. Even ひとり is not there.

In contrast, UniDic is much larger and contains more common vocabulary. You would probably need to follow the links here, download the dictionary file and change the Maven settings of your Kuromoji build.

For the JavaScript port, IIRC only the IPADIC version is supported. If you are using Node, you might want to try out the JS port of MeCab.

rlemaigre · February 20, 2022, 3:38pm

There are no JavaScript ports of MeCab, unfortunately. What you are referring to is just a wrapper: you need to have the MeCab binaries installed for it to work. As far as I know, this makes it unsuitable for deployment on Google Cloud, which is what I use right now.

But Google Cloud Functions can be written in Java as well (I wrote mine in JS), so I may use the original Kuromoji with UniDic. It might be faster than the JS port as well.

rlemaigre · March 10, 2022, 10:59pm

OK, it uses UniDic now. I can’t judge for myself whether it is any better because I don’t know enough Japanese.

The website actually uses the GraphQL API now. Everything you see in your browser comes from a single endpoint that is free for anyone to use and can be used to build similar tools. The playground is here: https://jpaste.me/graphiql (or on RapidAPI). Here are a few simple queries:

Ten most frequent kanji, with their meanings, readings (with statistics) and the three most frequent words: query
Ten most frequent words, with frequencies, translations, example sentences and a calligraphy image (SVG in base64) for each character: query
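Since everything comes from one GraphQL endpoint, a client only needs to send a JSON POST body containing a `query` string and optional `variables`, the conventional GraphQL-over-HTTP format that most servers accept. This minimal sketch only builds that request. The field names in `EXAMPLE_QUERY` are hypothetical placeholders (the real schema can be browsed in the playground), and the endpoint URL is left to the caller because it is not stated here.

```typescript
// Minimal sketch of a GraphQL-over-HTTP request.
// ASSUMPTION: field names in EXAMPLE_QUERY are hypothetical, not the real schema.

interface GraphQLBody {
  query: string;
  variables: Record<string, unknown>;
}

interface JsonPostRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Wraps a query and its variables in the JSON POST format most GraphQL servers accept.
function buildGraphQLRequest(
  query: string,
  variables: Record<string, unknown> = {},
): JsonPostRequest {
  const payload: GraphQLBody = { query, variables };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify(payload),
  };
}

// Hypothetical query: "topKanji", "literal" and "meanings" are illustrative names only.
const EXAMPLE_QUERY = `
  query TopKanji($limit: Int!) {
    topKanji(limit: $limit) {
      literal
      meanings
    }
  }
`;

const request = buildGraphQLRequest(EXAMPLE_QUERY, { limit: 10 });
console.log(request.body);
```

The returned object can be passed as the second argument to `fetch(endpoint, request)`. Per the GraphQL response format, the result comes back under `data`, with any problems listed under `errors`.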

Everything is now hosted on a much better server than before (I quit Google Cloud and rented a small VPS in Germany), so it’s faster (at least from here in Europe).

I hope people can make good use of some of this.

Now I’d like to build a Chrome extension that does basically the same thing as the website, but on any web page. Such extensions already exist, but maybe some people will prefer mine. We’ll see.

mathewthe2 · March 12, 2022, 4:17am

The coverage is definitely better now! 一人 and 牛丼 are now working!

I prefer the look of your popup over existing ones like Yomichan, Tenten and Rikaichamp. Rikaichamp is the ugliest of them all, and images are still wonky in Yomichan. I like that Japanese.io has example sentences, but Japanese.io requires you to log in and use their platform. I prefer Anki and Yomichan instead.

Just a side note: if you decide to make one, for the parsing you’ll want to use a method called “forward scanning”. This lets you show multiple entries when you scan a word, so when someone scans 牛丼, they can also see the definitions of 牛 and 丼.
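Forward scanning can be sketched like this: from the cursor position, every prefix of the remaining text is looked up, longest first, and all matches are kept. This is a minimal sketch, not Yomichan’s or Rikaichamp’s code. The `Map`-based dictionary, the `forwardScan` name and the `maxLength` cap are hypothetical, and a real scanner would also run each prefix through a deinflector before the lookup.

```typescript
// Minimal sketch of forward scanning (longest prefix first, keep every match).
// ASSUMPTION: the dictionary is a plain Map from headword to gloss; no deinflection.

type Dictionary = Map<string, string>;

interface ScanHit {
  headword: string;
  gloss: string;
}

// `cursor` is a UTF-16 index into `text`; prefixes are built per code point.
function forwardScan(
  text: string,
  cursor: number,
  dict: Dictionary,
  maxLength = 10,
): ScanHit[] {
  const chars = Array.from(text.slice(cursor));
  const hits: ScanHit[] = [];
  for (let len = Math.min(maxLength, chars.length); len > 0; len--) {
    const candidate = chars.slice(0, len).join("");
    const gloss = dict.get(candidate);
    if (gloss !== undefined) hits.push({ headword: candidate, gloss });
  }
  return hits;
}

// Hypothetical mini-dictionary for illustration.
const dict: Dictionary = new Map([
  ["牛丼", "beef bowl"],
  ["牛", "cow; beef"],
  ["丼", "bowl (of food)"],
]);

// Cursor on 牛: yields 牛丼 first, then 牛.
console.log(forwardScan("牛丼を食べた", 0, dict).map((h) => h.headword));
// Cursor on 丼: yields 丼.
console.log(forwardScan("牛丼を食べた", 1, dict).map((h) => h.headword));
```

Trying the longest prefix first is what lets the popup list the compound before its parts.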

Forward parsing requires a deinflector. You can check the deinflector implementations in Yomichan and Rikaichamp for reference. It involves encoding the conjugation rules and patterns of Japanese so that every possible dictionary entry can be extracted from one phrase. Here is an example of a deinflect file. Another example is used in Rikaichan.
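The deinflection step can be sketched as a breadth-first search over suffix-rewrite rules: each rule swaps an inflected ending for a more basic one, rules are applied repeatedly, and the resulting candidates are then filtered against the dictionary. This is a minimal sketch, not Yomichan’s or Rikaichamp’s implementation. The rule table is a tiny hypothetical subset, and it ignores the word-class constraints that real deinflectors attach to each rule, so it produces more junk candidates than they would.

```typescript
// Minimal sketch of a rule-based deinflector (BFS over suffix rewrites).
// ASSUMPTION: tiny illustrative rule table; no word-class constraints.

interface Rule {
  from: string;   // inflected suffix
  to: string;     // replacement suffix
  reason: string; // label describing the inflection
}

const RULES: Rule[] = [
  { from: "かった", to: "い", reason: "past" },
  { from: "った", to: "う", reason: "past" },
  { from: "った", to: "つ", reason: "past" },
  { from: "った", to: "る", reason: "past" },
  { from: "んだ", to: "む", reason: "past" },
  { from: "いた", to: "く", reason: "past" },
  { from: "した", to: "す", reason: "past" },
  { from: "た", to: "る", reason: "past" },
  { from: "ない", to: "る", reason: "negative" },
  { from: "ます", to: "る", reason: "polite" },
];

interface Candidate {
  form: string;
  reasons: string[]; // outermost inflection first
}

// Returns the input plus every form reachable by applying rules repeatedly.
function deinflect(word: string): Candidate[] {
  const start: Candidate = { form: word, reasons: [] };
  const seen = new Map<string, Candidate>([[word, start]]);
  const queue: Candidate[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const rule of RULES) {
      // Skip rules that don't match or would consume the whole word.
      if (!current.form.endsWith(rule.from) || current.form.length <= rule.from.length) continue;
      const form = current.form.slice(0, -rule.from.length) + rule.to;
      if (seen.has(form)) continue;
      const next: Candidate = { form, reasons: current.reasons.concat(rule.reason) };
      seen.set(form, next);
      queue.push(next);
    }
  }
  return Array.from(seen.values());
}

// Hypothetical mini-dictionary used to filter candidates.
const headwords = new Set(["始まる", "食べる"]);
for (const word of ["始まった", "食べなかった"]) {
  const hits = deinflect(word).filter((c) => headwords.has(c.form));
  console.log(word, hits.map((h) => `${h.form} (${h.reasons.join(", ")})`));
}
```

食べなかった shows why the rules are applied repeatedly: it first becomes 食べない, and only a second rule reaches 食べる.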

Naphthalene · March 12, 2022, 5:11am

Oof, not very phone-friendly.

I just tried a random sentence (literally the next one in the book I was reading), and only a few things turned blue. Also, why does the furigana of 始まった include まっ (and why is the つ in katakana?)
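A likely cause of the まっ/katakana issue: morphological analysers such as Kuromoji usually return a token’s reading in katakana for the whole token, so 始まった would typically come back as 始まっ (ハジマッ) plus た. Showing that reading over the whole token would produce the symptoms described here. This is a guess, not a confirmed diagnosis of jpaste.me. One common fix, sketched below under the assumption of a hypothetical `{ surface, reading }` token shape, is to convert the reading to hiragana and strip the kana that the surface already spells out. The sketch only handles kana at the start or end of a token, not kana between two kanji as in 取り扱い.

```typescript
// Minimal sketch of furigana alignment for one token.
// ASSUMPTION: the analyser gives a surface form plus a katakana reading for it.

interface RubySegment {
  text: string;
  ruby?: string; // furigana shown above `text`, if any
}

// Katakana U+30A1..U+30F6 sit exactly 0x60 code points above their hiragana.
function toHiragana(s: string): string {
  return s.replace(/[\u30a1-\u30f6]/g, (c) =>
    String.fromCharCode(c.charCodeAt(0) - 0x60),
  );
}

// CJK ideographs (including Extension A) plus the iteration mark 々.
function isKanji(c: string): boolean {
  return /[\u3400-\u9fff\u3005]/.test(c);
}

function alignFurigana(surface: string, readingKatakana: string): RubySegment[] {
  const s = Array.from(surface);
  const r = Array.from(toHiragana(readingKatakana));
  if (!s.some(isKanji)) return [{ text: surface }]; // pure kana: no furigana needed

  // Trailing kana (okurigana) already written in the surface, e.g. まっ in 始まっ.
  let tail = 0;
  while (
    tail < s.length && tail < r.length &&
    !isKanji(s[s.length - 1 - tail]) &&
    toHiragana(s[s.length - 1 - tail]) === r[r.length - 1 - tail]
  ) tail++;

  // Leading kana already written in the surface, e.g. お in お茶.
  let head = 0;
  while (
    head < s.length - tail && head < r.length - tail &&
    !isKanji(s[head]) &&
    toHiragana(s[head]) === r[head]
  ) head++;

  const segments: RubySegment[] = [
    { text: s.slice(0, head).join("") },
    { text: s.slice(head, s.length - tail).join(""), ruby: r.slice(head, r.length - tail).join("") },
    { text: s.slice(s.length - tail).join("") },
  ];
  return segments.filter((seg) => seg.text.length > 0);
}

// 始 gets the furigana はじ; まっ stays as plain text.
console.log(alignFurigana("始まっ", "ハジマッ"));
```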

rlemaigre · March 14, 2022, 9:25am

Thank you for your comment. I’ll definitely look into forward parsing.

rlemaigre · March 14, 2022, 9:29am

I don’t intend to make this usable on phones for now.

Could you paste the sentence above as text rather than as an image, so I can try it out myself and look into what is happening? Thank you.

Raionus · March 14, 2022, 9:57am

まだ夜が明け切らぬうちに箱船のパーティーは始まった。

Naphthalene · March 14, 2022, 10:31am

If you just made it possible to scroll around, it would be fine

mathewthe2 · May 14, 2022, 8:22am

Been testing it again today. Could you override the mistaken readings for common words like 私 (わたし) and 女 (おんな)?
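A simple way to handle cases like these is a small override table applied after the analyser runs. This is a minimal sketch, assuming a hypothetical token shape with a `surface` and a hiragana `reading`. The sample input below is made up to show the effect; it is not actual analyser output.

```typescript
// Minimal sketch of a post-analysis reading-override table.
// ASSUMPTION: hypothetical token shape; sample tokens are invented for illustration.

interface AnalysedToken {
  surface: string;
  reading: string;
}

const READING_OVERRIDES: ReadonlyMap<string, string> = new Map([
  ["私", "わたし"],
  ["女", "おんな"],
]);

// Returns new tokens; only tokens whose whole surface matches an entry change.
function applyOverrides(tokens: AnalysedToken[]): AnalysedToken[] {
  return tokens.map((t) => {
    const fixed = READING_OVERRIDES.get(t.surface);
    return fixed === undefined ? t : { ...t, reading: fixed };
  });
}

// Pretend the analyser returned わたくし for 私.
const tokens: AnalysedToken[] = [
  { surface: "私", reading: "わたくし" },
  { surface: "は", reading: "は" },
];
console.log(applyOverrides(tokens).map((t) => t.reading).join(" "));
```

Keying on the whole token surface means compounds that the analyser keeps as a single token (for example 女性) are left untouched.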

Naphthalene · May 14, 2022, 8:54am

Speaking of which, the bugs I reported 2 months ago (failing to parse 明け, or technically 明け切る, and showing a katakana ツ as furigana for the hiragana つ) are still there.
Ah, and the dictionary doesn’t know words like 箱船 and パーティー. I can look them up if I want to, but that kinda defeats the purpose of the tool.

rlemaigre · May 16, 2022, 8:15am

Sorry, I’ve been lazy lately. I’ll post in this thread whenever the bugs are fixed (if they can actually be fixed).

system · May 16, 2023, 8:15am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
