AI based Japanese grammar checker

Hey everyone! I’m a ~N2 level Japanese learner and currently doing my Masters in Comp Sci & AI. I couldn’t find any decent free Japanese grammar checker online, so I decided to make an AI model which can highlight grammatical errors in Japanese! It’s totally free to use, there are no ads or anything, so if you wanna check it out just click the link below -

https://bunpo-check.com/

How to use it

Just type your text in, hit check, and like magic, underlines will appear under any problematic parts of your sentence. Be aware there’s a limit of 100 sentences, after which you’ll have to wait a while before checking again (to prevent the servers getting overloaded).

How accurate is it?

Will it always get everything right? Nope - do note, this is still experimental and a work in progress. Don’t use this to check your Japanese resume, but if you’re wondering whether a sentence sounds weird or might be off, this can help you figure out which part is wrong.

Update: 19 March 2021

:fire: Improved accuracy

Based on everyone’s helpful feedback, I’ve re-trained the model on a much larger dataset, including data from Twitter and Japanese web novels. You may have noticed a fairly big accuracy boost lately, especially on colloquial text. Thanks to everyone who gave advice and keep it coming!

:hugs: Open sourcing models and code

If you’re interested in this sort of thing or know how to code, you might like to check out the source code I used to create the data. You can also download the full model to use in whatever you like, at my GitHub page:

Update: 8 April 2021

:relaxed: No more model feedback needed!

Thanks to everyone who gave feedback to the model through the mechanism on the site! Thousands of sentences were sent as feedback, which was really helpful to me in seeing where the model was going wrong. Since the performance is now pretty good, I’ve removed the feedback mechanism, which means no more annoying messages asking you for feedback! That should make checking things much quicker. As always though, feel free to send me your thoughts or problems with the site, either here or at the email address on the website.

:lipstick: Make-over!

I decided to re-design the site with a brighter theme, since some people couldn’t read the text clearly. I’ve also set the default language to English.


Thanks for all your interest, almost 10,000 sentences have now been checked on 文法ーCHECK! :open_mouth:

As always let me know if you have suggestions or issues! :heart:

29 Likes

I have a feeling this thing is going to hate colloquial speech. It really doesn’t like when I skip を after 何. :laughing:

6 Likes

That’s a good point actually - it was mostly trained on Wikipedia and Japanese websites which tend to use more formal language, so it doesn’t recognise colloquial speech as well… I could mine a whole bunch of Japanese tweets and add those to the training data as examples of casual language, but then would those tweets just have terrible grammar? Dilemma :thinking:

3 Likes

Maybe you could have two models?

2 Likes

Yeah, probably have to choose how to address that eventually. Something like ら抜き言葉 (leaving the ら out of something like 見られる and just saying 見れる when the meaning is potential form) is totally normal, but it will not be accepted on a conjugation test as correct.

5 Likes

Two models could be an idea, with users selecting whether they want to check more formal or more casual grammar. That could also give better performance on each type of text. An ideal model would figure out the formality from the rest of the text and judge it appropriately, but I might be limited on data and computational power to do that… hmm. Good food for thought.

2 Likes

I am going to guess there are likely more issues than this that will need to be confronted, as the Japanese Wikipedia article on grammar checkers seems to suggest it is an English thing and that there are no notable Japanese grammar checkers.

Anyway, I fed it some text. And I see the dotted red line, but I also have a single solid red line under one character and I’m not sure what that is.

Also does 飛ばす here mean “skip”? I’m not an expert in computer terminology, but I feel like there’s a better word to be used there.

retrain the model on tweets! or even better, Aozora and Shōsetsuka ni Narou (小説家になろう)

1 Like

Yeah, grammar checking is still very much an unsolved problem in Japanese. Though note that what my site is doing is just detecting errors, which is a lot easier than also suggesting the correct grammar like you might get in MS Word if you right click. There are some research groups, like the one at Nara Institute of Science and Technology, that have been working on this for a while. Their research papers actually helped me figure out this model, but the field still has a lot of scope for improvement.

Anyway - solid red line means an error. Dotted line means the line wasn’t checked for some reason - it could be too short, or too long etc. If you mouse over or tap the dotted underlined text, it will tell you the problem.
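For anyone curious how the two underline styles relate, here is a rough sketch in Python. It is not the site's actual code: `SentenceResult`, `error_spans`, `skip_reason` and `underlines` are all hypothetical names made up for illustration. The only thing taken from the description above is the rule itself: unchecked sentences get a dotted line plus a hover message, and detected errors get a solid line.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SentenceResult:
    """Hypothetical per-sentence output of the checker."""
    text: str
    # (start, end) character offsets the model flagged as errors
    error_spans: List[Tuple[int, int]] = field(default_factory=list)
    # Set when the sentence was not checked, e.g. "too long" or "too short"
    skip_reason: Optional[str] = None


def underlines(result: SentenceResult) -> List[Tuple[int, int, str, Optional[str]]]:
    """Return (start, end, style, tooltip) tuples for one sentence."""
    if result.skip_reason is not None:
        # Not checked: dotted line under the whole sentence, reason shown on hover
        return [(0, len(result.text), "dotted", result.skip_reason)]
    # Checked: solid line under each flagged span, no tooltip
    return [(start, end, "solid", None) for start, end in result.error_spans]
```

So a dotted line never means "the model thinks this is wrong"; it only means the model did not look at that sentence.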

I think 飛ばす is the best word for the job! A couple of my Japanese native friends thought it was too. But open to suggestions if anyone knows the nuances on this!

2 Likes

btw it doesn’t recognize ため息をついた like in 彼女はまた深いため息をついた。

sorry to be that guy

I feel like I saw this in another context recently and was similarly confused, so this is twice now. But I don’t remember what the other one was. :thinking:

I would suggest picking more standard colors for things. Basically red for bad, green for good, some other color for neutral. Right now there’s purple, pink, blue, and green, so it’s not super easy to follow.

2 Likes

i was surprised when i couldn’t find anything like Grammarly to check my japanese homework. anyways kudos, it’s a step in a cool direction!

1 Like

Oh I didn’t know about 小説家になろう! What a great resource! Thanks for suggesting it. And with regard to ため息 - it actually does work with 溜息, so it could be a formality issue again or just the data it’s been exposed to.

1 Like

I am trying to think of what my computer uses for Ignore / Skip, but I can’t for the life of me think of something I can do in two seconds to make a confirmation box pop up with that specific option in it. I have a strong feeling it is just something like スキップ or some other katakana though.

That makes more sense about the dotted line. I didn’t realize it wasn’t checking it because it was too long; I was under the impression that that was what it thought was an “error.” Now that I know you’re not suggesting things like that, it makes more sense. I’ve also now read the whole message it gives me. What’s interesting is where it seems to cut off, because it’s only highlighting about half of the sentence and then saying it’s too long. I do agree with the above that maybe just changing the color would help. I’m already conditioned to read a red squiggly line, or anything that looks similar, as a mistake.

But now I understand most of what it is giving me. It seems to not like 伸ばし which is unsurprising. But in this sentence 令和三年四月一日から新テキストの内容を実施する。 it doesn’t like を実施する and I’m kinda at a loss as to why it thinks so.

i took a random paragraph from haruki murakami’s 1q84. maybe murakami is weird!

Hmm, well if a message pops up with the Japanese for Skip on there, then please do let me know and I’ll update the site accordingly.

Yeah, it’s a bit confusing where it cuts it off. Essentially, Japanese AI models work by splitting sentences into little chunks called ‘tokens’, which can be of varying length, and my model has a maximum capacity of 48 tokens. It’s possible to make models with much more capacity, but training them requires beefy computers and graphics cards, and I’m just doing this in my bedroom. I also really wanted squiggly lines, but they are unfortunately unreliable on Japanese text :cry: maybe a few Chrome/Safari updates from now they’ll improve.
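To make the token limit a bit more concrete, here is a small Python sketch. The 48-token capacity is the real number from above, but everything else is an assumption for illustration: `toy_tokenize` simply treats every character as one token, whereas a real tokenizer produces chunks of varying length, and `split_at_capacity` is a made-up helper name.

```python
from typing import List, Tuple

MAX_TOKENS = 48  # capacity mentioned above


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def split_at_capacity(text: str, max_tokens: int = MAX_TOKENS) -> Tuple[List[str], List[str]]:
    """Split a sentence into the tokens that fit and the tokens that overflow."""
    tokens = toy_tokenize(text)
    return tokens[:max_tokens], tokens[max_tokens:]


fits, overflow = split_at_capacity("彼女はまた深いため息をついた。")
print(len(fits), len(overflow))
```

Because real tokens do not line up with characters or words, the point where a sentence runs past the limit can land somewhere that looks arbitrary, which is probably why the cut-off seems to fall mid-sentence.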

For that sentence, I can’t see anything that’s wrong either. I’ll just say it’s still early days for the model, and hopefully with more training, errors like that will become less and less frequent. Thanks for trying it out and for your feedback. :slightly_smiling_face:

Haha well Murakami is probably at least slightly weird! But I think it’s a mistake of the model in that case.

1 Like

Just to say, I’ve been using this with my writing practice and it’s helping me a lot. Until now it’s been difficult to identify and self-correct errors, so this really helps my self-review. Thanks for creating and sharing.

2 Likes

Thanks so much for the positive feedback! That makes me really happy to hear. :slight_smile:

1 Like