AI-based Japanese grammar checker (文法ーCHECK)

Hey everyone!

I’m a ~N1 level Japanese speaker, living in Tokyo and working as an AI researcher. I couldn’t find any decent free Japanese grammar checker online, so I decided to make one using AI!

Here it is:

How to use it

Just type your text in, hit check, and like magic, underlines will appear under any problematic parts of your sentence. Tap an underlined section to see the suggestion for fixing it. Click the speech bubble, and the AI will give you an explanation for the suggestion.

How accurate is it?

Will it always get everything right? Nope - this is still experimental and a work in progress.


Update: March 10th 2024 :partying_face:

文法ーCHECK has been updated today with a brand new, modern UI. In addition to giving suggestions for fixing your sentence, you can now also get explanations, as well as have general conversations about Japanese. I’ve also added a tone check feature for converting text between casual, normal, and keigo politeness levels.

Note that, as always, this is still a work in progress and might give incorrect results. :slight_smile: Any and all feedback is welcome. I’m going to keep working on improving the accuracy over time.


Thanks for everyone’s support so far! :heart: I hope everyone enjoys the new features.

I have a feeling this thing is going to hate colloquial speech. It really doesn’t like when I skip を after 何. :laughing:

That’s a good point actually - it was mostly trained on Wikipedia and Japanese websites which tend to use more formal language, so it doesn’t recognise colloquial speech as well… I could mine a whole bunch of Japanese tweets and add those to the training data as examples of casual language, but then would those tweets just have terrible grammar? Dilemma :thinking:

Maybe you could have two models?

Yeah, probably have to choose how to address that eventually. Something like ら抜き言葉 (leaving the ら out of something like 見られる and just saying 見れる when the meaning is potential form) is totally normal, but it will not be accepted on a conjugation test as correct.

Two models could be an idea, with users selecting whether they want to check more formal or more casual grammar. That could also give better performance on each type of text. An ideal model would figure out the formality from the rest of the text and judge it appropriately, but I might be limited on data and computational power to do that… hmm. Good food for thought.
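To make the two-model idea a bit more concrete, here is a minimal sketch of what letting users pick a register might look like. None of this is the site's actual code: `check`, `CHECKERS`, and the tiny hand-written list of ら抜き forms are hypothetical stand-ins for two separately trained models.

```python
# Toy stand-ins for two separately trained models (hypothetical, not the site's code).
# Maps a few ら抜き forms to their textbook equivalents.
RA_NUKI = {"見れる": "見られる", "食べれる": "食べられる", "来れる": "来られる"}


def check_formal(text):
    """Flag ら抜き forms, as a conjugation test would."""
    return [(form, fix) for form, fix in RA_NUKI.items() if form in text]


def check_casual(text):
    """Accept ら抜き forms as normal colloquial speech, so flag nothing."""
    return []


CHECKERS = {"formal": check_formal, "casual": check_casual}


def check(text, register="formal"):
    """Route the text to the checker for the register the user selected."""
    if register not in CHECKERS:
        raise ValueError(f"unknown register: {register}")
    return CHECKERS[register](text)
```

With this toy setup, `check("この映画は見れる", "formal")` flags 見れる, while the same sentence passes untouched with `register="casual"`. A real version would swap the two functions for models trained on formal and casual text respectively.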

I am going to guess there are more issues than this that will need to be confronted, as the Japanese Wikipedia article on grammar checkers seems to suggest they are an English thing and that there are no notable Japanese grammar checkers.

Anyway, I fed it some text. And I see the dotted red line, but I also have a single solid red line under one character and I’m not sure what that is.

Also does 飛ばす here mean “skip”? I’m not an expert in computer terminology, but I feel like there’s a better word to be used there.

retrain the model on tweets! or even better, aozora and shousetsuka ni narou

Yeah, grammar checking is still very much an unsolved problem in Japanese. Note, though, that what my site is doing is just detecting errors, which is a lot easier than also suggesting the correct grammar, like you might get in MS Word if you right-click. There are some research groups, like the one at Nara Institute of Science and Technology, that have been working on this for a while, and their research papers actually helped me figure out this model, but the field still has a lot of scope for improvement.

Anyway - solid red line means an error. Dotted line means the line wasn’t checked for some reason - it could be too short, or too long etc. If you mouse over or tap the dotted underlined text, it will tell you the problem.
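As a rough illustration of that legend, the logic could be sketched like this. This is only a guess at the shape of it, not the site's real code: `review_line`, `LineResult`, and the `MIN_CHARS` / `MAX_CHARS` thresholds are all made up for the example, and `has_error` is a placeholder for the actual model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MIN_CHARS = 3   # hypothetical "too short" threshold
MAX_CHARS = 60  # hypothetical "too long" threshold


@dataclass
class LineResult:
    text: str
    underline: str                 # "solid", "dotted", or "none"
    message: Optional[str] = None  # shown on mouse-over or tap


def review_line(text: str, has_error: Callable[[str], bool]) -> LineResult:
    """Decide which underline a line gets; has_error stands in for the model."""
    if len(text) < MIN_CHARS:
        return LineResult(text, "dotted", "not checked: too short")
    if len(text) > MAX_CHARS:
        return LineResult(text, "dotted", "not checked: too long")
    if has_error(text):
        return LineResult(text, "solid", "possible error")
    return LineResult(text, "none")
```

The point is just that a dotted underline means "the model never looked at this", which is a different state from "the model looked and found a problem".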

I think 飛ばす is the best word for the job! A couple of my Japanese native friends thought it was too. But open to suggestions if anyone knows the nuances on this!

btw it doesn’t recognize ため息をついた like in 彼女はまた深いため息をついた。

sorry to be that guy

I feel like I saw this in another context recently and was similarly confused, so this is twice now. But I don’t remember what the other one was. :thinking:

I would suggest picking more standard colors for things. Basically red for bad, green for good, some other color for neutral. Right now there’s purple, pink, blue, and green, so it’s not super easy to follow.

i was surprised when i couldn’t find anything like grammar to check my japanese homework. anyways kudos it’s a step in a cool direction!

Oh, I didn’t know about 小説家になろう! What a great resource! Thanks for suggesting it. And with regard to ため息 - it actually does work with 溜息, so it could be a formality issue again or just the data it’s been exposed to.

I am trying to think of what my computer uses for Ignore / Skip, but I can’t for the life of me think of something I can do in two seconds to make a confirmation box pop up with that specific option in it. I have a strong feeling it is just something like スキップ or some other katakana word, though.

That makes more sense about the dotted line. I didn’t realize it wasn’t checking the line because it was too long; I was under the impression that the dotted line was what it thought was an “error.” Now that I know you’re not suggesting things like that, it makes more sense. I’ve also now read the whole sentence it gives me. What’s interesting is where it seems to cut off, because it only highlights about half of the sentence and then says it’s too long. I do agree with the above that maybe just changing the color would help. I’m already conditioned to think that a red squiggly line, or something that looks similar, means a mistake.

But now I understand most of what it is giving me. It seems to not like 伸ばし which is unsurprising. But in this sentence 令和三年四月一日から新テキストの内容を実施する。 it doesn’t like を実施する and I’m kinda at a loss as to why it thinks so.

i took a random paragraph from haruki murakami’s 1q84. maybe murakami is weird!

Hmm, well if a message pops up with the Japanese for Skip on there, then please do let me know and I’ll update the site accordingly.

Yeah, it’s a bit confusing where it cuts off. Essentially, Japanese AI models work by splitting sentences into little chunks called ‘tokens’, which can be of varying length, and my model has a maximum capacity of 48 tokens. It’s possible to make models with much more capacity, but training them requires beefy computers and graphics cards, and I’m just doing this in my bedroom. I also really wanted squiggly lines, but they are unfortunately unreliable on Japanese text :cry: Maybe a few Chrome/Safari updates from now they’ll improve.
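For anyone curious how a 48-token limit plays out, here is a minimal sketch. Big caveat: `toy_tokenize` just treats every character as one token, which is an assumption made purely for illustration. The real tokenizer produces chunks of varying length, so the actual cut-off point will differ. `fits_model` and `split_at_limit` are likewise hypothetical names.

```python
MAX_TOKENS = 48  # the capacity mentioned above


def toy_tokenize(sentence):
    """Hypothetical stand-in tokenizer: one token per character.

    A real tokenizer would produce variable-length chunks instead.
    """
    return list(sentence)


def fits_model(sentence, max_tokens=MAX_TOKENS):
    """True if the whole sentence fits within the token limit."""
    return len(toy_tokenize(sentence)) <= max_tokens


def split_at_limit(sentence, max_tokens=MAX_TOKENS):
    """Return (part within the limit, overflow past the limit)."""
    tokens = toy_tokenize(sentence)
    return "".join(tokens[:max_tokens]), "".join(tokens[max_tokens:])
```

Because the limit is counted in tokens rather than characters or words, the boundary can land in the middle of a sentence, which is one plausible reason a highlight might stop partway through.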

For that sentence, I can’t see anything that’s wrong either. I’ll just say it’s still early days for the model, and hopefully with more training, errors like that will become less and less frequent. Thanks for trying it out and for your feedback. :slightly_smiling_face:

Haha well Murakami is probably at least slightly weird! But I think it’s a mistake of the model in that case.

Just to say, I’ve been using this with my writing practice and it’s helping me a lot. Until now it’s been difficult to identify and self-correct errors, so this really helps my self-review. Thanks for creating and sharing.

Thanks so much for the positive feedback! That makes me really happy to hear. :slight_smile:
