Testing GPT-4's accuracy on the Japanese language

I've been playing with GPT-4 lately, and it seems considerably more accurate than the free version (3.5), so I thought I'd try to quantify its actual hallucination rate when it comes to Japanese. I've been using it for grammar, so I decided to post some interactions here.
I invite anyone who's an expert in the language to drop by often, check the AI's responses, and share an opinion. I'd like to measure its accuracy over ~100 interactions.
The thread has drifted a bit toward all things AI, so for simplicity the latest interaction will be marked as the thread solution.
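For anyone who wants to tally results, here's a minimal Python sketch, assuming each interaction is simply judged right or wrong by a fluent reviewer. It turns those judgements into an accuracy figure with a 95% Wilson score interval, which shows how much uncertainty is left after ~100 interactions. The 85/15 split is made-up placeholder data, not a real result.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    return centre - half_width, centre + half_width


# Placeholder tallies (hypothetical): True = answer judged correct by a reviewer.
judgements = [True] * 85 + [False] * 15
correct = sum(judgements)
low, high = wilson_interval(correct, len(judgements))
print(f"accuracy {correct}/{len(judgements)} = {correct / len(judgements):.0%}, "
      f"95% CI {low:.0%} to {high:.0%}")
```

With the placeholder 85/100, the interval comes out at roughly 77% to 91%, so even 100 interactions leave a fairly wide margin. The Wilson interval is used instead of the simple "p ± 1.96·SE" formula because it behaves better near 0% or 100%.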

• Thread hijacked by programming and CS talk in general

6 Likes

I've found 4 much better than 3.5 in other areas, so I'm very interested to see what you find here. That said, I've seen it give completely false information on technical topics deep in the weeds, like layer 3 routing to overlapping subnets in Linux or PowerShell AST manipulation.

3 Likes

Interesting, can you share some question-and-answer examples?

In my experience, on the subjects I tested it on, it was surprisingly accurate and didn't provide even one fake source.

I did notice, though, that at times it gets stuck in a loop and keeps providing wrong arguments. In such cases I suggest trying approaches like "chain of thought" prompting, step-by-step reasoning, recursive augmentation, etc. Another thing that seems to work nicely is asking it to re-evaluate its own answer.

2 Likes

These sorts of examples are pretty old at this point, and I don't have those chats anymore. I'm mostly focused on playing with the Browse with Bing beta feature at the moment. I can start a chat without beta features and see if I can reproduce some of those results when I get a chance.

2 Likes

Ah, never mind then. By the way, I tried Bing as well (without beta features) for quite some time with Japanese grammar and got the impression that it's junk. At times it was more like Siri than GPT :joy:

2 Likes

No, I meant the ChatGPT beta feature, which lets chats include data from external sites.

3 Likes

4 Likes

I feel like 4’s improved accuracy over 3.5 is more than offset by its reliance on Bing. :stuck_out_tongue:

8 Likes

I gave Vanilla’s N5 quiz to ChatGPT-4 and it got 2 out of 10.

Not impressed :face_with_monocle::face_with_monocle::face_with_monocle:

11 Likes

Oh sorry, I misunderstood :grin: I tried that feature, and to me it looks more accurate than other plugins that are supposed to do the same thing.

1 Like

It doesn’t seem to work well with this kind of request…

1 Like

Have you tried prompt engineering or some other method to increase accuracy? I’d be curious to see if anything changes after inputting something like the prompt I mentioned in the post description.

1 Like

My impression so far with GPT-4 is that it performs a bit better if we avoid mixing languages in the prompt.

Intuitively, I’d guess it had a vast amount of English-only and Japanese-only training data, but comparatively little mixed-language data.

But I’ll try some later. I’m also interested to know whether rerunning the same prompt changes the answers much. (The quiz is actually hard, so I wouldn’t be surprised if it just answers somewhat randomly.)
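A rough way to check the "answers randomly" idea, assuming each of the 10 quiz questions has four options (question J does, going by the answer quoted further down; the others are an assumption): blind guessing would average 2.5 correct. This Python sketch uses the binomial distribution to compute the chance of reaching a given score by guessing alone.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 10 questions with 4 options each, so a blind guess is right 1/4 of the time.
N_QUESTIONS = 10
P_GUESS = 1 / 4

print(f"expected score from guessing: {N_QUESTIONS * P_GUESS}")
for score in (2, 6):
    chance = prob_at_least(score, N_QUESTIONS, P_GUESS)
    print(f"P(at least {score}/10 by guessing) = {chance:.3f}")
```

Under that assumption, scoring at least 2/10 by pure guessing happens about 76% of the time, while at least 6/10 happens only about 2% of the time. So a single 2/10 run is hard to tell apart from guessing; rerunning the same prompt several times, as suggested above, would give a clearer picture.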

3 Likes

Nice observation, I agree. But in my own use I had the strong impression that once I learned how to drive it properly (and talk with it), its accuracy increased substantially… I’ll try to share something; let’s also see what happens over a larger number of tests.

1 Like

I haven’t gotten around to it yet, but I really want to try roleplaying with ChatGPT.
Here’s why:
I don’t live in Japan, so once in a while when I come across someone speaking Japanese, I really want to go and talk to them. But I don’t, for two reasons. First, what would I think if a stranger came up to me while I was minding my own business and said, “Hey! You speak English! I too I learning everyday very like your country lol!”… And second, I feel like I could maybe say one sentence well, like “毎晩、寝る前に、日本語を勉強します” (“Every night before bed, I study Japanese”), and then they would answer something and I wouldn’t understand or wouldn’t know what to say next…
So I’m going to use ChatGPT as that “stranger”. I’m looking forward to seeing how it replies to my opening sentence; then I can prepare some follow-up lines, and who knows, maybe the next time I meet a Japanese person I’ll feel confident enough to try having a conversation with them :slight_smile:

6 Likes

Hmm, I don’t think it’s usable at the moment for roleplay to practice the language, because in my opinion that approach doesn’t give accurate results.

In my experience, it’s most useful with a good prompt, some easy questions about short sentences, and a bit of redirecting.

Personally, I would appreciate it and enjoy the conversation :slight_smile: I suggest you do it.
I once heard a Japanese lady on the tram speaking Japanese on the phone and recognized some words, so after she hung up I said to her “すみません…日本人ですか?” (“Excuse me… are you Japanese?”) haha, and we started talking. She was very friendly and we exchanged numbers; it turns out she’s an Italian teacher at a school for Japanese-Italian kids!
You see? It’s not a big deal. If someone is shy or in a hurry, you’ll notice and can let it go, but I strongly suggest engaging with people in the street when the opportunity comes up, and not worrying too much.

That’s the point: you’re not going to improve if you don’t practice daily. And if you don’t know any native Japanese speakers, your options are either paying a conversation teacher on one of those apps or engaging with people.

5 Likes

Just tried it myself: on the second try, after checking that everything was correct, it got 6 answers right with the Wolfram plugin enabled.
https://chat.openai.com/share/ccde72c1-b918-48d9-a27b-fc1ac6aad7bb

After a few tries I noticed that it never gets the first and last ones correct. Maybe because てみろ is a rarely used form for expressing a conditional?

About the last one, at one point it gave me this:

As I mentioned before, for question J, none of the provided readings are typically associated with the kanji “誰何”. The options provided (kiyūru, ringo, ichigo, suika) are all names of fruits or vegetables in Japanese, not typical readings for this kanji. The answer provided is based on the assumption that there might be a typographical error in the question.

1 Like

Don’t be confused about what GPT (in whatever version) is. It’s just a large generative language model. It can yield some impressive results in certain situations, but it’s not capable of independent reasoning.

People who seriously use it for programming or similar tasks confuse me. It’s more than capable of generating total nonsense, including function calls that don’t even exist. It’s probably good enough for generating some boilerplate, but that’s not really the hard part.

7 Likes

I agree; the developers have been very clear in advising against using it for applications where something is at stake.
For that kind of use, I’d wait for a product with a tenth of a human’s error rate.

1 Like

I get a 404 error on that link. I’m a bit confused about what the Wolfram plugin could bring to language questions like this.

Goddammit, got called out by ChatGPT :sweat_smile: I made a typo when transcribing the quiz! Of course it’s きゅうり (cucumber), not きゅうる :smile:

But note that the claim that “none of the provided readings are typically associated with the kanji 誰何” is wrong! 誰何 can be read すいか, the same reading as the word for watermelon; that’s the joke.

2 Likes