Testing GPT-4's accuracy on the Japanese language

Yeah, the problem I saw was that the system needs more instructions in order to be more accurate, and going by “is X right?” doesn’t accomplish that.

But then you almost have to be an expert in the subject (or at least in asking questions) in order to get it to produce a good explanation.

2 Likes

Oh, I scored it like I was marking a test. First part: one correct observation, bad example, no rendaku, one mark. Second part: basically correct, three marks. Third part: not relevant, zero marks. Total score: four.

2 Likes

EDIT: replied to the wrong person…

I don’t understand what this means, could you explain further? I’m not sure I ever approached a 1-10 vote with this logic (not that I fully understood what your logic was here).

Basically, for everything ChatGPT got right, it earned a point.

1 Like

OT

Interesting article from a Turing Award winner.
Summarized conclusion: analysis suggests that no current AI systems are conscious, but also shows that there are no obvious barriers to building conscious AI systems.

1 Like

Not an entirely surprising conclusion, but it’s always good to see what more intelligent people with more knowledge of the subject think :slight_smile:

3 Likes

@mariodesu thought this may interest you

2 Likes

Thanks for tagging, that confirmed my thoughts and it’s basically how I use it (in reference to the part about GPT).
It also gave me the idea of trying to follow some Japanese accounts on 𝕏. I think it could be a nice learning source, especially when I have no time to sit and read something serious.

Maybe this is more appropriate to post in my computer science thread, but since the subject was widely discussed here as well, and since this is really about checking the validity of GPT responses, here we go:

The answer’s format is due to some custom instructions I’ve been trying; they seem to work better, but I’m not really sure about accuracy.

Accuracy
  • Completely wrong
  • Mostly wrong
  • Mostly right
  • Completely right

I’m not sure it interpreted your question correctly. When you say “padding around the RGB triples grid”, that makes me think of padding outside the pixel array. Its answer, however, is purely about padding inside the pixel array, at the end of each row.

Is that what you were asking about or did it answer the wrong question?

In case that’s unclear:

The file format is essentially

Metadata
[padding]
Pixel array
[padding]
Metadata

And the pixel array is essentially

Row of pixels
[padding]
Row of pixels
[padding]
Row of pixels
[padding]
Row of pixels

Extended as far as needed, of course.

Where I interpret your question as you asking about the padding in the first block, but the actual answer is about the padding in the second block.
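In case a concrete illustration helps, here’s a rough Python sketch of how you’d locate the start of the pixel array (and with it, any padding in that first block), assuming the common BITMAPINFOHEADER layout. The function name is just mine, and it’s only an illustration, not a full parser:

import struct

def bmp_layout(path):
    # Read the fixed-size BMP headers and report where the pixel array
    # starts. Any gap between the end of the headers and that offset is
    # the first block of padding described above.
    with open(path, "rb") as f:
        header = f.read(30)
    if header[:2] != b"BM":
        raise ValueError("not a BMP file")
    # BITMAPFILEHEADER: bytes 10-13 hold bfOffBits, the offset from the
    # start of the file to the pixel array.
    pixel_offset = struct.unpack_from("<I", header, 10)[0]
    # BITMAPINFOHEADER: width at byte 18, height at byte 22,
    # bits per pixel at byte 28 (24 for RGB triples).
    width, height = struct.unpack_from("<ii", header, 18)
    bits_per_pixel = struct.unpack_from("<H", header, 28)[0]
    return pixel_offset, width, height, bits_per_pixel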

2 Likes

Yes, my question was about the pixel array! :ok_hand:

I thought it was more something like:

[padding] [pixel triple] [pixel triple] [...] [padding]
[padding] [pixel triple] [pixel triple] [...] [padding]
...
..
.

Where the padding is only added if the number of pixels in a row is not a multiple of 4 (for example, in case of a 3x3 grid).

1 Like

Ah right, in that case the answer is about the padding you meant, and it checks out to me.

The padding is added to the end of each row, for the reasons GPT mentions: every row must start on an address that is a multiple of 4, for efficiency. There is no padding at the beginning, only at the end. So you get:

[pixel triple] [pixel triple] [pixel triple] [...] [padding]
[pixel triple] [pixel triple] [pixel triple] [...] [padding]
[pixel triple] [pixel triple] [pixel triple] [...] [padding]

And so on.
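If you want to sanity-check the arithmetic, this is the usual row-stride calculation for a 24-bit BMP in Python (the function name is just for illustration):

def bmp_row_padding(width_px, bytes_per_pixel=3):
    # Each row is padded so its total length is a multiple of 4 bytes,
    # which keeps every row starting on a 4-byte-aligned address.
    row_bytes = width_px * bytes_per_pixel
    return (4 - row_bytes % 4) % 4

# A 3-pixel-wide row of RGB triples is 9 bytes, so 3 bytes of padding
# bring it up to 12; a 4-pixel-wide row is already 12 bytes, so none.
assert bmp_row_padding(3) == 3
assert bmp_row_padding(4) == 0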

2 Likes

I’ve been using ChatGPT to fill in nuance, but with answers like this I’m afraid it’s just confidently incorrect most of the time.

I could have sworn it was better before.

5 Likes

I love how it doubles down on ほぼく, but when you start to go “… are you really really sure of that?” it falters, and decides to just erase the whole reading from existence.

Heh, it summarised a long text and decided to omit some details.

3 Likes

Is that 3.5 or 4?
Anyway, in my experience it shouldn’t be trusted on that type of question (especially), regardless of the model…

Ouch, this one’s pretty bad. I haven’t seen it make mistakes like that before. @mariodesu I think we used to get better results on questions about kanji readings, right?

Sounds almost like a Seven Ocean translation of School of the Elites, lol.

1 Like

I think so. I keep seeing posts on 𝕏 from people claiming that its performance has been downgraded. I think it’s possible, not least because I imagine they’re prioritizing safety of use over accuracy.

1 Like

I guess that’s fair. But if the model can’t do simple dictionary look-ups + extra context, it becomes significantly less useful for these types of questions.

Not that it should be used for such in the first place :sweat_smile:

1 Like

I got access to the new multimodal GPT-4V today, which can accept images as input, so I tried giving it a screenshot of an N1 mock test. Unfortunately it made so many errors in scanning the text that it couldn’t answer correctly…

My guess is that GPT-4 could pass the N1 easily, but it’s a pain to enter an entire test manually, so I was waiting for the image version to try :sweat_smile:

3 Likes

That is weird. I think it has to do with the data it was trained on, because at the moment it can do some amazing things.



How did you access the image function? I got access to the voice mode by using a VPN, and it was able to speak Italian even before the language support was added, haha.

@Arzar33 would you mind me sharing that screenshot on Twitter?

Anyway, I recently realized that the only reason GPT-4 is still this inaccurate in foreign languages like Japanese (and it occasionally produces senseless sentences in Italian as well) is probably that the current models are not trained on enough data, or at least that’s my guess.

1 Like