Thank you for the links.
I would personally throw out the study from 1955, and likely the one from 1971 as well; in the world of linguistics those are honestly a lifetime ago. But rather than the quality of the studies, I suspect the formulas themselves are significantly different, and that accounts for the large gap between 33k and 50k. The difference between 33k and 34k, or even 32k, is basically nothing, since these are all done the same way: have people see if they know X words, then extrapolate from there.
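To make the extrapolation point concrete, here is a rough sketch of how I understand that kind of estimate to work. This is not taken from any of the linked studies: the function name, the sample of 200 words, and the two reference-list sizes are all made-up assumptions, picked only to show how the same test result can land near 33k or near 50k depending on what you extrapolate against.

```python
def estimate_vocab_size(known_count, sample_size, reference_list_size):
    """Scale the share of known sample words up to a whole reference word list.

    All inputs are hypothetical; real studies may weight or sample differently.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= known_count <= sample_size:
        raise ValueError("known_count must be between 0 and sample_size")
    return round(known_count * reference_list_size / sample_size)


# Same hypothetical test result: 66 of 200 sampled words known.
# Only the assumed size of the reference word list changes.
estimate_small_list = estimate_vocab_size(66, 200, 100_000)
estimate_large_list = estimate_vocab_size(66, 200, 150_000)

print(estimate_small_list)
print(estimate_large_list)
```

If something like this is going on, the size of the list you extrapolate against does most of the work, which is why I would look at the formulas before blaming the quality of the studies.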
You also bring up an interesting point. I could, say, know a shit ton of archaic vocabulary, but in the modern world that’s just useless. Like, if I know the entire sentence “Hwæt. We Gardena in geardagum, þeodcyninga, þrym gefrunon, hu ða æþelingas ellen fremedon.”, that might count as more vocabulary, but it’s literally useless outside of a very specific context.
What I’m doing right now is going through the studies to see if I can find how words were defined. But as the other posts said, the problem here is still that “word” is not a singly defined concept; even in Japanese, what counts as one word is complex. Like I said, are する and した two words or one? Is 私は different from 私? Is は a word? These are all the kinds of questions that need to be answered. And if you say は is a word, it’s an entire class of words that English doesn’t have. So already from that it’s an unfair comparison.
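Here is a toy sketch of how much the answer moves depending on those choices. The two sentences are segmented by hand (no real tokenizer is used), and the function and its two switches are hypothetical names I made up for illustration, not how any of the studies actually counted.

```python
# Hand-segmented toy data for 私は勉強する / 私は勉強した.
# Each token is (surface form, dictionary form, is_particle).
tokens = [
    ("私", "私", False),
    ("は", "は", True),
    ("勉強", "勉強", False),
    ("する", "する", False),
    ("私", "私", False),
    ("は", "は", True),
    ("勉強", "勉強", False),
    ("した", "する", False),
]


def count_words(tokens, merge_inflections, count_particles):
    """Count distinct 'words' under one particular definition of 'word'."""
    seen = set()
    for surface, dictionary_form, is_particle in tokens:
        if is_particle and not count_particles:
            continue  # this definition does not treat particles as words
        # Either する and した collapse into one entry, or they stay separate.
        seen.add(dictionary_form if merge_inflections else surface)
    return len(seen)


surface_with_particles = count_words(tokens, merge_inflections=False, count_particles=True)
merged_with_particles = count_words(tokens, merge_inflections=True, count_particles=True)
merged_without_particles = count_words(tokens, merge_inflections=True, count_particles=False)

print(surface_with_particles, merged_with_particles, merged_without_particles)
```

Same eight tokens, and the count comes out as 5, 4, or 3 depending only on the definition. Scale that up to a whole language and the totals stop being comparable.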
And yes, to the other posts, none of this actually matters to OP. This is a more theoretical discussion.
