Lexical complexity statistics of LNs

Here’s an interesting stats site about the top 300 narou stories. Click on the headers to sort.

‘Narou stories’ are webnovels from the site 小説家になろう, where all the most popular novelizations come from, such as Re:Zero, Shield Hero, Honzuki etc.

The explanation for the metrics is on the original VN analysis site.

Some picks (go to link for full table):

| title | rank | kanji (unique) | kanji (2+) | chars | lexemes | sentences | lines | chars/line | chars/sentence | hours estimate | sjis bytes | freqlist 90% target | freqlist 92.5% target | freqlist 95% target | metric a | metric b | metric c | metric d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 本好きの下剋上 | 16 | 2596 | 2422 | 5481953 | 3025174 | 183486 | 99850 | 54.90 | 28.66 | 190.90 | 11434318 | 4195.94 | 5587.36 | 8053.06 | 77.69 | 78.01 | 78.18 | 77.43 |
| Re:ゼロ | 7 | 2815 | 2679 | 5342415 | 2958571 | 207915 | 120968 | 44.16 | 24.15 | 185.35 | 11154352 | 6207.20 | 8104.20 | 10891.09 | 79.30 | 80.09 | 80.07 | 79.36 |
| 盾の勇者の成り上がり | 12 | 2592 | 2421 | 3692830 | 2172051 | 194961 | 142349 | 25.94 | 18.04 | 129.96 | 7845555 | 4229.02 | 5739.47 | 8135.72 | 80.56 | 80.01 | 80.00 | 80.57 |
| くま クマ 熊 ベアー | 77 | 2154 | 1986 | 2337879 | 1319434 | 118292 | 81983 | 28.52 | 18.86 | 79.35 | 4971434 | 2480.53 | 3300.24 | 4804.93 | 81.98 | 83.00 | 81.78 | 81.69 |

I’m surprised 本好き has a longer reading time than Re:Zero, even though I found it an order of magnitude easier. The hours estimate is kanji/18000 + hiragana/31500 + katakana/61400, which doesn’t reflect reality that well. Kanji are sometimes faster to read than hiragana, and of course the formula can’t take grammar into account.
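For anyone who wants to play with that formula, here is a rough Python sketch. The three divisors are the ones quoted above; everything else is my own assumption, in particular the Unicode ranges used to tell kanji, hiragana and katakana apart and the function names `count_scripts` and `estimate_hours`. The stats site may well count characters differently.

```python
# Rough sketch of the hours estimate quoted above:
#   hours = kanji/18000 + hiragana/31500 + katakana/61400
# The Unicode ranges below are an assumption, not the site's actual method.

def count_scripts(text):
    """Count kanji, hiragana and katakana characters in a string."""
    counts = {"kanji": 0, "hiragana": 0, "katakana": 0}
    for ch in text:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x309F:      # Hiragana block
            counts["hiragana"] += 1
        elif 0x30A0 <= cp <= 0x30FF:    # Katakana block (includes the long-vowel mark)
            counts["katakana"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs (common kanji)
            counts["kanji"] += 1
    return counts


def estimate_hours(kanji, hiragana, katakana):
    """Apply the per-script reading rates (characters per hour) from the post."""
    return kanji / 18000 + hiragana / 31500 + katakana / 61400


if __name__ == "__main__":
    sample = "本好きの下剋上"
    c = count_scripts(sample)
    print(c, estimate_hours(c["kanji"], c["hiragana"], c["katakana"]))
```

Since the rates are fixed per character type, the estimate is basically proportional to length, which would explain why a longer but easier series comes out with more hours.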

I’d take the metrics with a grain of salt, but hopefully these can be useful when looking for level-appropriate material. I guess the freqlist 92.5% target is the most useful single metric for comparing two texts.

The list is a bit hard to browse without images, but any thoughts on how the stats reflect your experiences? Feel free to leave some recommendations as well! Shield Hero for some relatively easy isekai? :upside_down_face:

11 Likes

What’s a narou story?

2 Likes

A trashy story according to jisho :stuck_out_tongue:

4 Likes

I’m having trouble finding the stats for the light novels from the site you linked to. :slightly_frowning_face:

2 Likes

Sorry, I messed up the links. The stats link is now correct.

@Belthazar @Arzar33

A ‘narou story’ is a webnovel from the site 小説家になろう, where all the most popular novelizations, such as Re:Zero, Shield Hero, Honzuki etc., come from. I should have clarified that better; adding it.

Basically it’s a site where anyone can start a series, and the best ones get serialized and published as LNs.

So I guess you’re right :joy:.

5 Likes

Based on my experience, I will probably need twice the time indicated to read the entirety of 本好き, considering I’m averaging ~10h per volume. That being said, the published volumes have extra text compared to the web version, so it’s probably a bit less than twice the expected time for a native.
I’ve also found that Re:Zero is a more difficult (and thus slower) read than 本好き.

2 Likes

Huh! I knew that sometimes a popular webnovel gets a proper release, but I didn’t know it was so widespread.

2 Likes

I think there is so much more to the difficulty of a book than just the number of difficult/rare words in it, but I guess this method has its place.

Well, the only LN, apart from 本好き, that I’ve read from the list is 薬屋のひとりごと, with a freqlist 92.5% target of 9748.32, which is in line with its actual difficulty. It is quite hard.

But then, I can imagine someone putting something like とかげ by 吉本ばなな through this method, seeing that it’s supposed to be rather easy, starting to read it, and then being surprised at how difficult that book actually is.

2 Likes

Sure, that’s why I don’t say it’s useless. It’s just one metric that goes into the difficulty of a book.

Provided that the reader has sufficient grammar knowledge, that is. In my experience, looking up a grammar point, especially a basic one, is significantly more difficult than looking up a word.

1 Like

Yeah, feels like almost all the webnovels on the list have had a proper release of some kind. At least, I found book cover art for everything I googled.

@yukinet Heh, zip compression :smile:. Where did you extract the text files from?

@Naphthalene Interesting! The reading time really seems to be something that wasn’t thought out that thoroughly in the stats. I wonder if there is some better algorithm (which of course would have a hard time taking grammar into account).

My experience as well. The easier LNs have pretty simple grammar, but the amount of vocab can still be daunting.

2 Likes

Does a smaller rank number mean a series is harder to read?

I’d say it’s quite fun, and I honestly love the series. I’m planning to start buying the LNs once I have more time to dedicate to Japanese. (So far, I’ve mostly followed the anime and read LN translations.)

If I’m understanding the rankings correctly… yeah, I guess Re:Zero seems a bit harder if I compare the anime adaptations of the two? Re:Zero has more world-specific vocabulary, especially with regard to ranks and positions. Shield Hero’s lexicon is more game-related, so it’s easier to work out, provided you know gaming terminology. However, from my experience trying to read Volume 18… if you have nothing but N3-N2 grammar knowledge and you’re used to textbook sentences, it’s going to be tough at first. LNs in general use more descriptive vocab, meaning compound verbs and idiomatic expressions are more common. Also, Shield Hero in particular was my first time encountering complex (read: multi-level embedded) relative clauses. Those might be hard to parse for a first-timer. I had lots of trouble getting through the first few pages when I first tried, after a year of Japanese with roughly N3-N2 grammar knowledge. I tried again a few months ago (about a year later), and it was much easier. What I did in between: a little Tobira and a lot of serious anime study? I’m not sure what helped exactly.

2 Likes

Yeah, I was wondering if you somehow crawled the text from the web novel. But alright, so it was from the e-book version :+1:. It would be interesting to get the original narou texts; the author does not include them in the GitHub repository (although they have a note that they could upload them somewhere if requested). Alternatively I could just write a script myself…

1 Like