How to Gauge Reading Level of a Text (mostly children's novels)

I have a question about how to gauge the actual reading level/complexity of a text in Japanese.

I LOVE children’s literature… it makes up a large chunk of what I read for pleasure in my native English. I’m trying to read more in Japanese, and I was just gifted (!!!) a bunch of middle grade novels by a Japanese mom friend of mine whose kids had outgrown them or decided they weren’t interested in them. They are all translations of English books, since she was trying to keep her kids interested in reading Japanese by giving them the same books their schoolmates in American public schools were reading. I’ve read most of them before in English.

Most of the books have no “suggested audience” on the cover or in the front- or back-matter that I can find. The two that do say 小学校中級以上 (“middle elementary grades and up”) and 小学中級から (“from middle elementary grades”), respectively, but they still seem to be wildly different levels…

In English-language markets, the target audience of a middle grade or YA novel is often based almost entirely on the age of the main characters. The assumption is that children want to read about children their own age or slightly older, and have very little interest in reading about “little kids” once they are beyond a certain age. This stratification is far less true for classic texts (which may use extremely rich vocabulary for a story about a very young child), but it is how modern books are marketed. Modern books therefore tend to pitch their “reading level” at the target age group, but that range can be quite wide (8-12, 5th-8th grade, etc.).

However, looking across these books, I’m noticing that this seems to have VERY little bearing in Japanese. Some books have tons of furigana, others a little, and others none at all, all within the English “8-12” target audience. I also can’t tell at a glance whether the kanji printed without furigana are 四年生 (4th grade) or 六年生 (6th grade) kanji…
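If the text is available digitally, the 四年生-vs-六年生 question can at least be checked mechanically: look up each kanji printed without furigana in a grade table and tally the grades. This is a minimal sketch under stated assumptions: `SAMPLE_GRADES` is a hypothetical, deliberately tiny stand-in for the full kyōiku kanji list (a real table would need every kanji with its grade), and `grade_profile` is an illustrative helper, not an existing tool. Kanji missing from the table (such as 魔, which is not a kyōiku kanji) are counted as `"other"`.

```python
from collections import Counter

# Hypothetical, deliberately tiny stand-in for a full kanji-to-grade table.
# Grades follow the kyōiku kanji list; anything missing is counted as "other".
SAMPLE_GRADES = {
    "女": 1, "川": 1,
    "光": 2,
    "急": 3,
    "便": 4,
    "宅": 6,
}


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block (U+4E00 to U+9FFF); kana and punctuation fall outside it.
    return "\u4e00" <= ch <= "\u9fff"


def grade_profile(text: str, grades: dict) -> Counter:
    """Count kanji occurrences per school grade in `text`.

    Feed it only the kanji printed WITHOUT furigana, since glossed kanji
    don't need to be known in advance.
    """
    profile = Counter()
    for ch in text:
        if is_kanji(ch):
            profile[grades.get(ch, "other")] += 1
    return profile


profile = grade_profile("魔女の宅急便", SAMPLE_GRADES)
# One kanji each from grades 1, 3, 4 and 6, plus 魔 counted as "other".
print(profile)
```

Run over a few sample pages, the highest grade that shows up repeatedly (not just once) gives a rough idea of which grade’s kanji a book assumes.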

Since I can’t tell JUST by looking at the cover, or just by noticing whether the book has lots of furigana or not… how do I tell? I mostly want this information so I can put the books in the order I want to progress through them, moving slowly upward in reading level. I don’t want to wade through a “reading pain” level book where I have to look up tons of words, only to move on to a book with maybe less furigana but much simpler vocabulary that I could breeze through, and that might have taught me some of the same words with much less pain.

I organized my books by the Lexile measure of their English editions, which range from about 300 to 1100 (so roughly 1st-8th grade). That might be the order I wind up reading them in (advice appreciated). That’s relatively convenient, since these happen to be English-origin books and I can gauge the complexity of the original vocabulary… but seriously… how do Japanese parents/students/teachers tell whether children are ready to read a certain text? I haven’t been able to find any comparable information, and of course vocabulary level is only one piece of the puzzle: the number of kanji, and whether or not those kanji have furigana, must have a huge bearing on whether a child can read a given text with pleasure and ease… wtf do kids do?

I also have 魔女の宅急便, which of course has English translations, so it was also easy to find its Lexile number. But I also have 川の光, which I’m very interested in reading this year or early next, and it seems REALLY hard for me right now. There is absolutely no furigana in the whole 450-page kids’ novel AND the vocabulary seems relatively complex… and yet I can’t find any information on 読者対象 (target readership), 読者レベル (reader level), or anything of the sort… how do I figure out where to put it among the books I have?

As a frame of reference, I’m currently reading 魔女の宅急便 and having no problem with it. I can read it easily and pleasurably and understand nearly every word and grammar pattern. However, when I try to get through a bit of Harry Potter and the Sorcerer’s Stone (ハリー・ポッターと賢者の石), I can’t make it even a few paragraphs in without needing LOTS of dictionary and kanji help. BOTH of these books would be considered roughly a 5th-grade reading level in English, though their Lexile measures are quite far apart (670 vs. 880).

Help!

OK, I don’t think I can cover all of your questions :sweat_smile:, but I think I can share some information that might be helpful to you (or not, in which case, sorry!).

Kanji level: Japanese schoolchildren learn about 200 kanji per year (OK, only 80 in first grade and 160 in second, but you get the idea), so they learn lots of kanji really fast (after 4th grade they know 640 kanji, and after 6th grade 1006; see Kanji Kentei - Wikipedia). If you want to know which ones, you can plug your API key into www.wkstats.com and see for yourself (including how many of each grade you already know).
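The cumulative totals follow directly from the per-grade counts. In this sketch the grade 1 to 4 figures come from the post (80, 160, 200, 200); the grade 5 and 6 figures (185 and 181) are from the pre-2020 kyōiku kanji list, which is the list the 1006 total refers to. The 2020 curriculum revision reshuffled some kanji and raised the total slightly, so newer textbooks won’t match these numbers exactly.

```python
from itertools import accumulate

# New kanji taught in grades 1 to 6 (pre-2020 kyōiku kanji list).
NEW_KANJI_PER_GRADE = [80, 160, 200, 200, 185, 181]

# Running total: how many kanji a child has been taught by the end of each grade.
cumulative = dict(zip(range(1, 7), accumulate(NEW_KANJI_PER_GRADE)))

for grade, total in cumulative.items():
    print(f"end of grade {grade}: {total} kanji")
```

This prints 80, 240, 440, 640, 825 and 1006, so the 640 (after 4th grade) and 1006 (after 6th grade) figures line up.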

Furigana: Often, multiple editions are available, e.g. one with full furigana and one with furigana used sparingly, and there may be further editions in between (e.g. see the different editions of Kiki). But I have no idea what that choice is based on.
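For ebooks, “how much furigana” can be measured instead of eyeballed. EPUB content is XHTML, and furigana are typically marked up with `<ruby>` and `<rt>` elements, so one rough metric is the share of kanji that sit in ruby base text. This is a minimal sketch using only Python’s standard `html.parser`. It assumes well-formed markup where every `<rt>`/`<rp>` is explicitly closed, and `RubyCounter` and `furigana_ratio` are hypothetical helper names.

```python
from html.parser import HTMLParser


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block (U+4E00 to U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


class RubyCounter(HTMLParser):
    """Tallies kanji inside <ruby> base text (glossed) vs. outside any <ruby> (unglossed)."""

    def __init__(self):
        super().__init__()
        self.ruby_depth = 0
        self.in_reading = False  # inside <rt>/<rp>: the furigana itself, not base text
        self.glossed = 0
        self.unglossed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self.ruby_depth += 1
        elif tag in ("rt", "rp"):
            self.in_reading = True

    def handle_endtag(self, tag):
        if tag == "ruby":
            self.ruby_depth = max(0, self.ruby_depth - 1)
            self.in_reading = False
        elif tag in ("rt", "rp"):
            self.in_reading = False

    def handle_data(self, data):
        if self.in_reading:
            return
        n = sum(1 for ch in data if is_kanji(ch))
        if self.ruby_depth > 0:
            self.glossed += n
        else:
            self.unglossed += n


def furigana_ratio(xhtml: str) -> float:
    """Share of kanji that carry furigana (0.0 if the text has no kanji)."""
    parser = RubyCounter()
    parser.feed(xhtml)
    parser.close()  # flush any buffered text
    total = parser.glossed + parser.unglossed
    return parser.glossed / total if total else 0.0


sample = "<p><ruby>魔女<rt>まじょ</rt></ruby>の宅急便</p>"
print(furigana_ratio(sample))  # 0.4: 魔女 is glossed, 宅急便 is not
```

Comparing this ratio across a few chapters of two editions would show which one glosses more, though it says nothing about vocabulary difficulty.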

How can you tell the level: In the reading books I have that are aimed at 1st grade (and I think this also holds true for 2nd grade), there are still spaces between the words. This might be a way to separate those two grades from the rest. But I don’t know whether this is a general rule or only applies to this particular series.
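That spacing convention (分かち書き, wakachigaki) is easy to test for in a digital text. One caveat: Japanese prose conventionally indents each paragraph with a full-width space (U+3000), so leading whitespace has to be ignored. A minimal heuristic sketch; `spaced_line_ratio` is a hypothetical helper, and a high ratio only suggests early-grade text rather than proving it (as noted later in the thread, not every 1st/2nd-grade book uses spaces).

```python
def spaced_line_ratio(text: str) -> float:
    """Fraction of non-blank lines that contain a space between words.

    Leading/trailing ASCII and full-width (U+3000) spaces are stripped first,
    because a full-width space conventionally indents each paragraph.
    """
    lines = [line.strip(" \u3000") for line in text.splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        return 0.0
    spaced = sum(1 for line in lines if " " in line or "\u3000" in line)
    return spaced / len(lines)


early_reader = "\u3000むかし むかし、\n\u3000おじいさんが いました。"
regular = "\u3000昔々、おじいさんがいました。"
print(spaced_line_ratio(early_reader), spaced_line_ratio(regular))  # 1.0 0.0
```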

Harry Potter: I’m not surprised at all that you have issues reading it - https://floflo.moe lists it as having 6940 individual words :sweat_smile: (granted there will be a bunch of misparses of names and such, but you get the idea).
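A count like that is typically produced by segmenting the text with a morphological analyzer and counting distinct dictionary forms (the exact method floflo uses isn’t stated here). This sketch shows only the counting step, assuming the analyzer output has already been reduced to `(lemma, part_of_speech)` pairs. The `"proper_noun"` tag is a hypothetical label standing in for whatever your analyzer emits; setting proper nouns aside is one simple way to limit the name misparses mentioned above.

```python
from collections import Counter


def vocab_stats(tokens):
    """tokens: list of (lemma, part_of_speech) pairs from a morphological analyzer.

    Proper nouns are set aside because character names are a common source of
    misparses. "proper_noun" is a hypothetical tag name; use your analyzer's own.
    """
    counts = Counter(lemma for lemma, pos in tokens if pos != "proper_noun")
    return {
        "distinct_words": len(counts),          # what an "N individual words" figure counts
        "running_words": sum(counts.values()),  # total word occurrences
    }


sample = [("ハリー", "proper_noun"), ("魔法", "noun"), ("魔法", "noun"), ("使う", "verb")]
print(vocab_stats(sample))  # {'distinct_words': 2, 'running_words': 3}
```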

On the other hand, don’t forget that a native child probably knows thousands of words, so it is pretty normal to encounter tons of new vocab in each new book (at least it is for me)…

Do you have a digital representation of the various texts? I’m currently developing a tool that performs morphological analysis of Japanese text and generates an index that allows you to compare one text’s semantic difficulty to another. It’s largely inspired by how Lexile scores are generated (placing weight on word frequency and sentence length). The measurements will be different, and only significant when compared to another text, since Lexile calculation methodologies are proprietary (IIRC).
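The general idea (word frequency plus sentence length, meaningful only relative to other texts) can be sketched in a few lines. This is not the tool described above and not Lexile’s formula (which is proprietary); it’s a toy index under these assumptions: sentences are already segmented into words, `freq`/`total` come from some reference corpus, and the weights `w_len` and `w_freq` are arbitrary placeholders. Higher scores mean harder text.

```python
import math
from statistics import mean


def difficulty_index(sentences, freq, total, w_len=1.0, w_freq=1.0):
    """Toy relative difficulty score; higher = harder.

    sentences: list of sentences, each a list of word tokens (already segmented).
    freq:      dict mapping word -> count in a reference corpus of `total` tokens.
    w_len, w_freq: arbitrary placeholder weights, not calibrated to anything.
    """
    tokens = [w for s in sentences for w in s]
    if not tokens:
        raise ValueError("need at least one token")
    # Log relative frequency; the +1 keeps words unseen in the corpus finite.
    mean_log_freq = mean(math.log((freq.get(w, 0) + 1) / (total + 1)) for w in tokens)
    mean_sent_len = mean(len(s) for s in sentences)
    # Rarer words make mean_log_freq more negative, so subtracting it raises the score.
    return w_len * mean_sent_len - w_freq * mean_log_freq


# Made-up corpus counts, for illustration only.
FREQ = {"ねこ": 50, "が": 500, "いる": 200, "を": 400, "する": 300, "こと": 150, "に": 450, "した": 200}
TOTAL = 5000
easy = [["ねこ", "が", "いる"]]
hard = [["稀覯本", "を", "渉猟", "する", "こと", "に", "した"]]
print(difficulty_index(easy, FREQ, TOTAL) < difficulty_index(hard, FREQ, TOTAL))  # True
```

Because the weights are arbitrary, only the ordering of scores across texts measured with the same corpus means anything.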

@NicoleIsEnough, thanks for the response! I’ve studied all 1006 kyōiku kanji in the past and have great retention/recall of at least 600 of them (not sure how many more, but somewhere between 600 and 1000). I’m starting WK to help with that retention and to push beyond them toward the jōyō kanji, which is my goal.

Problems I run into when reading include: not knowing the reading of jukugo (I can often get the meaning from context if I know all the component kanji), not knowing the meaning of general vocabulary with or without kanji I know, etc.

I do of course expect to learn quite a bit of vocabulary as I move along in a text, and I expect to need to look SOME things up and pick up other things through osmosis, context, and exposure alone. But I think there is a huge difference between reading extensively for pleasure, where I encounter new words through a story I’m invested in, and the “pain level”, where reading is torturous because I have to look up so many words I can hardly get through a sentence. I spend a lot of time thinking about how reading-aged children pick up vocabulary in their native language, and there is a lot to be said for reading extensively at one’s level and slowly sprinkling in more complex choices as one moves up in level/ability/speed/comprehension on an individual basis. Kids are completely demotivated by text that is too difficult for their level, but a love for reading is fueled by free reading of books at their level, through which they learn and progress a lot (even when the books aren’t that great/sophisticated). I want to mirror that progression in many ways, as I’ve seen what extensive reading has done for my French over the years.

I have several novels and short stories for the younger set (1st and 2nd graders) that unfortunately don’t separate the words despite being written nearly exclusively in hiragana, so I’m afraid the pattern doesn’t always hold true, but I think you’re correct that by 3rd grade at least, it would be pretty hard to find content with spaces between words.

Yeah… I strongly believe I have at least the vocabulary of a 6-year-old. But it’s through books that I hope to raise my Japanese comprehension and learn more new words… just without jumping too far ahead too fast. After all, I think children go from 3000 to 6000 words and from 6000 to 9000 words largely thanks to the power of the written word, and largely outside of formal vocabulary or spelling lists.

Meaning an actual text like an ebook? Or would a scanned page/photo suffice?

That sounds like really interesting work! I’d love to know what comes of it if it is ever available for widespread use!

Something where the text can be read / selected by a machine. If an image is clear enough, it may be possible to do text recognition with an acceptable amount of errors.

I took a break from development over the holidays, but my motivation to continue the project is returning, so I think I’ll give it some more effort. Currently I can import Wikipedia articles and YouTube videos (with subtitles) fairly reliably. One feature I want to add is the ability to create “contexts” for users. I’ve found that the global word frequency list for Japanese is of limited immediate use to students, because word frequency in the particular topics a student spends the most time reading and watching may not match the global list. The theory is that it’s more useful to calculate word frequency from a user-defined topic or collection of topics, and then have the student read or watch content with higher average word frequency (lower semantic difficulty) and lower average sentence length (lower syntactic difficulty) first.
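The “contexts” idea could look something like this: build word counts from the material in a user’s chosen topics, then order candidate texts so that those with more familiar vocabulary and shorter sentences come first. How to combine the two criteria isn’t specified above; summing each text’s rank on the two measures is just one simple, unit-free choice. All names here are hypothetical, and texts are assumed to be already segmented into words.

```python
from collections import Counter
from statistics import mean


def build_context(docs):
    """Word counts over a user's chosen topic material.

    docs: list of documents; each document is a list of sentences,
    each sentence a list of word tokens (already segmented).
    """
    counts = Counter()
    for doc in docs:
        for sentence in doc:
            counts.update(sentence)
    return counts


def reading_order(candidates, context):
    """Order candidate texts easiest-first.

    candidates: dict title -> document (same shape as in build_context);
    every document needs at least one word.
    Each text is ranked twice (by mean in-context word frequency, high first,
    and by mean sentence length, short first); the two ranks are summed.
    """
    total = sum(context.values()) or 1
    stats = {}
    for title, doc in candidates.items():
        tokens = [w for sentence in doc for w in sentence]
        avg_freq = mean(context[w] / total for w in tokens)  # unseen words count as 0
        avg_len = mean(len(sentence) for sentence in doc)
        stats[title] = (avg_freq, avg_len)
    by_freq = sorted(stats, key=lambda t: -stats[t][0])
    by_len = sorted(stats, key=lambda t: stats[t][1])
    rank_sum = {t: by_freq.index(t) + by_len.index(t) for t in stats}
    return sorted(stats, key=lambda t: (rank_sum[t], t))  # ties broken by title


cat_topic = build_context([[["ねこ", "が", "いる"], ["ねこ", "は", "かわいい"]]])
order = reading_order(
    {
        "A": [["ねこ", "が", "いる"]],
        "B": [["ねこ", "は", "かわいい", "ので", "毎日", "写真", "を", "撮る"]],
        "C": [["宇宙", "の", "起源"]],
    },
    cat_topic,
)
print(order)  # ['A', 'B', 'C']
```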

Another important note is that there are various pitfalls to be aware of when using Lexile measurements of the English translations to judge the reading difficulty of a Japanese text. Japanese can have many words for a single concept in English, even if we only consider politeness and nuance (e.g. opening a book vs. opening a window, ひらく・あける, or “I”, おれ・ぼく・わたし・わし・わたくし・あたし, etc.).

Here are two random text-heavy pages from 川の光. Let me know if you can run these, and if it works I’d gladly send more pages from the other books I have to test.

Nice, I’ll see if I can extract the text and get anything useful from it :slight_smile:

I absolutely see the pitfalls. I just have no idea what other measurement to use for myself. There MUST be ways that Japanese people make these judgements about books…

I’m curious to see if they have a published system or index as well. Even though the Lexile framework was created in 1989, most of the literature and research I’ve seen on it has been from the last decade or so, therefore I’m not even certain that Lexile measurements were in widespread usage until more recently. It’s possible that a quantitative system doesn’t exist in Japan and kids just “brute force” it a bit more, especially considering the all-day year-round schooling (exaggeration, but not much) their students receive.

Technically, so are full digital copies of books. :joy:

Although I wonder how much of the Japanese material in the Google Book Archive isn’t about learning the language.

Even Lexile measurements are performed with only a sample (or several samples) of the target text (125-word excerpts), so a full digital copy isn’t necessary when taking a heuristic approach.
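Working from samples is easy to replicate. A minimal sketch, assuming the text has already been segmented into words (Japanese doesn’t put spaces between words, so “125 words” needs a morphological analyzer first) and that excerpts are picked at random; how Lexile itself chooses its excerpts isn’t described here, so the random choice is an assumption. `excerpts` is a hypothetical helper.

```python
import random


def excerpts(tokens, size=125, k=3, seed=0):
    """Cut `tokens` into consecutive, non-overlapping slices of exactly `size`
    words and return up to `k` of them, picked at random (reproducible via `seed`).
    A trailing partial slice is discarded.
    """
    chunks = [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]
    rng = random.Random(seed)
    return rng.sample(chunks, min(k, len(chunks)))


words = [f"w{i}" for i in range(1000)]  # stand-in for a segmented text
sample = excerpts(words)
print(len(sample), [len(chunk) for chunk in sample])  # 3 [125, 125, 125]
```

Each excerpt can then be scored separately and the scores averaged, which is far cheaper than processing a whole novel.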

Have you tried looking the books up on a retail site to see whether they have different classifications there / are listed under different age categories?

I’m having a look on Kinokuniya and confess I can’t see anything beyond “children’s book”, but I suck at Japanese websites so maybe I’m missing something.

One thing that might help is that any books published under the Aoi Tori line (published by Kodansha) are aimed at elementary school children. They have a distinctive blue border. Books with a similar border but in bright green are from the Kadokawa Tsubasa Bunko line, and are definitely aimed at older kids than Aoi Tori. Of course, I guess the same English book could have been translated at different difficulty levels (I’ve seen multiple editions of The Lion, the Witch and the Wardrobe, for example), but it might give you an idea?!

It looks to me like there is a yellow border line with a similar aesthetic to Aoi Tori, so that might be a different age line from Kodansha, but I can’t read what it’s called at all.

I can understand brute forcing it for school… I cannot imagine brute forcing for pleasure/free reading.

Yes, this was my first thought. It’s so different in English, where the reading level is practically on every single bookstore page, Amazon page, publisher page, and book recommendation website… it’s so easy to find an approximation of a book’s level. I get that we in the US are really hyper-focused on reading level, using it as a measure in standardized testing and hoping that every child in a given grade reads at exactly that grade level (not acknowledging the vast fluctuation in the way kids learn… it’s not a straight line but more like a staircase…).

But like… Japan LITERALLY has its kanji divided by grade level, and everyone in the whole country learns the SAME curriculum… we don’t do that here… even Common Core doesn’t go that far… You’d think they’d have things subdivided and even published in a way that easily reflected grade levels (“This book is typeset at a 5th-grade level. 4th-grade kanji and below have no furigana…”).

But yeah, I’ve poked around publisher websites, Amazon, and bookstores in Japan trying to find a place that just states “3rd-5th grade” or something, but I can’t find it anywhere… Plenty of sites do have おすすめ (recommended) sections by age, like ehonnavi, but most of the books I have aren’t on there, and it’s really cumbersome to find a book by scrolling through pages and pages of booklists for an individual age…

I’m still really frustrated by this whole thing. I’ve tried asking a bunch of my italki tutors, too, and they have no idea how to rationalize any of it.

When I tried reading English books for the first time as a 12-year-old, I decided on Tolkien! :joy: Let’s just say it was mostly painful to get through that first book. But after a couple of months going at it, and looking up tons of words, I got through it. The second book took me about 3-4 weeks, the third a couple of weeks.

Just as a point of perspective.

For reading in Japanese I have a similar approach; I would just go at it. If it’s too frustrating, drop whatever you’re reading and try something else. But, if you can slug it out, you’ll get a lot of help moving forward as you do so. That time spent looking up words is not wasted imo.

This is how I’ve played some Japanese games as well - with tons of patience. Others I’ve had to put to the side for the time being.

I don’t think too hard on reading level as I just try to be honest to myself about how much work I’m willing to put into something at the moment. :slight_smile:

I would probably just - and this is gonna be a painful approach :stuck_out_tongue: - read the first chapter of each one, and rank them roughly according to how difficult you found them.
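One way to make “how difficult you found them” concrete is to count dictionary lookups while reading each first chapter and sort by lookups per page. A minimal sketch; `rank_by_effort` is a hypothetical helper, and the titles and numbers are placeholders, not measurements.

```python
def rank_by_effort(samples):
    """samples: dict title -> (dictionary_lookups, pages_read) for the first chapter.

    pages_read must be > 0. Returns titles from easiest (fewest lookups per page)
    to hardest.
    """
    return sorted(samples, key=lambda title: samples[title][0] / samples[title][1])


first_chapters = {
    "Book A": (10, 5),   # 2 lookups per page
    "Book B": (40, 10),  # 4 lookups per page
    "Book C": (3, 3),    # 1 lookup per page
}
print(rank_by_effort(first_chapters))  # ['Book C', 'Book A', 'Book B']
```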

That might be impractical depending on how many you have, but I guess the advantage is that the ranking will be based on your own ability rather than the difficulty as perceived for Japanese children (which will be much more about the difficulty of words and concepts, and less about grammar).

Otherwise I think you’re just gonna have to go for it and hope you don’t pick something horrendous first!

Ryosuke - I’d love to be updated on the status of your text analysis tool. I’d like to use it to judge the difficulty of video games by analyzing the video game summary text.

You inspired me to do some web searching, and I found two great websites for this very thing: Japanese Text Analysis and Readability Tools | Kai Krause