Potential for hyper-individualized supplemental review materials via WaniKani APIs + GenAI?

Hi All, as I grind through some of the lower levels of WaniKani, I have been seeking out WK-level-specific reading supplements and evidence-based approaches to improving my skills faster.

Have others experimented with using GenAI to create reading passages purely for kanji reading practice? I’ve been thinking this is one way I can practice the specific WaniKani words I am acquiring / reacquiring, while recognizing that interacting with Japanese newspapers, novels, and books provides different learning benefits.

One key problem I’ve experienced in past years when learning Japanese is the matching problem: how can learners match themselves exactly to relevant reading practice using just the words they know, alongside grappling with their preferred Japanese media [Social Media, Manga, Art, History, Wikis, etc.]? As posters and the community demonstrate, there is no shortage of content. It’s more like finding a needle in a haystack. I’m all for challenging myself and wrestling with material outside of my comfort zone, but I’m realistic that it’s not the most efficient way to cement recent WK gains.

I would be interested in seeing completely individualized recommendations. One thought experiment: for somebody who used Genki 1/2 and is now at WaniKani level 23, what is the perfect recommendation for easy and challenging reading passages that specifically reinforce the exact words and kanji they know? I’d prefer those individualized recommendations, but I have yet to be pointed to a service that offers them.

Though they may not be the most artistic prose, I’m interested in applications for specifically practicing reading with additional sentence context, separate from my broader immersion efforts, which remain my primary effort. If folks have examples of reading passages that exactly match WaniKani levels, or scripts that can pull your weakest kanji/vocab and find applicable passages for them, it would be great to know.
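To make the "pull your weakest kanji/vocab" idea concrete, here is a minimal sketch in Python. It assumes you have already downloaded your review statistics and that each record looks like `{"data": {"subject_id": ..., "percentage_correct": ...}}`. That shape and the function name `weakest_subjects` are assumptions for illustration, so check them against the actual API response before relying on this.

```python
def weakest_subjects(stats, limit=10):
    """Return the subject ids with the lowest accuracy, weakest first.

    `stats` is assumed to be a list of already-fetched records shaped like
    {"data": {"subject_id": int, "percentage_correct": int}}.
    """
    # Pair each subject with its accuracy so we can sort by accuracy.
    scored = [
        (record["data"]["percentage_correct"], record["data"]["subject_id"])
        for record in stats
    ]
    scored.sort()  # lowest percentage first; ties broken by subject id
    return [subject_id for _, subject_id in scored[:limit]]
```

The returned ids would still need to be mapped to the actual kanji or vocabulary before they could be matched against passages.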

Additionally, even though Japanese adoption of GenAI is lower than in other countries, it is still growing with the billions invested in it. It feels realistic to think that identifying and interacting with AI-generated language [German, Spanish, English, Japanese, you name it] is also a part of interacting critically with the language and strengthening media literacy skills. At least, it’s a topic that educators globally have been grappling with since the introduction of the technology.

Curious if others have had thoughts on this. Bite-sized passages like the one below feel in keeping with the ethos of the rest of the WaniKani system. This one came from a simple request, without even implementing scripting best practices or leveraging the current strengths of the model I chose.

  1. 山の上の小さな家 - 春の足音

山の頂に、ひっそりと佇む小さな家があった。そこには、老いたるおばあさんと、瞳に星を宿したような孫娘のりんが住んでいた。りんは毎日、山を下り、麓の学校へと通った。学校からの帰り道、りんはおばあさんのために野の花を摘み、山の小鳥たちと歌を交わした。家に着くと、りんはおばあさんと共に夕食の支度をし、暖炉のそばで物語を読んだ。

ある春の日、学校で「将来の夢」という作文の課題が出された。クラスメイトたちは、都会での華やかな生活や、有名な学者になることを夢見ていた。りんは、窓の外に広がる山々を見つめ、静かに考えた。彼女の心には、おばあさんのように、この家で季節の移ろいを感じながら、静かに暮らしたいという思いが溢れていた。

りんはおばあさんの横に座り、おばあさんの手を取りながら作文を読んだ。おばあさんは優しく微笑み、りんの頭を撫でた。「りん、お前の夢は、この山の春の息吹のように、温かく、そして美しい。お前の心が望むままに生きなさい」とおばあさんは言った。りんの心は、春の陽光のように温かく、満たされた。

I expect there will be pushback around the authenticity of the resulting language, yet it seems shortsighted to completely ignore the role that AI-generated text will play in the future, whether we like it or not. GenAI 日本語 has been becoming more common, and is another part of the native experience that might be worth understanding.

After all, to my chagrin it’s already being integrated into many experiences I would rather not see GenAI in, and I have extreme skepticism about the technology writ large. But if we’re being honest, many of the sentences above beat out the beginner textbook example sentences that my Japanese friends have completely laughed at, or the literal typos that existed in past learning materials supposedly proofread by my various teachers. I have enjoyed testing the limits and finding direct faults in GenAI, but it’s clearly been improving and it’s not going away tomorrow just because my peers and I want it to. Since the community is already sophisticated in terms of scripting and reshaping these educational tools, it feels like a logical extension that some devs here would consider the implications of new tech.

I hope that folks will lean towards constructive conversation here rather than anon rage, frustration, or refutation over the technology - if you do that, you’ve probably missed the nature of the conversation I was intending to bring. This isn’t about replacing real immersion with GenAI garbage - it’s about targeted supplemental reading that meets each learner where they are at to effectively enhance their learning journey and engagement with their chosen Japanese works. Looking forward to a good discussion!

2 Likes

Having used GenAI for about two years now, I can positively say that I would be careful having AI write literature, especially if you trust it 100% to deliver the context of the language with perfect accuracy.

In my experience, what you ask and say to the AI is hugely context-sensitive, and if there is an ounce of ambiguity, the AI will assume it knows what you’re asking. When I use AI I spend a lot of time making sure it avoids any assumptions, which requires a lot of back and forth. There have been more times than I can count where the AI seems to fully understand my request, but because I used the wrong word to define something, or because the AI made an assumption, the output comes out incorrect, and really incorrect.

Now, with that in mind, you’re having it write literature for you to read. That’s great and powerful, provided that you can understand what it’s saying and are confident that it is accurate and that this is how the idea would actually be expressed when going from English to Japanese.

I would be careful using AI as a learning tool at this stage. AI can make a lot of jobs easier. But it is not a replacement for human creation. It still requires a huge amount of hand holding. It could lead you to learn incorrect use of communication and reading.

If you accept that as the risk, then more power to you. With the experience I have using AI, I wouldn’t trust it right now to support my learning. I only ever use it for learning when I don’t know enough about something and I need it to point me in the correct direction so I can go to the source and learn it myself. So to answer your question, and I’m sorry to be that guy: “Perfect At-Level Reading Material Creation?” No, not even close.

6 Likes

Graded readers, Satori Reader, etc. will serve you far better, as they are verified by a human for accuracy and designed intentionally for learning purposes, with all the benefits of not using GenAI as bonuses.

Honestly if someone could work through this block of text you posted I think it’s silly to do anything but encourage them to get started on the easier end of real Japanese anyway, though.

6 Likes

GenAI is not good to use in circumstances where the user cannot reliably detect errors.

13 Likes

This specific fragment is not half-bad as reading practice, I think.
Only two bits give me pause:

学校へと通った

I wouldn’t think へとかよう is correct. You can use へと in many circumstances, but here?

「お前…」とおばあさんは言った

This お前 breaks my mental image of a kind granny, but maybe it’s valid.

If in exchange for a couple of errors you can get an instant, fresh piece of practice text to read that only uses vocabulary you already know, that could be a useful exercise.
Imagine you feed it, via an API, all your words currently due for review, it generates some sentences, and you review your words in context instead of in isolation. Maybe with such a tool I wouldn’t have a 3.5k-long review backlog.
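A rough sketch of that loop in Python, minus the actual model call: build a prompt from the due words, then check which of them made it into the generated passage. The names `build_prompt` and `missing_words` and the prompt wording are hypothetical. The check is a plain substring match, so it will miss conjugated forms (通う will not match 通った).

```python
def build_prompt(due_words, max_sentences=5):
    """Build a text prompt asking for a short passage that uses the due words."""
    word_list = "、".join(due_words)
    return (
        f"Write a short Japanese passage of at most {max_sentences} sentences. "
        f"Use each of these words at least once: {word_list}. "
        "Keep all other vocabulary as simple as possible."
    )


def missing_words(passage, due_words):
    """Return the due words that do not appear verbatim in the passage."""
    return [word for word in due_words if word not in passage]
```

Anything returned by `missing_words` could be sent back for another attempt, or just left for the normal review queue.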

The issue of error detection is much more prominent with LLMs, but for example novels also have errors even after going through publishing. Even more so with web novels and reading random online posts. Even WK and Bunpro example sentences have errors. Eventually one has to learn to detect them.
Graded readers and other texts written specifically for learners have the same issue of stiffness as LLMs.

But personally I wouldn’t use it that much. I’d rather spend time reading a novel and have a laugh at a joke a human author made than at exactly the same joke an LLM made.

2 Likes

I think this is fundamentally where the desire for GenAI use comes from, and I just want to share the view that it is an entirely self-defeating way to approach getting your Japanese practice. I know we’re largely in agreement in the end since you mention choosing a novel so I just want to say I’m not jumping on you specifically, but it’s a useful jumping off point for this idea.

For one, what’s most useful is learning the most common things first, which you’ll obviously get through reading because they are, well, common. Hypothetically personally tailored Japanese is an obstacle in the way, a delay in developing this understanding. And anything that recurs quickly across more than one piece of writing is of more importance to you than a WK wordlist or wherever you’re sourcing.

But also, across every piece of written and spoken Japanese I’ve ever encountered, the major thing they have in common is that they were not designed around me. The experience of encountering things you don’t know will take many years to stop happening all the time. I think it’s fair to say that learning to deal with that, both not getting hung up on the new unknowns and then managing to learn something from them, is itself a skill you practice, and one as important as any piece of Japanese language knowledge. To try to generate artificial things without any rough edges is, to me, an act of making your own practice substantially less valuable.

8 Likes

I don’t use it for generating content, but for easing comprehension or for summarizing the contexts in which a kanji reading is used. It helps me fill in some missing pieces after my visual novel reading practice, which helps manage the frustration of slogging through the text.

If you have this tool at your disposal, why not use it to further your practice along with the Japanese newspapers/novels/books? Am I missing something?

1 Like

I think there might be a little overestimation in the time I would use any form of this generated content.

As Ka5 also agreed upon, this thought exercise is purely about generating strings of coherent words that just test unique readings of the vocab and kanji - exactly at the level that the WaniKani APIs could inform me of. This is not about developing prose that I base my understanding of Japanese around or an effort to distort natural language. It’s merely a ruthlessly efficient and sometimes funny reading passage leveraging exactly the vocabulary I’ve acquired, tailored to my specific needs in a way that other learning resources simply aren’t set up for. It also feels like a natural extension of the mnemonic concept used throughout WaniKani, and a nice way to recap a lesson after leveling up - just adding mere minutes to reinforce the readings even further.

But make no mistake, this is only a 1-4% supplement to the actual end goal - consistently immersing in Japan’s most influential and domestically popular literary and visual arts. I completely agree that the majority of time to spend is in engaging with challenging material that you have a stake in learning about and feel motivated to struggle and persist with.

Since many folks here seem to use the APIs, I was just curious if others have ever considered the potential to make WaniKani’s impact even more powerful for the community by leveraging continuing improvements in LLMs.

1 Like

“I think there might be a little overestimation in the time I would use any form of this generated content.”

I think you’re missing the point. It’s that using it, period, may lead to errors in how you comprehend future text. The only people who can fix the problems with the generated text, and so take advantage of this, are people who already have all their bases covered. Therefore they don’t need it anymore as a learning aid. GenAI isn’t as sophisticated as a lot of people believe it is, and they often blindly follow it as a competent resource. You still need someone to pilot it. LLMs can do a lot, but they cannot fix their own mistakes without being told to, and they often go down logic rabbit holes when attempting to do so, using endless assumptions and contradictions, and sometimes just lying. Yes… LLMs can give false information when asked for help with something they don’t actually have the ability to do. I can give you logs (plural) of Claude doing exactly that if you request it.

“this thought exercise is purely about generating strings of coherent words that just test unique readings of the vocab and kanji - exactly at the level that the WaniKani APIs could inform me of.”

Sure, but you’re not talking about it as a pure thought exercise; you’re showing a practical application while justifying it as an advantage and overlooking LLMs’ clear flaws. Your text shows that it can provide an additional supplemental resource for learning Japanese. But the value it adds is negligible compared to the risk of error. GenAI is a tool, and like all tools, it requires a specific task. Otherwise it will be ineffective. Learning is currently not one of those tasks.

“But make no mistake, this is only a 1-4% supplement to the actual end goal - consistently immersing in Japan’s most influential and domestically popular literary and visual arts. I completely agree that the majority of time to spend is in engaging with challenging material that you have a stake in learning about and feel motivated to struggle and persist with.”

As people in this thread have said, if any of what you’re learning is wrong, then it may lead you to misinterpret future text. It may seem minor, but you don’t know what you don’t know. One problem can compound, and unlearning something is really, really difficult.

If you want to use LLMs, then go for it. But I don’t know that anyone really finds that there aren’t enough resources online to learn without generating text through AI. There is almost too much available, and it sometimes feels like you’re going to make the wrong choice picking one resource over another. This feels like an answer to a problem that doesn’t exist.

4 Likes

I think you are overestimating the “every piece ever encountered” part a little, and that in turn discounts potential value of other tools. Even absolute beginners in their second month of learning can run into input that is so trivial it’s a waste of time, e.g. the infamous これはペンです.

If the source of contention is “only uses vocabulary you already know”, then let’s reformulate it as “uses mostly i+1 sentences, with 80% focus on words that are currently due in your personal SRS reviews”.
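Here is a small Python sketch of how that reformulated target could be measured. It assumes each sentence has already been split into words by some tokenizer (not shown), and that `known` and `due` are sets of words from your own SRS data. The function names, and counting "i+1" as exactly one distinct unknown word, are my assumptions.

```python
def is_i_plus_one(tokens, known):
    """True when the sentence contains exactly one distinct unknown word."""
    unknown = {token for token in tokens if token not in known}
    return len(unknown) == 1


def i_plus_one_ratio(sentences, known):
    """Fraction of sentences that are i+1 relative to the known vocabulary."""
    if not sentences:
        return 0.0
    hits = sum(1 for tokens in sentences if is_i_plus_one(tokens, known))
    return hits / len(sentences)


def due_focus(sentences, due):
    """Fraction of sentences that contain at least one word due for review."""
    if not sentences:
        return 0.0
    hits = sum(1 for tokens in sentences if any(t in due for t in tokens))
    return hits / len(sentences)
```

A batch of sentences could then be accepted when `i_plus_one_ratio` is above one half ("mostly") and `due_focus` is at least 0.8, with both thresholds being knobs to tune rather than established values.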

Maybe not the best analogy, but sportsmen spend some of their time in a gym doing isolation exercises, even though there are negative side effects such as muscles stiffening and learning something unhelpful. And they spend even more time on specific drills instead of just doing their sport every available moment. There is extra time efficiency to be gained from using tools with a narrow focus. Just as the sports world spent many years figuring out the right settings, perhaps the language learning world can figure that out too.

1 Like

While I don’t think reading AI-generated Japanese text will distort and ruin your language ability or anything, keep in mind that most models essentially operate in English as their primary language, and what you get is similar to generating text in English and running it through Google Translate or DeepL. (That’s not precisely what’s happening under the hood, but most of the training data is in English.)

“Translation” quality is not abysmal, and for pure kanji recognition practice it’ll definitely work, but I agree that you’ll have to go out of your comfort zone and engage with native material - the sooner you do it, the better.

With Japanese learning I frequently use AI with a “break down and explain this Japanese sentence” prompt, and most of the time it works fine, although recently I’m encountering more and more cases of complete nonsense as output.

2 Likes

Thanks for the post Zaichiki - this is exactly the sort of discussion I was hoping for.

The use cases seem quite narrow – pure kanji recognition practice, or using the “break down and explain this Japanese sentence” prompt. I’m also interested in its ability to be like Wikipedia, in providing simple explanations in Japanese of certain concepts. Google search is unfortunately already being co-opted for this purpose.

Interesting to hear that you find that the model output appears to have become less accurate over time.

Wow, I really like that analogy around the isolation exercises since I think that’s at the core of the objective - provide enhanced training specifically meeting the needs of the learner - geared towards improving language acquisition and mastery in other contexts.

1 Like

That’s very well put. I’ve been using ChatGPT to help with programming tasks and I found it useful to quickly generate boilerplate code, but it very regularly outputs code that doesn’t compile or, worse, code that does compile just fine but is subtly broken or hopelessly inefficient. I shudder to think about the mountains of terrible code that is currently being generated and put into production by novice devs copy/pasting AI code blindly. And then AI is trained on that bad code and the circle of garbage continues…

ChatGPT for checking Japanese translations and breaking down sentences is pretty good and you can always double-check the meaning of the words and grammatical constructs by searching online if you suspect that it might be wrong. I wouldn’t trust it for anything beyond that. There are so many good sources of simple Japanese for beginners, why risk it?

2 Likes

100% it’s just a pure convenience using these systems. Great for prototyping, bad for actual production.

There are always people who want to take the path of least resistance instead of rising to the challenge. It’s just avoiding the growing pains we all deal with. AI as a silver bullet currently? Ahhh, I think not! xD

1 Like

Hi Kgw - thanks for joining the discussion! You completely mischaracterize my intentions. This is not a GenAI fanboy post, just thinking about technology and eventual applications [whether now, or 10 years from now]. The only one referring to AI as a silver bullet is you! xDDD

Also, the focus here is not complete beginners but those with mixed experience in Japanese – in this case, Japanese Canadians who grew up speaking some Japanese in the home and have a good grasp of grammar through exposure, but haven’t learned the Joyo kanji or studied Japanese in university settings.

If you think about it for a minute, you can see how this makes the reading-matching problem more complex than it is for typical beginners, and why this topic arose in the first place.

I’m glad you’re having fun starting your Japanese studies :grinning:!

2 Likes

I’m enjoying the conversation, it’s always interesting seeing AI discussed through different lenses. Although I guess the topic has changed a number of times since the original post, it’s so hard to keep track of everything!! I’m sure you can totally understand that the constant shift in conversation makes it difficult to know what the conversation is actually about, so naturally, just like GenAI, getting easily confused between what is said and what is meant is the irony of the conversation! Like copying and pasting text into Google Translate over and over until it no longer reads like the original. I’m glad we’re both on the same page now. Good luck with your journey as well, and hopefully you get the answer you most desire. :smiley:

I can honestly say I don’t think there’s a point in making things “targeted” at a user’s specific level, because there’s already tonnes of native content out there. If you can’t read it, you need to target the specific things you can’t read, and build up your abilities to meet it.

The goal of learning a language, for most people, is natural usage and understanding. So any materials to assist learning or gauge progress should come from real sources, human-authored natural passages, whether that be novels or just social media posts and discussions.

There’s no point trying to computer generate a passage that has a specific percentage of things the reader will know, to test their comprehension, when whipping out something like a manga or passage from a graded reader that already exists suffices. Those naturally sourced things are the real benchmarks to aim for and go by.

1 Like