SRS with Randomized Sentences From Corpora

As some of you might remember, a while ago I made a thread about SRS and why I stopped doing it. After all the responses I got, I thought about it a whole lot and came to the conclusion that what bothers me the most is that it all relies on rote learning, which does not mirror how we encounter words in real life. Of course, that's fine for words that work in isolation, but many words can't really be learned by just repeating the dictionary definition.

We also rely heavily on patterns when doing SRS: even if there's an example sentence, it's always the same one, so at some point we stop “figuring out” the sentence and start recalling it from memory instead.

But what if SRS worked a little differently? What if there was a different context each time? This would simulate how we actually encounter words in real life.

Now, there are plenty of corpora out there. In theory, it wouldn't be difficult to pull a random sentence (or sentences) from an online (or offline) source every time. But this approach obviously comes with several limitations, too:

  • the random sentences might contain unknown words, making the target word harder to figure out
  • the sentences might depend on a larger context and not work in isolation
  • searching for some words will yield false positives (especially short hiragana or single-kanji words)
  • the same word might have multiple meanings in different contexts
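
A minimal sketch of the basic idea, assuming the corpus is simply a list of sentence strings (for example, sentences exported from Tatoeba). It picks a random sentence containing the target word and avoids repeating the one shown at the previous review. The function name is made up for illustration, and the plain substring check is exactly what produces the false positives from the third bullet: a short hiragana word like か matches almost everything.

```python
import random
from typing import Optional, Sequence


def pick_context(
    word: str,
    corpus: Sequence[str],
    last_shown: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a random sentence containing `word`, or None if there is none."""
    rng = rng or random.Random()
    # Naive substring match: this is where false positives for short words come from.
    candidates = [s for s in corpus if word in s]
    # Prefer a sentence other than the one shown at the last review, if one exists.
    fresh = [s for s in candidates if s != last_shown]
    pool = fresh or candidates
    return rng.choice(pool) if pool else None


corpus = ["猫が好きです。", "猫は寝ている。", "犬が走る。"]
print(pick_context("猫", corpus, last_shown="猫が好きです。"))  # 猫は寝ている。
```

If only one sentence contains the word, the sketch falls back to showing it again rather than showing nothing.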

If there was a “curated” corpus, this would be a different story, but that’s so much work that it’s absolutely not feasible.

I still think this is an interesting approach, so I was surprised to find no information on the internet about anyone trying this before. (Though I have found cases in which someone wanted to display one random example sentence out of several options. Too much work, though, if you ask me.)

I would like to hear thoughts on it. Do you think this approach might enhance the efficiency of SRS, in particular when it comes to learning “language”, not just “dictionary entries”? Or do you think it would slow down the repetition process so much it wouldn’t be worth it?

Hum… I like the way you’re thinking.

If the SRS system had a way to keep track of which words you know, it could display only sentences in which you know every word. I know that @neicul has been trying to figure out an efficient way to count known words on Kitsun.
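
A rough sketch of that filter. It assumes each sentence has already been split into words (Japanese has no spaces, so in practice a morphological analyzer would do this step) and that the set of known words is stored somewhere; both are assumptions for illustration, not how Kitsun or any other app actually does it. Matching whole words instead of substrings also avoids the short-word false positives.

```python
from typing import Iterable, List, Set


def fully_known(sentences: Iterable[List[str]], known: Set[str], target: str) -> List[List[str]]:
    """Keep sentences that contain `target` and whose other words are all known."""
    return [
        tokens
        for tokens in sentences
        if target in tokens and all(t == target or t in known for t in tokens)
    ]


sentences = [["猫", "が", "好き", "です"], ["猫", "が", "走る"], ["犬", "が", "好き"]]
known = {"が", "好き", "です"}
print(fully_known(sentences, known, "猫"))  # [['猫', 'が', '好き', 'です']]
```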

This probably wouldn't be a bad thing, though, depending on how the whole thing is built :thinking:


I personally keep it simple: I read/watch something, see a word I'm interested in learning, and make a card out of it (post-exposure).

I do think the best approach is the one @Ncastaneda uses: getting the content straight from the source (movie, TV series, etc.) and turning it into a word/sentence card (?). Not sure about the details, so I'm tagging them since they might be interested in discussing it.
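
Getting lines straight from a show typically starts from its subtitle file. A minimal sketch, assuming standard SRT input (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines), that turns it into (start, end, text) cues; the timestamps are what you would use to cut the matching audio clip for a card. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

_STAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _seconds(stamp: str) -> Optional[float]:
    """Convert an SRT timestamp like 00:01:02,500 to seconds (None if unparseable)."""
    m = _STAMP.search(stamp)
    if m is None:
        return None
    h, mi, s, ms = map(int, m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Parse SRT subtitles into (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        # A cue is: index line, timing line, then one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start_raw, end_raw = lines[1].split("-->", 1)
        start, end = _seconds(start_raw), _seconds(end_raw)
        if start is None or end is None:
            continue
        # Japanese subtitle lines are joined without a space.
        cues.append((start, end, "".join(line.strip() for line in lines[2:])))
    return cues


sample = "1\n00:00:01,000 --> 00:00:02,500\n猫が好きです。\n\n2\n00:01:00,250 --> 00:01:03,000\n犬は？\nはい！\n"
print(parse_srt(sample))  # [(1.0, 2.5, '猫が好きです。'), (60.25, 63.0, '犬は？はい！')]
```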

There is an app, Clozemaster, that approximates your idea. It uses sentences from Tatoeba, but there's a lot of room for improvement. The blanks are chosen somewhat randomly, and there aren't enough SRS intervals to be useful IMO.
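
For comparison, a sketch of a cloze card that blanks the word you are actually studying instead of a random one. It works on pre-split words so short words don't match inside longer ones; the function name and the blank string are made up for illustration and are not how Clozemaster builds its cards.

```python
from typing import List, Tuple


def make_cloze(tokens: List[str], target: str, blank: str = "＿＿") -> Tuple[str, str]:
    """Blank the first occurrence of `target`; return (prompt, answer)."""
    if target not in tokens:
        raise ValueError(f"{target!r} does not occur in the sentence")
    i = tokens.index(target)
    prompt = "".join(tokens[:i] + [blank] + tokens[i + 1:])
    return prompt, target


print(make_cloze(["猫", "が", "好き", "です", "。"], "好き"))  # ('猫が＿＿です。', '好き')
```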

Some time ago I was seriously considering dropping the SRS routine once and for all.

In the end I found out about resources for making cards more or less on the spot while reading books on a Kindle and watching shows, mostly from Netflix. That made me reconsider using SRS for a while longer, since it tied the whole thing much more closely to the content I was consuming. Overall, it created a cycle in which, to keep learning new words, I need to consume more content to find new vocab.

I was already using a routine (with the Anki add-on Morphman) to search lines from shows for new words in a +1 manner (sentences with exactly one unknown word), or just to find sentences containing known words, in order to practice reading comprehension at first and mostly listening comprehension now.
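
The +1 principle itself is easy to express. A sketch that keeps only sentences with exactly one unknown word and reports which word that is; Morphman's real selection uses its own morpheme database and ranking, so this only illustrates the idea, again assuming sentences are already split into words.

```python
from typing import Iterable, List, Set, Tuple


def i_plus_one(sentences: Iterable[List[str]], known: Set[str]) -> List[Tuple[List[str], str]]:
    """Return (sentence, new_word) pairs for sentences with exactly one unknown word."""
    picks = []
    for tokens in sentences:
        # A set, so a new word repeated twice in one sentence still counts once.
        unknown = {t for t in tokens if t not in known}
        if len(unknown) == 1:  # "+1": everything else is already known
            picks.append((tokens, unknown.pop()))
    return picks


sentences = [["猫", "が", "好き"], ["猫", "が", "走る"], ["鳥", "が", "飛ぶ"]]
known = {"猫", "が", "好き"}
print(i_plus_one(sentences, known))  # [(['猫', 'が', '走る'], '走る')]
```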

That led to my current routine, where I only get new vocab from exposure and then do additional listening practice using lines from my shows, too. It's the best I could come up with using all the tools I've found in almost two years. I'll probably keep this routine for a while until eventually dropping SRS.

I must confess that setting up a routine like this took quite a few hours of my time, which in hindsight is something to consider. But given that the whole setup has already been put together by others, it's probably totally worthwhile for anyone interested in this, especially if you're having trouble with other SRS routines. It's not exactly “plug and play”, but I think using your own material is so stimulating that it keeps you excited about learning with quality, relatable content (compared to anything premade or sentence banks); it also makes the gap to your target level of understanding much more palpable as you slowly close it :slightly_smiling_face:.


There are a lot of new resources that try to improve the user experience and generally make it easier for you to review those cards. I think making the content as engaging as possible for each person could be even more important and worthwhile. I wish there were a tool that did both (extremely easy integration with the media you consume + great UI/user experience).

The JALUP website and resources have this; the only thing is that they're based on specific content. If I were into that content, I would totally have gone for it :sweat_smile:.

Thanks for your input so far, everyone.

@elynchbell Clozemaster looks interesting in theory. But it definitely doesn't seem optimized for learning, and certainly not personalized, though I've only looked at it for a couple of minutes. Still, it's good to know that a large corpus like Tatoeba is openly and freely available.

@Ncastaneda At least for written media in your browser, there's a customizable one-click solution that adds a word, its definition(s), and its context (a sentence or a larger passage, depending on preference) directly to Anki. It's only an option on a PC, though. With a one-click solution, it's almost too tempting to add too much stuff. I think this is by far the most comfortable option (it can't really get any more hassle-free), but the Kindle solution you posted also seems pretty good.
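
Browser tools of this kind (Yomichan, for example) talk to Anki through the AnkiConnect add-on, which accepts JSON requests on a local port (by default http://127.0.0.1:8765). Assuming that setup, a sketch of the `addNote` request such a tool might send for a word plus its context; the deck name, note type, and field names are hypothetical and would have to match your own note type. It only builds the payload and does not send it.

```python
import json
from typing import List


def add_note_request(word: str, definition: str, context: str, tags: List[str]) -> str:
    """Build an AnkiConnect `addNote` request (API version 6) as a JSON string."""
    payload = {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "Mining",     # hypothetical deck name
                "modelName": "Sentence",  # hypothetical note type
                "fields": {               # hypothetical field names
                    "Word": word,
                    "Definition": definition,
                    "Context": context,
                },
                "tags": tags,
            }
        },
    }
    # ensure_ascii=False keeps the Japanese text readable in the JSON.
    return json.dumps(payload, ensure_ascii=False)


print(add_note_request("猫", "cat", "猫が好きです。", ["reading"]))
```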

But the actual reason I made this thread is that I wanted to get away from repetition that eventually comes down to rote memory. Example sentences are nice, and your own example sentences are great, but right now I'm interested in an approach that gets even closer to how you would actually encounter a word “in the wild” and how you think about it.

If you're not that interested in the repetition aspect of SRS, I think you can pretty much use Morphman to pick sample sentences from your shows (subs) and use it just to keep showing you your known vocab in different sentences.

You can tweak the settings in Anki to space out the review intervals a lot, so that you mostly benefit from sample sentences that use your known vocab without seeing those lines often enough to actually memorize anything. You can of course also use Morphman to learn new vocab, in which case it will feed you new words in a +1 manner. Or you can simply add a list of words you know, and it will add them to its database of known vocab and use them when selecting sentences, too.
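
A deliberately simplified model of what spacing out reviews does: if every answer is correct, each interval is the previous one times a fixed factor, so a larger factor means you see the same line far less often. Anki's actual scheduler adjusts ease per card and applies further modifiers (such as its interval modifier setting), so the function and numbers here are hypothetical illustrations, not Anki's algorithm.

```python
from typing import List


def intervals(first_days: float, factor: float, reviews: int) -> List[float]:
    """Days until each review if every answer is correct (simplified geometric model)."""
    out, current = [], first_days
    for _ in range(reviews):
        out.append(current)
        current *= factor  # each successful review stretches the next interval
    return out


print(intervals(1.0, 2.0, 4))  # [1.0, 2.0, 4.0, 8.0]
print(intervals(1.0, 4.0, 4))  # [1.0, 4.0, 16.0, 64.0]
```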

I decided to use this tool especially for listening practice, with shows I have watched before, because too much of the meaning can be lost in isolated sentences. If there's no additional context, as when reviewing random sentences, you can only understand so much of a sentence.

I mention this routine with subs because I like to have audio and an image on my cards, but of course the tool will serve you well if you decide to make a deck from a book, too. I think you can find some resource to convert an ebook into plain text. Then you'd split it into sentences, make a card out of each one, and use Morphman to cherry-pick the ones that fit your current vocabulary for practice. :man_shrugging: … though I think I would eventually most likely prefer to just read the book :sweat_smile:
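
For the "split it into sentences" step, a minimal sketch for plain Japanese text. It assumes sentences end with 。, ！ or ？ (optionally followed by a closing 」 or 』) or with a line break; it does not try to handle ellipses or quoted speech that continues the surrounding sentence. The function name is hypothetical.

```python
import re
from typing import List

# Boundary after 。！？ (unless more punctuation or a closing quote follows),
# after a closing quote that directly follows one of them, or at a line break.
_BOUNDARY = re.compile(r"(?<=[。！？])(?![」』。！？])|(?<=[。！？][」』])|\n")


def split_sentences(text: str) -> List[str]:
    """Split plain Japanese text into sentences, dropping empty pieces."""
    return [piece.strip() for piece in _BOUNDARY.split(text) if piece.strip()]


print(split_sentences("猫が好きです。\n犬は？「はい！」え！？"))
# ['猫が好きです。', '犬は？', '「はい！」', 'え！？']
```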

One of the reasons I still keep this practice in my case, when I could probably make listening practice just about watching (especially with Voracious), is that from time to time the selected sample sentences use a second meaning of a word I've already added to my database, or use it differently from how it appeared when I first encountered it. This makes it really easy to pick up those alternate meanings of known vocab and overall helps round out the idea behind a word somewhat faster.

In any case, I encourage you to look into the tool I mentioned; I'm sure you can adjust it to fit what you're looking for. :+1:
