First: what is this thread about? Well, something I’ve seen regularly posted in the WK forums is that doing any reviews outside of the scheduled ones will actively harm your learning, which sounded strange to me. I spent much of today digging into the research in the field, and the post below presents my findings. There will be a tl;dr at the end of the post.
Let me give a disclaimer. I don’t do research in this area, nor do I know anyone who does. This is primarily just what I got from reading two meta-analyses and one survey. If anyone is an expert in the field, or at least keeps up with the current research, feel free to point out any mistakes.
Before we delve into this, we need to introduce some terms. Massed practice is where all the reviews are done right in a row. If this were WK, you could imagine this as doing 8 reviews of each radical/kanji/word in a single sitting and then never again. Spaced practice is any system which adds time between these review sessions. Retrieval refers to correctly recalling an item to be memorized. Lag is the length of the interval between reviews.
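To make these terms concrete, here is a minimal sketch of the two kinds of schedule. The 30-second gap for massed practice and the doubling lags for spaced practice are made-up numbers chosen only for illustration; they are not WK's actual intervals or anything taken from the papers.

```python
from datetime import timedelta


def massed_schedule(n_reviews, seconds_between=30):
    """Offsets from the first exposure for n back-to-back reviews.

    The 30-second default is an arbitrary stand-in for "right in a row".
    """
    return [timedelta(seconds=i * seconds_between) for i in range(n_reviews)]


def spaced_schedule(lags):
    """Offsets from the first exposure for reviews separated by the given lags."""
    times = []
    now = timedelta(0)
    for lag in lags:
        now += lag
        times.append(now)
    return times


def lags_of(schedule):
    """Lag = time between each review and the one before it."""
    return [later - earlier for earlier, later in zip(schedule, schedule[1:])]


# Hypothetical spaced system: 8 reviews, each lag double the previous one.
example_lags = [timedelta(hours=4 * 2**i) for i in range(8)]

massed = massed_schedule(8)
spaced = spaced_schedule(example_lags)

print("massed, last review at:", massed[-1])
print("spaced, last review at:", spaced[-1])
```

Both schedules contain the same number of reviews, which is the "equal amount of work" condition the studies compare under. The only thing that differs is the lag.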
The first thing I want to be clear about: what does the literature say about spaced practice versus massed practice with an equal amount of work? Essentially everything agrees that, barring the “test” being very shortly after the final review (i.e. immediately afterward), spaced practice produces far more retention, with the benefits being pretty large (see this large meta-analysis, where 254 studies compared the two, with average retention of 36.7% for massed practice compared to 47.3% for spaced practice). Meanwhile, when tested immediately afterward, massed practice generally proves more effective (take this paper as a random example). There is a gradient for the trade-off as lag increases, but it’s pretty safe to say that after a couple of days, any spaced practice, regardless of lag, will be outperforming the massed practice.
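To put the size of that gap in perspective, here is the quick arithmetic, reading the two figures as 36.7% for massed and 47.3% for spaced (that assignment is my reading of the numbers quoted above, based on spaced practice being the one that comes out ahead):

```python
# Average retention figures quoted from the meta-analysis (in percent).
# Which figure belongs to which condition is an assumption from context.
massed_retention = 36.7
spaced_retention = 47.3

# Absolute gain, in percentage points.
absolute_gain = spaced_retention - massed_retention

# Relative gain: how much more is retained compared to massed practice.
relative_gain = absolute_gain / massed_retention

print(f"{absolute_gain:.1f} percentage points, {relative_gain:.1%} relative")
# prints "10.6 percentage points, 28.9% relative"
```

So under that reading, spacing the same amount of work is worth roughly 29% more retained material on average.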
OK, so now we have the question: why is spaced practice effective? There are 3 relatively common answers in the literature. Note that these aren’t necessarily mutually exclusive.
- The deficient processing hypothesis, first actually coined by this 1983 paper (though similar sentiments appeared in prior years), effectively attributes the effectiveness of spaced practice more to the failures of massed practice than to its own success. Put as simply as possible, it states that after first learning an item, there is a period where learners won’t really process a second presentation properly. Thus, having lag gives learners more “good” review sessions, rather than nearly worthless sequential ones.
- The study-phase-retrieval hypothesis (sometimes shortened to just retrieving or called simply reminding), introduced in an even earlier 1976 paper, simply states that it is the successful retrieval after a large lag which integrates an item into memory. As a result, it suggests that when an item is recalled after a long time, it will be further embedded into memory. Conversely, the hypothesis suggests that when retrieval of an item fails, the review is “worthless”.
- The oldest hypothesis, generally tracing its roots to a paper from 1955, is referred to as the encoding variability hypothesis, or sometimes the contextual hypothesis. This is exactly what it sounds like. Massed practice tends to be in the same context (the same external stimuli, the same state of mind, etc). The general idea here is that spaced practice enforces studying in different environments, which in turn produces better recall.
As I mentioned (and as should be clear), these hypotheses aren’t mutually exclusive. There are a ton of models out there that try to account for this in their own way, with varying degrees of success.
As for which of these, in their purest form, appear to be true, the data seems to suggest that:
- The deficient processing hypothesis is well supported by the evidence. However, it fails to account for the fact that there does seem to be a benefit for successfully retrieving over increasingly longer gaps.
- The study-phase-retrieval hypothesis is also well supported by the data. However, it lacks a detailed explanation of the mechanism: it only gives a vague intuition of how retrieving may improve memory, without a full model or a neurological explanation.
- The encoding variability hypothesis seems to be mostly, if not completely, bunk. It clashes with much of the experimental and neurological data we have, both of which seem to suggest that the exact opposite is true (that repeated learning in the same context is more beneficial than in a different context).
For further reading on the subject of SRS hypotheses, I recommend this fairly recent meta-analysis on the subject.
Let’s look at the WKer claim of “reviewing more makes you remember less” through these lenses. We will break it up into “cheating” on apprentice levels, versus “cheating” on guru+ levels.
From both the deficient processing and study-phase-retrieval perspectives, the first few apprentice levels are close to massed repetition and primarily exist to allow the user to be able to recall with large enough lag to produce “good” retrievals. So additional studying at this point will, at the very least, not harm user performance. The encoding variability hypothesis would suggest that additional studying may not be within a sufficiently changed context, but even at the very worst, this would do no harm.
For guru+ levels, at first blush, the study-phase-retrieval hypothesis may seem to support the WK “cheating” claim. However, it is worth noting that, from this hypothesis’ perspective, any failed recollection is essentially wasted time, so “failing” an item in an extra review will at least give you a chance to relearn it, so that the scheduled review actually produces real learning. And unless these extra reviews are happening particularly frequently, there is still going to be a fairly “large” lag before either the self-started review or the scheduled one. And the other two hypotheses would both suggest that such a review, if at least days out from previous reviews, would actually be beneficial.
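The lag argument above can be sketched with a couple of lines of arithmetic. Everything here is hypothetical: the function names are mine, the times are plain day counts, and the one-day threshold for "effectively massed" is just a stand-in for "at least days out", not a number from the literature.

```python
def lags_around_extra_review(last_review, scheduled_review, extra_review):
    """Return (lag before the extra review, lag left until the scheduled one).

    All arguments are times in days, measured from any common starting point.
    """
    if not last_review < extra_review < scheduled_review:
        raise ValueError("extra review must fall strictly between the two reviews")
    return extra_review - last_review, scheduled_review - extra_review


def is_effectively_massed(lag_days, threshold_days=1.0):
    """Flag a lag short enough to count as cramming (threshold is an assumption)."""
    return lag_days < threshold_days


# An item last reviewed on day 0 and scheduled for day 14,
# with a self-started review squeezed in on day 9.
before, after = lags_around_extra_review(0, 14, 9)
print(before, after)                 # 9 5
print(is_effectively_massed(after))  # False
```

An extra review in the middle of a two-week interval still leaves multi-day lags on both sides. Only an extra review placed right before the scheduled one (say, at day 13.99) would make the second retrieval look like massed practice.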
OK, so that is what the theories would say, but what about the actual data? Well, I can’t seem to find a single study that has attempted to answer the question “does reviewing more hurt recall” because no researcher seriously considers this. As emphasized at the start, the research is entirely focused on comparing the techniques when an equal amount of time is spent. On the contrary, researchers are concerned with the exact opposite. Displaced rehearsal, where learners review earlier items on the list, consciously or not, happens more often in a spaced system. This improves the learner’s performance, and it remains a methodological flaw in much of the area’s research. For an example of this, see this paper, though a wider perspective is best obtained from the hypothesis meta-analysis I posted earlier.
I never had room to cite it, but this survey of the area was very helpful to me.
tl;dr:
Will having additional reviews hurt your learning?
No. So long as you aren’t cramming it minutes before the next review, more reviews can only make you memorize something better. The research clearly shows that reviewing more helps (and people accidentally doing it remains a problem in the studies). It doesn’t “go against the philosophy” like I’ve seen some people post. Spaced repetition is simply the most efficient way to memorize things, not the fastest. If one person reviewed all 7000+ WK items every day for the same length of time that someone else went through them normally, there would be no question about who had better memorized the content. But us poor adult humans don’t have that kind of time, and so we instead try to learn efficiently.