Awesome! That’s super helpful, cheers.
In terms of how I know, I know a large number of people who’ve reached high levels of proficiency in Japanese, Spanish, medicine, and other areas using self-rated spaced repetition systems. Jeopardy champions like Arthur Chu and Roger Craig have also dominated the show using programs like Anki. There are also plenty of Anki success stories online; frankly, I’m surprised to see someone in the online Japanese-learning community who doesn’t swear by it!
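For anyone unfamiliar with what "self-rated" means mechanically: in Anki-style systems, the grade you assign yourself after flipping a card is the *only* input to the scheduler. A minimal sketch of SM-2 (the classic SuperMemo algorithm Anki's scheduler descends from; the exact constants here are SM-2's, not Anki's) makes this concrete:

```python
def sm2_update(quality, reps, interval, ease):
    """One SM-2 review step. `quality` is the learner's self-rating (0-5);
    it is the only signal that drives scheduling."""
    if quality < 3:  # a failed self-rating resets the card to day 1
        return 0, 1, ease
    if reps == 0:
        interval = 1
    elif reps == 1:
        interval = 6
    else:
        interval = round(interval * ease)
    # ease shrinks for hesitant answers, grows for effortless ones
    ease = max(1.3, ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    return reps + 1, interval, ease
```

The point of showing this is that an inaccurate self-rating directly distorts the review schedule, which is exactly why the self-assessment literature matters here.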
In terms of academic research, there’s solid evidence that spaced repetition contributes heavily to learning. This [meta-analysis](https://www.gwern.net/docs/spacedrepetition/1999-donovan.pdf), for example, found an effect size of d=0.42, which is pretty considerable! Of course, that study pertains to spaced repetition in general, not self-rated spaced repetition, so we should probably get more specific.
I’m aware of two good literature reviews on self-assessment in the classroom, and they both come to the same conclusion: student self-rating correlates moderately-to-highly with teacher rating, but students will still often over- or under-estimate their own abilities, especially in the lower years of school (possibly a proxy for inexperience in their discipline).
However, this research doesn’t come without qualifiers! First, both analyses sharply criticize the included studies for a variety of very basic methodological flaws, which may undercut the reliability of their findings a bit. Second, as Boud and Falchikov indicate, when accurate assessment is somehow incentivized, people assess their performance more accurately. This may mitigate some assessment bias in the case of Japanese: everyone using Anki for Japanese has the goal of learning Japanese, which incentivizes them to grade themselves as accurately as possible so their learning is more efficient. These studies also implicitly assume the perfect accuracy of teacher grading, which has in fact been shown to be inconsistent and idiosyncratic.
Lastly, I don’t think students in a classroom are necessarily comparable to a learner using an SRS! The students in these studies were (1) attempting to summarize their performance on complex, multi-question tests and (2) did not have access to the answers. They’d be asked, for example, to rate their own performance on a math test before getting the mark back. In contrast, an Anki user knows the answer the moment they flip over the flashcard, and has a comparatively far simpler time deciding whether the answer they held in their head fits. The fact that they can directly check the answers while the students in these analyses couldn’t is a big enough difference on its own, let alone the gap in complexity!
Given the differences between Anki and a typical classroom environment, I don’t think the self-assessment literature will get us far on its own, so I took a look at the actual literature on SRS software, which reads as a ringing endorsement. For example, one study found that use of electronic, self-rated flashcards through Anki increased the pass rate of the bar exam by a whopping 19.2%! Another study found that use of Anki was associated with significant gains in L2 acquisition, even in a group that actively disliked using the application. L2 acquisition was also seen to increase in a study of Japanese students taking the TOEIC.
Of course, this doesn’t rule out that self-rating does worse than computer-graded testing; it’s entirely possible that self-rated flashcards have a positive effect, but that testing would have an even greater one! Unfortunately, I was unable to find a study directly comparing learning outcomes from self-rated flashcards versus computer-rated ones, but I don’t see any reason to make "self-rating is the inferior way to learn" our null hypothesis. For a more detailed writeup on SRS and Anki, I’d recommend reading Gwern’s post here; it was my source for some of the data I used here, and it’s also a great read in general.