How accuracy affects review count (an Excel analysis)

Greetings fellow followers of the Crabigator! :crabigator:

As we all know, getting our reviews correct means we don’t need to do as many of them.

However, just how pronounced is this effect? How does a 95% pass rate compare to a 90% or an 80%?

Well, I’m here to try and give you some kind of answer, using the power of Excel spreadsheets! :nerd_face:

If you don’t like probability or spreadsheets but want to know the results, skip to the end. I won’t take it personally. :smile:

For those still up here, thank you! My feelings were really hurt by those who skipped past :cry:

First, let’s make some assumptions so that I can make an analysis work:

  • Your pass rate remains constant (Obviously false but we can still make some useful conclusions out of it)
  • Failing at App 1 will add one review, since you will need to repeat the App 1 stage but cannot be sent back any further.
  • Failing between App 2 and 4 will add 2 reviews - one from the previous stage and one from the need to repeat the stage you failed at.
  • Failing at Guru+ incurs 3 more reviews - two from the previous stages you are sent back to and one from repeating the failed stage.

Assumptions over, huzzah!
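
As a minimal sketch of the assumptions above (in Python, with stages numbered 1 to 8 for App 1 through Enlightened and 9 for Burned; the numbering and the function name are my own illustration, not something taken from the spreadsheet), the stage an item lands on after a review could be written as:

```python
# Stages are numbered 1..8 (App 1 .. Enlightened); 9 means Burned.
# This numbering and the function name are illustrative assumptions.
STAGES = ["App 1", "App 2", "App 3", "App 4",
          "Guru 1", "Guru 2", "Master", "Enlightened", "Burned"]


def stage_after_review(stage, passed):
    """Return the stage an item moves to after one review."""
    if passed:
        return stage + 1   # a pass always moves up one stage
    if stage == 1:
        return 1           # App 1 cannot be sent back any further
    if stage <= 4:
        return stage - 1   # App 2-4: sent back one stage
    return stage - 2       # Guru+: sent back two stages


for stage in (1, 3, 5, 8):
    landed = stage_after_review(stage, passed=False)
    print(f"fail at {STAGES[stage - 1]} -> back to {STAGES[landed - 1]}")
```

The extra review counts in the list (1, 2 or 3) are then just the failed review itself plus the stages that have to be climbed again.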

Now let’s get into the details.

If we have a 100% pass rate on an item, then going through all the SRS stages will take 8 reviews to graduate from Apprentice 1 to Burned. Easy! Done right?!

Unfortunately, we can’t have 100% pass rates on every radical/kanji/vocab, so we need to bust out our proper analytical skills.

For now, let’s assume a 90% pass rate and see where that takes us.
100% of items start out at Apprentice 1, and at each step 90% make it to the next SRS stage. With this, let’s calculate the chance of making it past each stage.

| App 1 | App 2 | App 3 | App 4 | Guru 1 | Guru 2 | Master | Enlightened | Reviews until burn |
|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.81 | 0.729 | 0.6561 | 0.59049 | 0.531441 | 0.4782969 | 0.43046721 | 8 |

At a 90% pass rate, we only have a 43% chance of making it past the Enlightened stage with the minimum number of reviews.
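
The first row is just repeated multiplication by the pass rate. A small Python sketch of that row (variable names are my own, and the 0.9 is only the example rate used here):

```python
pass_rate = 0.9  # the constant pass rate assumed in this example

stage_names = ["App 1", "App 2", "App 3", "App 4",
               "Guru 1", "Guru 2", "Master", "Enlightened"]

# Chance of getting past each stage without a single failure so far
clean_run = [pass_rate ** n for n in range(1, len(stage_names) + 1)]

for name, chance in zip(stage_names, clean_run):
    print(f"{name:12s} {chance:.8f}")
```

The last value is the share of items burned in exactly 8 reviews.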

Now, what happens if - Crabigator forbid - you fail a review at some point?

In the simplest case, we fail exactly once at the Apprentice 1 stage and then get everything right from there. This is the only way to have 9 reviews of an item. 10% of items would fail at Apprentice 1, so the calculations are the same as before, but with only 0.1 times as many items making it through.

| App 1 | App 2 | App 3 | App 4 | Guru 1 | Guru 2 | Master | Enlightened | Reviews until burn |
|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.81 | 0.729 | 0.6561 | 0.59049 | 0.531441 | 0.4782969 | 0.43046721 | 8 |
| 0.09 | 0.081 | 0.0729 | 0.06561 | 0.059049 | 0.0531441 | 0.04782969 | 0.043046721 | 9 |

At a 90% pass rate, 4.3% of items will need 9 reviews.

Now it starts to get more interesting!
10 reviews could be had by failing in App 1 twice, then being totally accurate.
They could also occur by failing once between App 2 and 4 and otherwise finishing with complete accuracy.
I’ve added these possibilities by sending the failed proportion from each of the App 2 to App 4 columns two rows down and one column to the left.

| App 1 | App 2 | App 3 | App 4 | Guru 1 | Guru 2 | Master | Enlightened | Reviews until burn |
|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.81 | 0.729 | 0.6561 | 0.59049 | 0.531441 | 0.4782969 | 0.43046721 | 8 |
| 0.09 | 0.081 | 0.0729 | 0.06561 | 0.059049 | 0.0531441 | 0.04782969 | 0.043046721 | 9 |
| 0.09 | 0.1539 | 0.20412 | 0.183708 | 0.1653372 | 0.14880348 | 0.133923132 | 0.120530819 | 10 |

And so we have another 12.1%!

One more hard bit to go!
For review counts of 11 and up (row 11 and below in the sheet), several different combinations of possibilities feed in.
For 11 reviews, these are:

  • three fails at Apprentice 1
  • one fail at Apprentice 1 plus one fail between Apprentice 2 and 4
  • one fail at Guru+

The Guru fail probabilities will be sent three rows down and two columns to the left.

| App 1 | App 2 | App 3 | App 4 | Guru 1 | Guru 2 | Master | Enlightened | Reviews until burn |
|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.81 | 0.729 | 0.6561 | 0.59049 | 0.531441 | 0.4782969 | 0.43046721 | 8 |
| 0.09 | 0.081 | 0.0729 | 0.06561 | 0.059049 | 0.0531441 | 0.04782969 | 0.043046721 | 9 |
| 0.09 | 0.1539 | 0.20412 | 0.183708 | 0.1653372 | 0.14880348 | 0.133923132 | 0.120530819 | 10 |
| 0.0171 | 0.02268 | 0.086022 | 0.1305639 | 0.1653372 | 0.191850201 | 0.172665181 | 0.155398663 | 11 |

From here, I can continue the same pattern down as many rows as I need until I’ve got the accuracy I desire. (image may contain spoilers)

I can then find the average expected number of reviews to burn by multiplying the proportion being burned in each row (the Enlightened column) by the number of reviews for that row, then summing these all up.
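
For anyone who would rather not build the sheet, the table-filling and averaging steps can be sketched in Python. This is not the original spreadsheet, just a rough equivalent under the same assumptions: it tracks, review by review, what share of items sits at each stage. The function names and the 400-review cut-off are my own choices.

```python
def burn_distribution(pass_rate, max_reviews=400):
    """Map n -> probability that an item is burned on exactly its n-th review."""
    fail_rate = 1.0 - pass_rate
    at_stage = [0.0] * 9   # index 1..8 = App 1 .. Enlightened (index 0 unused)
    at_stage[1] = 1.0      # every item starts at App 1
    burned = {}
    for review in range(1, max_reviews + 1):
        nxt = [0.0] * 9
        for stage in range(1, 9):
            share = at_stage[stage]
            # Passing moves up one stage; passing Enlightened burns the item
            if stage == 8:
                burned[review] = share * pass_rate
            else:
                nxt[stage + 1] += share * pass_rate
            # Failing: App 1 stays put, App 2-4 drop one, Guru+ drop two
            if stage == 1:
                back = 1
            elif stage <= 4:
                back = stage - 1
            else:
                back = stage - 2
            nxt[back] += share * fail_rate
        at_stage = nxt
    return burned


def average_reviews(pass_rate, max_reviews=400):
    """Weighted average of review counts (the tail past max_reviews is ignored)."""
    dist = burn_distribution(pass_rate, max_reviews)
    return sum(n * prob for n, prob in dist.items())


dist = burn_distribution(0.9)
for n in (8, 9, 10, 11):
    print(n, round(dist[n], 9))
print("average:", round(average_reviews(0.9), 2))
```

The values for 8 to 11 reviews correspond to the Enlightened column of the table above.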

Get in touch if you want a copy of the original spreadsheet; it seems I can’t upload it directly.

TL;DR
By doing some maths we can work out the average expected reviews for an item by assuming a consistent pass rate.
By varying the assumed pass rate we can find a few figures:
Edit: Note - pass rate refers to the proportion of items passing on to the next stage, which can be quite different from (seemingly a fair bit lower than) WaniKani’s provided accuracy rating.

| Pass rate | Avg reviews |
|---|---|
| 100% | 8 |
| 95% | 9.1 |
| 90% | 10.5 |
| 85% | 12.4 |
| 80% | 15.1 |
| 78.6% | 16 |
| 75% | 19 |

Based on this, I’d say aim to keep those rates around 90%, as rates below that start to have greater effects on review count!
You may also like to note that you’ll need to do twice as many reviews as with a perfect record (16 instead of 8) once your pass rate falls to 78.6%!
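
As a cross-check on these figures, here is a Python sketch that gets the same averages by a different route from the spreadsheet: it works out the expected number of reviews needed to climb out of each stage, where a fail costs the failed review plus re-climbing the stages you were sent back through. The function names and the bisection search are my own additions, and it relies on the same constant pass rate assumption.

```python
def expected_reviews(pass_rate):
    """Expected reviews from App 1 to Burned at a constant pass rate."""
    fail_rate = 1.0 - pass_rate
    climb = []  # climb[i]: expected reviews to get from stage i+1 to the next one
    for stage in range(1, 9):
        if stage == 1:
            redo = 0.0                    # App 1 fail: nothing extra to re-climb
        elif stage <= 4:
            redo = climb[-1]              # App 2-4 fail: re-climb one stage
        else:
            redo = climb[-1] + climb[-2]  # Guru+ fail: re-climb two stages
        # T = 1 + fail_rate * (redo + T)  =>  T = (1 + fail_rate * redo) / pass_rate
        climb.append((1.0 + fail_rate * redo) / pass_rate)
    return sum(climb)


def pass_rate_for(target_reviews, lo=0.5, hi=1.0):
    """Bisection: the pass rate at which the average equals target_reviews."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_reviews(mid) > target_reviews:
            lo = mid   # too many reviews, so the pass rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2


for rate in (1.0, 0.95, 0.90, 0.85, 0.80, 0.75):
    print(f"{rate:.0%}: {expected_reviews(rate):.1f}")
print(f"16 reviews at about {pass_rate_for(16):.1%}")
```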

edit: now with graphs!
[graph: probability of each total review count, by pass rate]
The low probability at 9 reviews is expected, but the large drop in probability at 12 reviews for every pass rate is interesting! My rationale is that the 12 values are low because of the way probabilities are passed down via failed Guru+ reviews, which add 3. Since being at 9 is already so unlikely, not many Guru fails feed from there into 12.
(If you can follow my train of thought here from that awful explanation you deserve a medal)

BIG DISCLAIMER
From what I can tell the numbers have come out looking fairly reasonable, however -
I very well may have made mistakes somewhere that make these results inaccurate.
If that’s the case, I’m sorry! They should still hopefully be relatively close!

A note on the percentages - this calculation treats each whole item as a pass or a fail. During reviews, the displayed accuracy works a bit differently.

Let’s say you are doing a review of 1 item. You get the meaning right, but fail the reading. The displayed accuracy will be 67% since you entered one bad answer and two good answers. Reviews count each entered answer separately. So 80% on a review with two kanji may actually mean that one kanji goes up a level and one goes down.

So you actually need higher review accuracy to match the calculated result. Assuming you only make one mistake per item, getting 75% of items to go up means you need to reach a review score of 89% (4 kanji + 1 mistake = 8 good + 1 bad answer = 8/9).
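
A quick Python sketch of that conversion, under the same assumptions as the examples above: two answers per item (meaning and reading, so radicals with a single answer would differ) and exactly one wrong answer per failed item. The function name and parameters are made up for illustration.

```python
def answer_accuracy(pass_rate, answers_per_item=2, wrong_per_failed_item=1):
    """Displayed per-answer accuracy for a given per-item pass rate."""
    # Every item eventually gets each of its answers right once
    correct = answers_per_item
    # Failed items add their wrong answers on top of that
    wrong = (1.0 - pass_rate) * wrong_per_failed_item
    return correct / (correct + wrong)


print(f"{answer_accuracy(0.75):.1%}")  # 4 kanji, 1 mistake: 8 good, 1 bad
print(f"{answer_accuracy(0.50):.1%}")  # 2 kanji, 1 mistake: 4 good, 1 bad
```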

Or at least that is what I remember from past reviews; I’ll need to check during my next review.

I just checked my heatmap and the percentages are there. Notice the difference between “Summary” and “answers”. I believe the second one is displayed by WaniKani in the top right corner during reviews.
[image: heatmap statistics showing the “Summary” and “answers” percentages]

I debated whether to add it in the post as it was already long enough.
I settled for simply calling it ‘pass rate’, since it would be near impossible to predict the proportion of reviews going to the next level based on someone’s review accuracy alone!

It changes the expected number of reviews quite a lot. Now that I think about it, getting about 90% review accuracy as displayed by WaniKani may actually lead to double the number of reviews. And I can confirm that on days when I frequently get 95-100%, it feels like there are hardly any reviews, but when I have a bad day and go down to 80% it feels like WK decided to spam me with reviews for the next few days.

It’s a fair critique,
I’ve added a note after the TL;DR to hopefully reduce confusion.
Thanks!

Your definition of “spam” needs to be tweaked.

OK, I’ll correct myself, then.

…but when I have a bad day and go down to 80% it feels like WK decided to bless me with reviews for the next few days.

Also, a 90% pass rate doesn’t mean that for any given item there’s an independent chance of getting it wrong 10% of the time, but rather that for all of the items you have, there’s a specific 10% of them that you struggle with. :stuck_out_tongue:

Needs more graphs!

Love it, this is right up my alley. I think it’s important to note that this effect mainly changes your workload at any given speed, because people often conflate the two things (workload and speed). Also, as Belthazar points out, the probability of failure for an item isn’t uniform. Not only are some items “harder” than others, but one you really struggled with in Apprentice might get learned so well that the chance of failure later is near zero. And the other way around: one you sailed through without too much study early on might have a higher chance of failure after 4 months of not seeing it.

In any case, it’s clear failing a lot of items makes the workload substantially higher. If you’re having “too many reviews” problems, one thing you can do is try to spend more time on the lessons and early Apprentice reviews.

Interestingly enough though, I made a simple model in Excel a while back and while the effect of failures is noticeable, the effect of speed is much larger. That is, if your workload is too high, slowing down from 20 to 15 lessons a day has a much bigger effect than a realistically achievable improvement in accuracy.

Here's another neat thing (oh by the way) -

If you’re pretty consistent in your lesson and review schedule, you can see (in a sense) your failure rate on the front page. The number of apprentice / guru / master / enlightened items should be a constant at steady state - every time you pass an apprentice into guru, you also pass a guru into master, net zero change in guru. For example, at 15 lessons a day, those numbers should be 60-75 or thereabouts (apprentice swings wildly) / 315 / 450 / 1800 and it’s purely a function of how many days each item waits. (It varies with lesson speed)

Almost all failures show up in Guru. Guru failures go to Apprentice, then you pass them quickly and they’re back in Guru. Master failures go to Guru. Enlightened failures go to Guru.

If you look at your guru number and subtract 315 (what should be there), that’s a rough measure of your failures.
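
Here is a rough Python sketch of that steady-state arithmetic. The days per stage are simply the ones implied by the numbers above at 15 lessons a day (315 / 15 = 21, 450 / 15 = 30, 1800 / 15 = 120), and the function names are made up for illustration. Apprentice is left out because it swings too much.

```python
# Days an item waits in each stage with no failures, as implied by the
# 315 / 450 / 1800 figures at 15 lessons a day (an assumption, not official data)
DAYS_IN_STAGE = {"guru": 21, "master": 30, "enlightened": 120}


def steady_state_counts(lessons_per_day):
    """Items sitting in each stage at steady state, assuming zero failures."""
    return {name: lessons_per_day * days for name, days in DAYS_IN_STAGE.items()}


def excess_guru(observed_guru, lessons_per_day):
    """Rough failure indicator: Guru items beyond the no-failure baseline."""
    return observed_guru - steady_state_counts(lessons_per_day)["guru"]


print(steady_state_counts(15))
print(excess_guru(observed_guru=400, lessons_per_day=15))
```

The 400 in the last line is just a made-up dashboard reading to show how the subtraction works.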

Great post ++ subscribing to your newsletter.

Yeah, this is definitely focused on the general amount of workload. Trying to work out what this means for you over the next day, week or month would need a lot more analysis and might not go anywhere, because realistically someone’s ‘pass rate’ varies so much based on current level and item.

Something I’ve also been thinking about! This came as a result of trying to estimate exactly how many reviews I might be doing if I were to eventually reach this ‘steady state’.

If my thought process is right, then if you were to consistently add the same number of lessons every day and (miraculously) have a consistent ‘pass rate’ (rate of advancing to the next stage) for those items’ reviews, the number of daily reviews you would expect in the long run is simply:

(Number of new daily lessons)*(avg expected reviews for your pass rate from above)

So in the long run, going from 20 lessons to 15 would have the (fairly intuitive) effect of reducing reviews by 25%.
A similar reduction could come from going from an 85% to a 95% pass rate, which would decrease expected reviews by about 27% (12.4 down to 9.1 per item), but this would be way more difficult to achieve.
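
A minimal Python sketch of that comparison, using the rounded averages from the table in the first post (12.4 reviews per item at 85%, 9.1 at 95%). The function name is mine, and it only describes the long-run steady state.

```python
# Rounded averages taken from the table in the first post
AVG_REVIEWS = {0.85: 12.4, 0.95: 9.1}


def daily_reviews(lessons_per_day, avg_reviews_per_item):
    """Long-run daily reviews: each new item eventually costs avg_reviews_per_item."""
    return lessons_per_day * avg_reviews_per_item


baseline = daily_reviews(20, AVG_REVIEWS[0.85])
fewer_lessons = daily_reviews(15, AVG_REVIEWS[0.85])
better_accuracy = daily_reviews(20, AVG_REVIEWS[0.95])

print(f"baseline:        {baseline:.0f} reviews/day")
print(f"15 lessons/day:  {1 - fewer_lessons / baseline:.1%} fewer")
print(f"95% pass rate:   {1 - better_accuracy / baseline:.1%} fewer")
```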

I’ll look at adding some graphs if I get some time this weekend :grin:

Nice work! Very interesting to see what a big influence the accuracy has!

Now, if you could also make a spreadsheet that would magically increase my accuracy, I would be forever grateful. :pray:
