# Monte Carlo Simulation: Total number of reviews to burn everything

### Background

I was curious as to how many WK reviews I should expect to do in total to burn every single item. I know that I could probably try to look it up somewhere, and I could definitely apply some of what I learned in my statistics class in uni to figure this out mathematically, but instead I decided to do a Monte Carlo simulation.

How many reviews you need of course depends on your accuracy, so what I did was go through a number of different accuracy levels and run simulations at each one to see how many reviews you would need, on average, before reaching the last SRS level. I start the simulations at 61% because the runs get slow below that (asymptotically so as you approach 50%), and I doubt many users have accuracy that low.

The reason I do this is that I want to find out how far I have come, in terms of the percentage of reviews that I will need to do to burn everything.

### Method

The interval 61%-100% is evenly sampled at 40 points. For each of these 40 accuracies, 100 simulations are run in which all items are reviewed, with a probability of success equal to that point's accuracy level, until they are burned. In each simulation the total number of reviews needed to burn all items is tracked, and the average over the 100 simulations is recorded.

Source Code

```python
import random
import time


class Item:
    """An item to be reviewed. Tracks its SRS level and how many cards it has."""

    def __init__(self, multiplicity):
        """Create a new item."""
        self._SRS_level = 1                # Don't count lessons
        self._multiplicity = multiplicity  # How many "cards" per "note", to use Anki terminology

    def review_item(self, p):
        """Evaluate, based on probability p, whether the item passes or fails a review."""
        # p**self._multiplicity is used because a user has to pass both the
        # meaning and the reading review when the multiplicity is 2.
        p_observed = random.random()
        if p_observed < p**self._multiplicity:
            self._SRS_level += 1               # Successful review: item goes up one SRS level
            review_count = self._multiplicity  # If it's a radical we did one review, else 2
        else:
            if p_observed < p:                 # Failed one review and passed the other
                review_count = 3
            else:
                review_count = 2 * self._multiplicity
            if self._SRS_level != 1:           # Don't change SRS level if item is already at the lowest
                if self._SRS_level <= 4:       # Apprentice items drop one level
                    self._SRS_level -= 1
                else:                          # Everything else drops two
                    self._SRS_level -= 2
        return self._SRS_level, review_count


def create_items(count, double):
    """Create a dict of items to review; the first `double` items have two cards."""
    items = {}
    for i in range(1, double + 1):
        items[i] = Item(2)
    for i in range(double + 1, count + 1):
        items[i] = Item(1)
    return items


def review_items(count, items, max_srs, p):
    """Review all items until they reach the final SRS level."""
    reviews = 0
    while count > 0:
        keys = list(items)
        # Review all items once
        for i in keys:
            srs_level, review = items[i].review_item(p)
            # If the item reaches the last SRS level, remove it from the queue
            if srs_level == max_srs:
                del items[i]
                count -= 1
            reviews += review
    return reviews


def repeat_run(runs, single, double, max_srs, p):
    """Repeat the same simulation a number of times and return the total review count."""
    total_reviews = 0
    for _ in range(runs):
        count = single + double
        # Create new items
        items = create_items(count, double)
        # Review items until all reach the last SRS level
        total_reviews += review_items(count, items, max_srs, p)
    return total_reviews


def format_row(accuracy, total_reviews, runs):
    """Format one cell of the Discourse table we're making."""
    data = "| " + str(accuracy) + " | " + str(round(total_reviews / runs))
    if (accuracy - 1) % 5 == 0:
        data += "\n"        # Five cells per table row
    else:
        data += "| \\| "
    return data


def parse_time(seconds):
    """Split a number of seconds into hours, minutes and seconds."""
    minutes = 0
    hours = 0
    if seconds > 60:
        minutes = seconds // 60
        seconds %= 60
    if minutes > 60:
        hours = minutes // 60
        minutes %= 60
    return hours, minutes, seconds


def simulate(highest_accuracy, interval_length, lowest_accuracy, runs,
             number_of_single_items, number_of_double_items, total_estimate,
             max_srs, estimate=False):
    """Run the whole simulation and build the results table."""
    table_data = ("| %    | Reviews | \\| | %    | Reviews | \\| | %    | Reviews "
                  "| \\| | %    | Reviews | \\| | %    | Reviews |\n"
                  "|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-\n")
    reviews_done = 0              # Number of reviews simulated so far
    accuracy = highest_accuracy
    t0 = time.time()              # Start time
    while accuracy >= lowest_accuracy:
        p = accuracy / 100        # Accuracy as a probability
        # Redo the simulation a number of times for the current accuracy level
        total_reviews = repeat_run(runs, number_of_single_items,
                                   number_of_double_items, max_srs, p)
        reviews_done += total_reviews
        table_data += format_row(accuracy, total_reviews, runs)
        # Print progress
        if not estimate:
            time_elapsed = time.time() - t0
            if total_estimate is False:
                progress = ""
                time_left = ""
            else:
                seconds = round(total_estimate / reviews_done * time_elapsed - time_elapsed)
                hours, minutes, seconds = parse_time(seconds)
                progress = str(round(reviews_done / total_estimate * 100)) + '%'
                time_left = str(hours) + "h " + str(minutes) + "m and " + str(seconds) + "s remaining."
            text = '- ' + progress + ' - ' + time_left
            print(accuracy, '-', 'Average Reviews:', total_reviews // runs, text)
        # Go to the next accuracy level
        accuracy -= interval_length
    return table_data, reviews_done


def main():
    # Settings
    highest_accuracy = 100                  # Percent
    interval_length = 1                     # Percent
    lowest_accuracy = 61                    # Percent
    runs = 100                              # Per level of accuracy
    max_srs = 9                             # Number of SRS levels
    number_of_double_items = 2027 + 6300    # Items with two cards (kanji + vocab)
    number_of_single_items = 477            # Items with one card (radicals)

    # Estimate the expected total number of reviews by doing a single run first;
    # the estimate is used to calculate the time remaining
    if runs >= 10:
        _, reviews_done = simulate(highest_accuracy, interval_length, lowest_accuracy,
                                   1, number_of_single_items, number_of_double_items,
                                   0, max_srs, estimate=True)
        total_estimate = reviews_done * runs
    else:
        total_estimate = False

    # Simulate
    table_data, reviews_done = simulate(highest_accuracy, interval_length, lowest_accuracy,
                                        runs, number_of_single_items, number_of_double_items,
                                        total_estimate, max_srs)
    print('Total reviews:', reviews_done)
    print(table_data)


main()
```

### Assumptions

2027 kanji,
6300 vocab words,
477 radicals.
Kanji and vocabulary reviews each have a meaning and a reading, and so count as two.
There are 17,131 "items" to review in total.

4 Apprentice levels,
2 Guru levels,
1 Master level,
1 Enlightened level,
1 Burned level.
There are 9 SRS levels in total.

A user does not fail a review item more than once per session.
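As a sanity check on these assumptions: with perfect accuracy every item must level up exactly 8 times (SRS level 1 to 9), so the minimum total is fixed and should match the 100% cell of the results table. A minimal sketch (the variable names are mine):

```python
# Minimum number of reviews assuming 100% accuracy: every item levels up
# 8 times (SRS 1 -> 9), and kanji/vocab cost two reviews (meaning + reading)
# per level-up, while radicals cost one.
kanji, vocab, radicals = 2027, 6300, 477
level_ups = 9 - 1  # 9 SRS levels, items start at level 1

minimum_reviews = (kanji + vocab) * 2 * level_ups + radicals * 1 * level_ups
print(minimum_reviews)  # 137048, matching the 100% cell in the results table
```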

### Results

These are the results after a total of 20,407,018,564 simulated reviews.

The percentage columns indicate the accuracy level. The review columns indicate the average number of reviews needed to burn everything at the accuracy in the adjacent column.

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 145,906| | | 98 | 155,534| | | 97 | 166,106| | | 96 | 177,775
| 95 | 190,611| | | 94 | 204,773| | | 93 | 220,687| | | 92 | 238,860| | | 91 | 258,023
| 90 | 281,032| | | 89 | 306,655| | | 88 | 336,041| | | 87 | 369,310| | | 86 | 408,331
| 85 | 453,812| | | 84 | 505,797| | | 83 | 566,916| | | 82 | 641,099| | | 81 | 729,555
| 80 | 834,471| | | 79 | 959,757| | | 78 | 1,114,089| | | 77 | 1,302,871| | | 76 | 1,533,677
| 75 | 1,822,811| | | 74 | 2,182,566| | | 73 | 2,626,532| | | 72 | 3,193,191| | | 71 | 3,916,460
| 70 | 4,828,658| | | 69 | 5,993,856| | | 68 | 7,517,466| | | 67 | 9,468,785| | | 66 | 12,034,521
| 65 | 15,398,389| | | 64 | 19,811,452| | | 63 | 25,668,246| | | 62 | 33,436,116| | | 61 | 43,932,400

### Conclusion

My average accuracy is 95.61%, and I have done 112,663 reviews so far. Looking at the 95% cell I see that I will need to do 190,611 reviews to burn everything. As such I can calculate that I am 59.1% (112,663/190,611) of the way to burning everything.

You can find your own (current) total number of reviews, and your total accuracy, on www.wkstats.com (see the highlighted parts of the screenshot), and calculate how far you've come in your journey to burn everything.


Two questions:

1. Did you look at meaning and reading together, so if you get one wrong you still have to review both the next time?
2. Wouldnât it be better to look at your stats for an item being marked right or wrong during a review, rather than the number of answers that are right or wrong (as the stats site shows)? That statistic would more accurately reflect how many reviews you have to do.

Those are good points. The stats site calculates accuracy differently than WK. For kanji/vocab, if you enter the right reading but the wrong meaning, WK counts that as 0% but the stats site says it's 50%.

Itâs not so much that the stats site does it differently. The stats site shows the percentage that you see during a review session, which unfortunately was all you could get from API v1. API v2 will allow you to get the more useful percentage from the review summary page as well, and hopefully the new version of the stats site will add that information.

@Kumirei By the way, what program did you use to run the Monte Carlo simulation?


FYI, since items start out on Apprentice 1, it's actually only 8 reviews to Burn, not 9. Unless you're including the Lesson quiz, but in that case, keep in mind that wkstats.com doesn't include Lesson quizzes in your accuracy.

So how can it be that I've done 183,871 already but my 94%+ accuracy tells me that I need 175,227 reviews?


If Iâm correct that Kumirei is using the wrong percentages, that would be the most likely reason. But regardless, keep in mind that this is a statistical analysis. It will never be completely accurate.


Sure, but I still have 5,500+ items to burn. If I were to get everything 100% correct from now on, I would still easily need to do around 20,000 reviews. That's a 10%+ difference.

I know, thatâs why I said it was likely that there was something wrong with the calculation.


A perfect track record on Wanikani would be 137048 reviews:

[(6300 vocab + 2027 kanji) * (1 reading + 1 meaning) * (8 srs level-ups)] + (477 radicals * 1 meaning * 8 srs level-ups)


And how many reviews did you do again?


Just a small thing, but did you consider that getting a wrong answer at SRS level 1 and 2 will decrease the SRS level by 0 and 1 respectively instead of 2?

Iâd say the calculations themselves are basically correct. Itâs just the accuracy you see on the statistics page is wrong. Or rather itâs based on a different set of information.
The accuracy as shown on wkstats is based on how many meanings OR readings you got right.
The simulation is based on how many items (meanings AND readings at the same time) you get right.
So basically your actual accuracy is a tad below what is shown on wkstats. Depending on how often you get just one of either meaning or reading wrong it can be considerably lower than what is shown.

EDIT:
I.e. if your accuracy is 95%, just subtract another 4-5% and look it up in the chart.
Assume you did 100,000 reviews and got 95,000 correct and 5,000 wrong.
Then your accuracy would be shown as 95,000 / 100,000 = 95%.
In most cases you get either the meaning or the reading wrong, not both (if you're above 90% accuracy). If you get one wrong you might as well get the other wrong as well, since the item itself will be wrong anyway. In that case you can just double the number of wrong answers you gave:
100,000 total, 95,000 correct, 5,000 wrong → 95,000 / 100,000 = 95% accuracy
becomes
105,000 total, 95,000 correct, 10,000 wrong → 95,000 / 105,000 = ~90.5% accuracy
It's based on the assumption that you only get one of them wrong. The more often you get both the meaning and the reading wrong at the same time, the closer the wkstats accuracy is to your actual accuracy.

TL;DR: Just deduct from your accuracy whatever percentage you're missing from 100%.
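The rule of thumb above can be written as a small helper. This is only a sketch of that approximation (the function name is mine), and it assumes every wrong answer belongs to a different item, i.e. you never miss both the meaning and the reading of the same item:

```python
def item_accuracy(total_answers, correct_answers):
    """Approximate item-level accuracy from answer-level (wkstats) numbers.

    Assumes each wrong answer belongs to a different item, so every failed
    item effectively contributes two wrong answers instead of one.
    """
    wrong = total_answers - correct_answers
    return correct_answers / (total_answers + wrong)

print(round(item_accuracy(100_000, 95_000) * 100, 1))  # ~90.5, as in the example above
```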


Thatâs why itâs a simulation and not a calculation. The simulation will ideally randomise which items get answered wrong and when.

Iâm not really sure what youâre responding to. If the simulation doesnât account for when to drop items by different SRS amounts itâs simply wrong. It would be a simulation of a version of WaniKani that doesnât exist.

@shiza: Just to note, all Apprentice items drop by one SRS level instead of two (except Apprentice 1, which as you noted doesn't drop at all).
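The penalty rules being discussed here could be sketched like this (a sketch per the thread's description, not official WaniKani code; the function name is mine, with levels 1-4 Apprentice and 9 Burned):

```python
def srs_after_failure(level):
    """New SRS level after a failed review, per the rules discussed above:
    Apprentice 1 stays put, Apprentice 2-4 drop one level, and everything
    from Guru up drops two levels (never below level 1)."""
    if level <= 1:
        return level          # Apprentice 1 can't drop
    if level <= 4:
        return level - 1      # Apprentice 2-4 drop one level
    return max(level - 2, 1)  # Guru and above drop two levels

print(srs_after_failure(4), srs_after_failure(7))  # 3 5
```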


Thatâs why I said the simulation will ideally randomise which items get answered wrong and when.

Okay, I didnât know this. I thought itâs always -2 when you get something wrong.

I always forget to check but does getting the same item wrong multiple times during the same review session decrease the % shown during reviews?

From WaniKani FAQ:

The FAQ is inaccurate then. I don't think it has been updated in years.

Iâm sure it does randomize the when, but Iâm not talking about that. Iâm talking about making sure that if the simulation randomly marks an Apprentice 4 item wrong it drops by one level to Apprentice 3, but if it marks a Master item wrong it drops by two levels to Guru 1.


Thatâs what I meant by âwhenâ.

Apparently we have different definitions of the word "when" then…
1 Like