Monte Carlo Simulation: Total number of reviews to burn everything

Not sure how this works, but couldn’t there also be an issue with the fact that your accuracy might change over time?

“When” means that something will happen whereas “if” means that that thing may not happen.

With the exception of doing all the reviews correctly (which is not useful because nobody always answers correctly), an item will always be answered incorrectly at some point in the simulation, thus “when”.

Yes, but I was talking about “what”, not “when”. As in, what happens when an item is marked wrong. Since the simulation could (and very likely would) mark some items wrong more than once (multiple "when"s), the “what” is different from the “when” and is in itself important.


As a fellow (hobbyist) programmer, do you mind sharing your source code?

I thought that was obvious

This is neat! :slightly_smiling_face: :turtle: :turtle:


Neat! I love it! :heart_eyes::heart_eyes::heart_eyes:

Click to see some mathematical rambling

This actually inspired me to calculate the whole thing using Markov chains (OK, I stole this brilliant idea to apply some math I learned to this problem. All credit to Kumi). This is what I get:
Define a function rev by
rev(p)=(3*p^7 + 3*p^6 + 2*p^5 - 5*p^4 + 7*p^3 - 2*p^2 - p +1) / p^8

Then the expected value of the total number of reviews is given by
477*rev(r) + 2*2027*rev(kr*km) + 2*6300*rev(vr*vm)

where:
r: radical accuracy
kr: kanji reading accuracy
km: kanji meaning accuracy
vr: vocab reading accuracy
vm: vocab meaning accuracy

For my accuracies, I get a total number of reviews of 187,755, which is still pretty close to the 175,227 predicted by the Monte Carlo method for my ~94% total accuracy. My current number of reviews is 91,353 and I’ve done a bit less than half of WK, so it does work out more or less.
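The closed-form calculation above is easy to sketch in Python (the item counts and the factor-of-2 treatment of reading/meaning are taken straight from the formula in this post; the all-ones accuracies below are just a sanity check, not real data):

```python
def rev(p):
    """Expected number of reviews to burn one item, given a per-review
    success probability p (the Markov-chain result from this post)."""
    return (3*p**7 + 3*p**6 + 2*p**5 - 5*p**4 + 7*p**3
            - 2*p**2 - p + 1) / p**8

def expected_total(r, kr, km, vr, vm):
    """477 radicals, 2027 kanji, 6300 vocab. Kanji and vocab count
    reading and meaning as two reviews (hence the factor of 2), and
    passing a combined review requires getting both halves right
    (hence the products kr*km and vr*vm)."""
    return 477*rev(r) + 2*2027*rev(kr*km) + 2*6300*rev(vr*vm)

# Sanity check: with perfect accuracy every item takes exactly 8 reviews
# (one per SRS stage), and 8 * (477 + 2*2027 + 2*6300) = 137,048 --
# the same figure the simulation tables give for 100% accuracy.
print(expected_total(1, 1, 1, 1, 1))  # 137048.0
```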


You are right, I did not do this. I was going to but forgot along the way.

I suppose it would, but the data on how many reviews we have done (counting reading and meaning as two) is more readily available for comparison. I suspect this is the reason the new numbers are lower than you would expect. Maybe I’ll run it again with the

Python (3.6) is my go-to language for one-off projects, and I wrote the script in PyCharm EDU (I get a free license as a student).

Ah. I meant to include the lessons, but if that’s not what you’re showing on wkstats I’ll do the rerun starting at SRS level 1.

I did; however, one assumption I forgot to mention is that the simulation assumes the user only ever gets an item wrong once each time they fail it.


I realise now that when you get an item wrong you have to do it again in the same session. @rfindley does this count as 1 failed review, or as 1 fail and 1 success? If it counts as both, I have not accounted for this.


No problem. I’ll put it in the OP once I add some comments.

This matches the new figures quite well.


I ran the simulation again, starting at streak 1 as rfindley suggested, and fixed my shameful blunder of making reading and meaning independent.

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 143,730| | | 98 | 151,138| | | 97 | 159,110| | | 96 | 167,729
| 95 | 177,584| | | 94 | 188,295| | | 93 | 200,293| | | 92 | 213,886| | | 91 | 228,649
| 90 | 245,547| | | 89 | 264,488| | | 88 | 286,607| | | 87 | 311,529| | | 86 | 340,810
| 85 | 373,389| | | 84 | 412,828| | | 83 | 458,352| | | 82 | 511,623| | | 81 | 575,537
| 80 | 651,243| | | 79 | 742,974| | | 78 | 854,450| | | 77 | 989,930| | | 76 | 1,152,534
| 75 | 1,355,922| | | 74 | 1,608,519| | | 73 | 1,920,193| | | 72 | 2,315,782| | | 71 | 2,808,518
| 70 | 3,435,866| | | 69 | 4,234,136| | | 68 | 5,262,556| | | 67 | 6,555,717| | | 66 | 8,291,871
| 65 | 10,509,176| | | 64 | 13,441,697| | | 63 | 17,253,387| | | 62 | 22,365,233| | | 61 | 29,046,634


I’m pretty sure the stats site would count that as one failed review and one successful review, since it reports the same percentage you see during a review session. This is partly why I suggested using the other data, which should be available from the reviews endpoint of API v2. That also lets you avoid the fact that people are less likely to get the same item wrong more than once in the same session (even though it can happen), since one wrong answer for either the meaning or the reading is all that matters.


Figured.

I am not going to bother using the API to find my data, so I doubt many others would. I’m satisfied with the method I have chosen (even if I executed it poorly).


I will rerun the simulation with failed reviews counting as two reviews, though.
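A sketch of that counting rule, per item (assuming WaniKani’s standard penalty of dropping two SRS stages at Guru and above, one below; the function name and structure are mine, not the thread’s actual script):

```python
import random

def reviews_to_burn(p, rng=random):
    """Simulate one item through the 8 SRS stages until burned.
    A failure tallies two reviews: the failed answer plus the
    forced same-session redo (which is recorded as a success)."""
    stage, reviews = 1, 0
    while stage <= 8:
        if rng.random() < p:
            reviews += 1          # clean pass: one review
            stage += 1
        else:
            reviews += 2          # fail + in-session redo
            # drop 2 stages at Guru+ (stage >= 5), 1 below, floor at 1
            stage = max(1, stage - (2 if stage >= 5 else 1))
    return reviews

# With perfect accuracy this is exactly 8 reviews per item.
print(reviews_to_burn(1.0))  # 8
```

Summing this over every radical, kanji, and vocab item (with kanji/vocab using the combined reading-times-meaning accuracy) gives one run of the simulation.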

edit: preliminary data (just one run per accuracy level)

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 145,675| | | 98 | 155,804| | | 97 | 165,119| | | 96 | 177,706
| 95 | 190,360| | | 94 | 204,276| | | 93 | 220,516| | | 92 | 235,989| | | 91 | 258,737
| 90 | 279,834| | | 89 | 304,505| | | 88 | 337,518| | | 87 | 367,079| | | 86 | 411,047
| 85 | 453,213| | | 84 | 501,061| | | 83 | 571,540| | | 82 | 643,160| | | 81 | 712,097
| 80 | 819,178| | | 79 | 965,926| | | 78 | 1,129,204| | | 77 | 1,281,836| | | 76 | 1,514,940
| 75 | 1,805,371| | | 74 | 2,228,629| | | 73 | 2,633,358| | | 72 | 3,208,187| | | 71 | 3,836,562
| 70 | 4,873,396| | | 69 | 6,156,011| | | 68 | 7,423,946| | | 67 | 9,429,344| | | 66 | 11,896,626
| 65 | 15,234,455| | | 64 | 19,755,406| | | 63 | 25,585,207| | | 62 | 33,552,600| | | 61 | 44,312,988

edit2: 20,407,018,564 reviews later… results

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 145,906| | | 98 | 155,534| | | 97 | 166,106| | | 96 | 177,775
| 95 | 190,611| | | 94 | 204,773| | | 93 | 220,687| | | 92 | 238,860| | | 91 | 258,023
| 90 | 281,032| | | 89 | 306,655| | | 88 | 336,041| | | 87 | 369,310| | | 86 | 408,331
| 85 | 453,812| | | 84 | 505,797| | | 83 | 566,916| | | 82 | 641,099| | | 81 | 729,555
| 80 | 834,471| | | 79 | 959,757| | | 78 | 1,114,089| | | 77 | 1,302,871| | | 76 | 1,533,677
| 75 | 1,822,811| | | 74 | 2,182,566| | | 73 | 2,626,532| | | 72 | 3,193,191| | | 71 | 3,916,460
| 70 | 4,828,658| | | 69 | 5,993,856| | | 68 | 7,517,466| | | 67 | 9,468,785| | | 66 | 12,034,521
| 65 | 15,398,389| | | 64 | 19,811,452| | | 63 | 25,668,246| | | 62 | 33,436,116| | | 61 | 43,932,400


For simplicity, on wkstats I simply count the # correct and # incorrect from the /review_statistics endpoint.

During a review session, if you answer 1 incorrect reading and 2 incorrect meanings on an item, I think the /review_statistics endpoint counts that as 1 correct meaning, 1 correct reading, 1 incorrect reading and 2 incorrect meanings.


Ah, ok, so the review you get after failing a reading or meaning doesn’t count at all?

He’s saying it does count.


Oh, haha, sorry, I was reading incorrect where it said correct.

In that case the most recent simulation is the most accurate one. I don’t account for getting reading or meaning incorrect multiple times, but I think it’s close enough.

In case you were interested, here’s the code that I think will give you the percentage correct for items. WaniKani only started recording this data last year I think, so it won’t have all of your information. For comparison, the stats site says my overall accuracy is 94%, whereas this code says my accuracy is 86%.

// Requires the WaniKani Open Framework (wkof) userscript.
wkof.include('ItemData, Apiv2');
wkof.ready('ItemData, Apiv2').then(fetch_items);

function fetch_items() {
	// Pull every recorded review from the API v2 /reviews endpoint.
	wkof.Apiv2.fetch_endpoint('reviews').then(function(results) {
		var totalCorrect = 0;

		var reviewData = results.data;
		var totalReviews = reviewData.length;
		for (var i = 0; i < reviewData.length; i++) {
			var itemData = reviewData[i].data;
			// A review counts as correct only if both meaning and
			// reading were answered right on the first try.
			if (itemData.incorrect_meaning_answers === 0 && itemData.incorrect_reading_answers === 0) {
				totalCorrect++;
			}
		}

		// Truncate the percentage to two decimal places.
		var percentage = Math.floor((totalCorrect / totalReviews) * 10000) / 100;
		alert(`Correct: ${totalCorrect}, Percentage: ${percentage}%`);
	});
}

If you or anyone has a suggestion for how and where to present this data, I’d consider making it a script. I’d have to add some caching and I’d want to capture some smaller windows other than “from the beginning of time” (such as “this month” or “last 7 days”).


According to this my percentage is 85.52%.
Maybe I’ll do a simulation for this percentage too, tomorrow.


@anon38003452 adding the code to the OP now. It’s not anything special, but it works.

Thanks!

So I did a simulation using these accuracy percentages instead. I did it down to 61% again, but since these are lower than the percentages shown on the stats site, I’ll do another simulation down to 41%.

edit: updated with new data

Total reviews: 14,904,596,832
| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 141,746| | | 98 | 146,785| | | 97 | 151,944| | | 96 | 157,482
| 95 | 163,247| | | 94 | 169,326| | | 93 | 175,755| | | 92 | 182,624| | | 91 | 189,799
| 90 | 197,558| | | 89 | 205,461| | | 88 | 214,368| | | 87 | 223,713| | | 86 | 233,538
| 85 | 243,860| | | 84 | 255,474| | | 83 | 267,438| | | 82 | 280,210| | | 81 | 294,305
| 80 | 309,674| | | 79 | 326,016| | | 78 | 343,702| | | 77 | 363,506| | | 76 | 385,039
| 75 | 407,785| | | 74 | 433,147| | | 73 | 460,428| | | 72 | 491,826| | | 71 | 525,655
| 70 | 562,363| | | 69 | 604,323| | | 68 | 650,857| | | 67 | 702,903| | | 66 | 761,557
| 65 | 826,899| | | 64 | 902,990| | | 63 | 985,454| | | 62 | 1,081,859| | | 61 | 1,192,915
| 60 | 1,319,041| | | 59 | 1,463,020| | | 58 | 1,630,708| | | 57 | 1,826,892| | | 56 | 2,052,529
| 55 | 2,317,358| | | 54 | 2,629,361| | | 53 | 2,998,192| | | 52 | 3,432,951| | | 51 | 3,953,713
| 50 | 4,572,965| | | 49 | 5,335,770| | | 48 | 6,218,750| | | 47 | 7,309,200| | | 46 | 8,659,811
| 45 | 10,263,890| | | 44 | 12,282,422| | | 43 | 14,810,092| | | 42 | 17,863,330| | | 41 | 21,755,393


This is so interesting, thank you for sharing :heart_eyes:

I’d love to see something that took into account potential ‘accuracy decay’ as you progress through WK, but I’m guessing that would add a whole layer of complexity to the business.


I think that, depending on how you model it, it could be easy. You could just assign a modifier to each item when you create it, and multiply the probability of passing a review by that modifier at each evaluation.
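A minimal sketch of that idea (the function names and the uniform(0.85, 1.0) modifier range are made up for illustration; the penalty rule is WaniKani’s drop-two-at-Guru-and-above):

```python
import random

def reviews_to_burn(base_p, modifier, rng=random):
    """One item through the 8 SRS stages until burned; its pass
    probability is the base accuracy scaled by a per-item modifier
    that is fixed when the item is created."""
    p = base_p * modifier
    stage, reviews = 1, 0
    while stage <= 8:
        reviews += 1
        if rng.random() < p:
            stage += 1
        else:
            # drop 2 stages at Guru+ (stage >= 5), 1 below, floor at 1
            stage = max(1, stage - (2 if stage >= 5 else 1))
    return reviews

# Hypothetical decay: draw each item's modifier at creation time.
rng = random.Random(1)
totals = sum(reviews_to_burn(0.95, rng.uniform(0.85, 1.0), rng)
             for _ in range(1000))
```

To model decay over a user’s progress specifically, you could make the modifier a decreasing function of the item’s WK level instead of a random draw.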

How do you think that the graph of accuracy over time would look?
