Monte Carlo Simulation: Total number of reviews to burn everything

Not sure how this works, but couldn’t there also be an issue with the fact that your accuracy might change over time?

“When” means that something will happen whereas “if” means that that thing may not happen.

With the exception of doing all the reviews correctly (which is not useful because nobody always answers correctly), an item will always be answered incorrectly at some point in the simulation, thus “when”.

Yes, but I was talking about “what”, not “when”. As in, what happens when an item is marked wrong. Since the simulation could (and very likely would) mark some items wrong more than once (multiple "when"s), the “what” is different from the “when” and is in itself important.


As a fellow (hobbyist) programmer, do you mind sharing your source code?

I thought that was obvious

This is neat! :slightly_smiling_face: :turtle: :turtle:


Neat! I love it! :heart_eyes::heart_eyes::heart_eyes:

Click to see some mathematical rambling

This actually inspired me to calculate the whole thing using Markov chains (OK, I stole this brilliant idea to apply some math I learned to this problem. All credit to Kumi). This is what I get:
Define a function rev by
rev(p)=(3*p^7 + 3*p^6 + 2*p^5 - 5*p^4 + 7*p^3 - 2*p^2 - p +1) / p^8

Then the expected value of the total number of reviews is given by
477*rev(r) + 2*2027*rev(kr*km) + 2*6300*rev(vr*vm)

where:
r: radical accuracy
kr: kanji reading accuracy
km: kanji meaning accuracy
vr: vocab reading accuracy
vm: vocab meaning accuracy

For my accuracies, I get a total number of reviews of 187,755, which is still pretty close to the 175,227 predicted by the Monte Carlo method for my ~94% total accuracy. My current number of reviews is 91,353 and I’ve done a bit less than half of WK, so it does work out more or less.
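The closed-form calculation above is easy to sketch in Python (the item counts and the factor-of-2 treatment of reading/meaning are taken straight from the formula in this post; the all-ones accuracies below are just a sanity check, not real data):

```python
def rev(p):
    """Expected number of reviews to burn one item, given a per-review
    success probability p (the Markov-chain result from this post)."""
    return (3*p**7 + 3*p**6 + 2*p**5 - 5*p**4 + 7*p**3
            - 2*p**2 - p + 1) / p**8

def expected_total(r, kr, km, vr, vm):
    """477 radicals, 2027 kanji, 6300 vocab. Kanji and vocab count
    reading and meaning as two reviews (hence the factor of 2), and
    passing a combined review requires getting both halves right
    (hence the products kr*km and vr*vm)."""
    return 477*rev(r) + 2*2027*rev(kr*km) + 2*6300*rev(vr*vm)

# Sanity check: with perfect accuracy every item takes exactly 8 reviews
# (one per SRS stage), and 8 * (477 + 2*2027 + 2*6300) = 137,048 --
# the same figure the simulation tables give for 100% accuracy.
print(expected_total(1, 1, 1, 1, 1))  # 137048.0
```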


You are right, I did not do this. I was going to but forgot along the way.

I suppose it would, but the data on how many reviews we have done (counting reading and meaning as two) is more readily available for comparison. I suspect this is the reason the new numbers are lower than you would expect. Maybe I’ll run it again with the

Python (3.6) is my go-to language for one-off projects, and I wrote the script in PyCharm EDU (I get a free license as a student).

Ah. I meant to include the lessons, but if that’s not what you’re showing on wkstats I’ll do the rerun starting at SRS level 1.

I did; however, one assumption I forgot to mention is that the simulation assumes the user only ever gets an item wrong once each time they fail it.


I realise now that when you get an item wrong you have to do it again in the same session. @rfindley does this count as 1 failed review, or as 1 fail and 1 success? If it counts as both, I have not accounted for this.


No problem. I’ll put it in the OP once I add some comments.

This matches the new figures quite well.


I ran the simulation again, starting at streak 1 as rfindley suggested, and fixed my shameful blunder of making reading and meaning independent.

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 143,730| | | 98 | 151,138| | | 97 | 159,110| | | 96 | 167,729
| 95 | 177,584| | | 94 | 188,295| | | 93 | 200,293| | | 92 | 213,886| | | 91 | 228,649
| 90 | 245,547| | | 89 | 264,488| | | 88 | 286,607| | | 87 | 311,529| | | 86 | 340,810
| 85 | 373,389| | | 84 | 412,828| | | 83 | 458,352| | | 82 | 511,623| | | 81 | 575,537
| 80 | 651,243| | | 79 | 742,974| | | 78 | 854,450| | | 77 | 989,930| | | 76 | 1,152,534
| 75 | 1,355,922| | | 74 | 1,608,519| | | 73 | 1,920,193| | | 72 | 2,315,782| | | 71 | 2,808,518
| 70 | 3,435,866| | | 69 | 4,234,136| | | 68 | 5,262,556| | | 67 | 6,555,717| | | 66 | 8,291,871
| 65 | 10,509,176| | | 64 | 13,441,697| | | 63 | 17,253,387| | | 62 | 22,365,233| | | 61 | 29,046,634


I’m pretty sure the stats site would count that as one failed review and one successful review, since it reports the same percentage you see during a review session. This is partly why I suggested using the other data, which should be available from the reviews endpoint of API v2. That also lets you avoid the fact that people are less likely to get the same item wrong more than once in the same session (even though it can happen), since one wrong answer for either the meaning or the reading is all that matters.


Figured.

I am not going to bother using the API to find my data, so I doubt many others would. I’m satisfied with the method I have chosen (even if I executed it poorly).


I will rerun the simulation with failed reviews counting as two reviews, though.
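A sketch of that counting rule, per item (assuming WaniKani’s standard penalty of dropping two SRS stages at Guru and above, one below; the function name and structure are mine, not the thread’s actual script):

```python
import random

def reviews_to_burn(p, rng=random):
    """Simulate one item through the 8 SRS stages until burned.
    A failure tallies two reviews: the failed answer plus the
    forced same-session redo (which is recorded as a success)."""
    stage, reviews = 1, 0
    while stage <= 8:
        if rng.random() < p:
            reviews += 1          # clean pass: one review
            stage += 1
        else:
            reviews += 2          # fail + in-session redo
            # drop 2 stages at Guru+ (stage >= 5), 1 below, floor at 1
            stage = max(1, stage - (2 if stage >= 5 else 1))
    return reviews

# With perfect accuracy this is exactly 8 reviews per item.
print(reviews_to_burn(1.0))  # 8
```

Summing this over every radical, kanji, and vocab item (with kanji/vocab using the combined reading-times-meaning accuracy) gives one run of the simulation.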

edit: preliminary data (just one run per accuracy level)

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 145,675| | | 98 | 155,804| | | 97 | 165,119| | | 96 | 177,706
| 95 | 190,360| | | 94 | 204,276| | | 93 | 220,516| | | 92 | 235,989| | | 91 | 258,737
| 90 | 279,834| | | 89 | 304,505| | | 88 | 337,518| | | 87 | 367,079| | | 86 | 411,047
| 85 | 453,213| | | 84 | 501,061| | | 83 | 571,540| | | 82 | 643,160| | | 81 | 712,097
| 80 | 819,178| | | 79 | 965,926| | | 78 | 1,129,204| | | 77 | 1,281,836| | | 76 | 1,514,940
| 75 | 1,805,371| | | 74 | 2,228,629| | | 73 | 2,633,358| | | 72 | 3,208,187| | | 71 | 3,836,562
| 70 | 4,873,396| | | 69 | 6,156,011| | | 68 | 7,423,946| | | 67 | 9,429,344| | | 66 | 11,896,626
| 65 | 15,234,455| | | 64 | 19,755,406| | | 63 | 25,585,207| | | 62 | 33,552,600| | | 61 | 44,312,988

edit2: 20,407,018,564 reviews later… results

| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 145,906| | | 98 | 155,534| | | 97 | 166,106| | | 96 | 177,775
| 95 | 190,611| | | 94 | 204,773| | | 93 | 220,687| | | 92 | 238,860| | | 91 | 258,023
| 90 | 281,032| | | 89 | 306,655| | | 88 | 336,041| | | 87 | 369,310| | | 86 | 408,331
| 85 | 453,812| | | 84 | 505,797| | | 83 | 566,916| | | 82 | 641,099| | | 81 | 729,555
| 80 | 834,471| | | 79 | 959,757| | | 78 | 1,114,089| | | 77 | 1,302,871| | | 76 | 1,533,677
| 75 | 1,822,811| | | 74 | 2,182,566| | | 73 | 2,626,532| | | 72 | 3,193,191| | | 71 | 3,916,460
| 70 | 4,828,658| | | 69 | 5,993,856| | | 68 | 7,517,466| | | 67 | 9,468,785| | | 66 | 12,034,521
| 65 | 15,398,389| | | 64 | 19,811,452| | | 63 | 25,668,246| | | 62 | 33,436,116| | | 61 | 43,932,400


For simplicity, on wkstats I simply count the # correct and # incorrect from the /review_statistics endpoint.

During a review session, if you answer 1 incorrect reading and 2 incorrect meanings on an item, I think the /review_statistics endpoint counts that as 1 correct meaning, 1 correct reading, 1 incorrect reading and 2 incorrect meanings.


Ah, ok, so the review you get after failing a reading or meaning doesn’t count at all?

He’s saying it does count.


Oh, haha, sorry, I was reading incorrect where it said correct.

In that case the most recent simulation is the most accurate one. I don’t account for getting reading or meaning incorrect multiple times, but I think it’s close enough.

In case you were interested, here’s the code that I think will give you the percentage correct for items. WaniKani only started recording this data last year I think, so it won’t have all of your information. For comparison, the stats site says my overall accuracy is 94%, whereas this code says my accuracy is 86%.

// Requires the WaniKani Open Framework (wkof) userscript.
wkof.include('ItemData, Apiv2');
wkof.ready('ItemData, Apiv2').then(fetch_items);

function fetch_items() {
	// Pull every recorded review from the API v2 /reviews endpoint.
	wkof.Apiv2.fetch_endpoint('reviews').then(function(results) {
		var totalCorrect = 0;

		var reviewData = results.data;
		var totalReviews = reviewData.length;
		for (var i = 0; i < reviewData.length; i++) {
			var itemData = reviewData[i].data;
			// A review counts as correct only if both meaning and
			// reading were answered right on the first try.
			if (itemData.incorrect_meaning_answers === 0 && itemData.incorrect_reading_answers === 0) {
				totalCorrect++;
			}
		}

		// Truncate the percentage to two decimal places.
		var percentage = Math.floor((totalCorrect / totalReviews) * 10000) / 100;
		alert(`Correct: ${totalCorrect}, Percentage: ${percentage}%`);
	});
}

If you or anyone has a suggestion for how and where to present this data, I’d consider making it a script. I’d have to add some caching and I’d want to capture some smaller windows other than “from the beginning of time” (such as “this month” or “last 7 days”).


According to this my percentage is 85.52%.
Maybe I’ll do a simulation for this percentage too, tomorrow.


@anon38003452 adding the code to the OP now. It’s not anything special, but it works.

Thanks!

So I did a simulation using these accuracy percentages instead. I did it down to 61% again, but since these are lower than the percentages shown on the stats site, I’ll do another simulation down to 41%.

edit: updated with new data

Total reviews: 14,904,596,832
| % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews | | | % | Reviews |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
| 100 | 137,048| | | 99 | 141,746| | | 98 | 146,785| | | 97 | 151,944| | | 96 | 157,482
| 95 | 163,247| | | 94 | 169,326| | | 93 | 175,755| | | 92 | 182,624| | | 91 | 189,799
| 90 | 197,558| | | 89 | 205,461| | | 88 | 214,368| | | 87 | 223,713| | | 86 | 233,538
| 85 | 243,860| | | 84 | 255,474| | | 83 | 267,438| | | 82 | 280,210| | | 81 | 294,305
| 80 | 309,674| | | 79 | 326,016| | | 78 | 343,702| | | 77 | 363,506| | | 76 | 385,039
| 75 | 407,785| | | 74 | 433,147| | | 73 | 460,428| | | 72 | 491,826| | | 71 | 525,655
| 70 | 562,363| | | 69 | 604,323| | | 68 | 650,857| | | 67 | 702,903| | | 66 | 761,557
| 65 | 826,899| | | 64 | 902,990| | | 63 | 985,454| | | 62 | 1,081,859| | | 61 | 1,192,915
| 60 | 1,319,041| | | 59 | 1,463,020| | | 58 | 1,630,708| | | 57 | 1,826,892| | | 56 | 2,052,529
| 55 | 2,317,358| | | 54 | 2,629,361| | | 53 | 2,998,192| | | 52 | 3,432,951| | | 51 | 3,953,713
| 50 | 4,572,965| | | 49 | 5,335,770| | | 48 | 6,218,750| | | 47 | 7,309,200| | | 46 | 8,659,811
| 45 | 10,263,890| | | 44 | 12,282,422| | | 43 | 14,810,092| | | 42 | 17,863,330| | | 41 | 21,755,393


This is so interesting, thank you for sharing :heart_eyes:

I’d love to see something that took into account potential ‘accuracy decay’ as you progress through WK, but I’m guessing that would add a whole layer of complexity to the business.


I think that, depending on how you model it, it could be easy. You could just assign a modifier to each item when you create it, and multiply the probability of passing a review by that modifier at each evaluation.
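A minimal sketch of that idea (the function names and the uniform(0.85, 1.0) modifier range are made up for illustration; the penalty rule is WaniKani’s drop-two-at-Guru-and-above):

```python
import random

def reviews_to_burn(base_p, modifier, rng=random):
    """One item through the 8 SRS stages until burned; its pass
    probability is the base accuracy scaled by a per-item modifier
    that is fixed when the item is created."""
    p = base_p * modifier
    stage, reviews = 1, 0
    while stage <= 8:
        reviews += 1
        if rng.random() < p:
            stage += 1
        else:
            # drop 2 stages at Guru+ (stage >= 5), 1 below, floor at 1
            stage = max(1, stage - (2 if stage >= 5 else 1))
    return reviews

# Hypothetical decay: draw each item's modifier at creation time.
rng = random.Random(1)
totals = sum(reviews_to_burn(0.95, rng.uniform(0.85, 1.0), rng)
             for _ in range(1000))
```

To model decay over a user’s progress specifically, you could make the modifier a decreasing function of the item’s WK level instead of a random draw.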

How do you think that the graph of accuracy over time would look?
