First time poster here working in data analytics. This might be more of a question for the WaniKani dev team but I would be very interested to see which kanji and vocab terms have the highest percent of incorrect answers in the apprentice/guru phases. If they even collect data this granular, potentially there could be a community thread of mnemonics and tips for these most troublesome items.
Which makes me think there will not be much "accumulated" data, really. The WK team has a habit of making weekly updates, both for things that have been suggested and for improvements they've been working on for the long-term upkeep of the site. Items are being block listed every week, for example, so that changes the statistics you can gather long-term about which items are failed and why (before or after the weekly changes).
You'd likely have to put the question directly to the Tofugu team, and ask whether they have some other means of measuring the impact of their changes on the site.
Most likely the statistics would show lower level items as the most incorrect just because fewer people get to the higher levels and not because they're more difficult.
Yes, but since the WaniKani team moves items up and down levels, sometimes by quite a large number of levels, there's no real way of measuring this, is there? Only week by week, before a specific item has gotten a weekly update and moved between levels.
So, data might be accumulated "per item"? But how would you measure the whole site's items, and the impact of the various updates?
I'm sure there are ways to exclude interference in the data, perhaps, but I'm no statistical analysis person, so I wouldn't know the first thing about this.
I assume one would look at "percentage failed" and not at "absolute number failed", no?
Or are you saying that you think people generally fail more items in lower levels because they are new to kanji learning and stuff? I don't expect this to be the case, but it would be an interesting result nonetheless!
Percentage failed favors items with fewer data points, so the information is less useful. If there's an item that only 5 people have reviewed and they all got it wrong, it would be considered worse than an item that 1000 people reviewed and 900 got wrong. We have to determine what we consider to be most incorrect accounting for both the number of people who reviewed it as well as percentage of failure. Statistics is hard.
What I meant is that you'll have a larger absolute number of incorrect items in the lower levels because there are more people there to be incorrect.
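One common way to account for both the sample size and the failure percentage is to rank items by the lower bound of a Wilson score interval rather than by the raw percentage. The sketch below is only an illustration of that idea, not anything WaniKani is known to do; the function name `wilson_lower_bound`, the item names, and the 95% confidence level (`z = 1.96`) are all assumptions, and the counts are the 5-of-5 and 900-of-1000 figures from the post above.

```python
import math


def wilson_lower_bound(failures: int, reviews: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the failure rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    if reviews == 0:
        return 0.0
    p = failures / reviews
    denom = 1 + z * z / reviews
    centre = p + z * z / (2 * reviews)
    margin = z * math.sqrt(p * (1 - p) / reviews + z * z / (4 * reviews * reviews))
    return (centre - margin) / denom


# Hypothetical items: (failures, reviews), using the numbers from the post.
items = {
    "item_a": (5, 5),        # 100% failed, but only 5 reviewers
    "item_b": (900, 1000),   # 90% failed, 1000 reviewers
}

# Sort from "most confidently bad" to least.
ranked = sorted(items, key=lambda name: wilson_lower_bound(*items[name]), reverse=True)

for name in ranked:
    failures, reviews = items[name]
    print(name, f"raw={failures / reviews:.2f}", f"wilson_low={wilson_lower_bound(failures, reviews):.2f}")
```

With this scoring the 900-of-1000 item ranks above the 5-of-5 item, because five reviews are not enough to be confident that the true failure rate is really that high.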
Oh you mean the information is less useful because the population is smaller? Interesting.
Well, to be honest I would also consider the item where 5 of 5 people got it wrong to be worse than the other. But that's of course my interpretation, then. I see your point now, thanks for clarifying!
It's just that with 5 people you don't know if you just have the wrong 5 people. Maybe a different set of 5 would have gotten them all right. A larger data set reduces biases.
If people could add their own cards, you'd have that tail-end effect where there are things only one or two people have ever reviewed and they kept missing it, so the percent failed is higher than any of the common items.
But with the curriculum in WK being a fixed corpus, and enough people having gone all the way through it to 60, there should be a large enough sample for every item to do a statistically meaningful analysis. The worst case of what you are describing would be words that were just added at a high enough level that not many people have seen them yet, but there are ways to deal with that if it pollutes the results (like "show me the top five items by fail percentage out of those that have been in the WK corpus for over a year").
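That kind of "top five by fail percentage, but only items older than a year" query could be sketched roughly like this. Everything here is hypothetical: the `ItemStats` fields, the `top_failed` helper, and the sample numbers are made up for illustration and do not reflect how WaniKani actually stores or exposes its data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ItemStats:
    # Hypothetical per-item aggregate; field names are assumptions.
    name: str
    added_on: date
    reviews: int
    failures: int


def top_failed(items, today, min_age_days=365, limit=5):
    """Return up to `limit` items with the highest fail percentage,
    considering only items that have been in the corpus for more than `min_age_days`."""
    cutoff = today - timedelta(days=min_age_days)
    eligible = [i for i in items if i.added_on < cutoff and i.reviews > 0]
    return sorted(eligible, key=lambda i: i.failures / i.reviews, reverse=True)[:limit]


# Made-up sample data.
sample = [
    ItemStats("old_hard", date(2020, 1, 1), reviews=1000, failures=400),
    ItemStats("old_easy", date(2020, 1, 1), reviews=1000, failures=100),
    ItemStats("new_hard", date(2023, 10, 1), reviews=10, failures=9),
]

result = top_failed(sample, today=date(2024, 1, 1))
print([i.name for i in result])
```

The recently added item is left out even though its raw fail percentage is the highest, which is exactly the pollution the age filter is meant to avoid.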
On the one hand, my percentage correct was much better in the single-digit levels, a bit worse in the teens, and then I died a few months ago on my way to Hell. On the other hand, I've gotten to the point where I can guess readings from radicals for those that aren't rendaku'd or aren't 人. So it's a wash? Uh, no, it's still Death.
To which I'd assert that words containing 人 must win hands down in the lowest percentage correct category. I mean really, if the monks back in the day had an ISO standardization committee this all would be much easier.