[Userscript] The GanbarOmeter

Rrwrex · December 29, 2021, 9:48pm

The underlying gauge component already takes a number between 0 and 1 to determine how far to rotate the dial. It’s easy enough to apply whatever algorithm I want to as long as I continue to feed the underlying component a number between 0 and 1.

Current behavior is as follows: the default settings have a target of 100 apprentice items. Ignoring “new” RKV items, 0 items displays 0%, 100 items displays the gauge at its halfway point (50%), and 200 (or more) items pegs the scale at 100%.
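A minimal sketch of the linear behavior described above, assuming the full scale is simply twice the target. The function name `linearGaugeValue` is hypothetical and is not taken from the script itself.

```javascript
// Hypothetical helper: map an apprentice-item count to the 0..1 value
// that the gauge component expects, using the linear rule described above.
function linearGaugeValue(itemCount, target = 100) {
  const fullScale = 2 * target;           // 200 items pegs the needle by default
  const raw = itemCount / fullScale;      // 100 items -> 0.5 (halfway)
  return Math.min(1, Math.max(0, raw));   // clamp so the needle never overshoots
}

console.log(linearGaugeValue(0));   // 0
console.log(linearGaugeValue(100)); // 0.5
console.log(linearGaugeValue(250)); // 1
```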

I’m leaning toward keeping it linear. At its simplest, it’s a “how many items are in your apprentice queue” meter. Weighting new radicals/kanji/vocabulary is hard enough to understand and reason about without throwing in logarithmic calculations.

Maybe a statistical approach? I could assume a normal distribution and use percentiles. That is, values within 34% of the target number (one standard deviation) are considered “normal”, more than 34% away are considered low or high.

I think value-dependent color is easy enough to introduce. I hadn’t thought of eliminating a numeric display altogether and only showing words, though. That’s an interesting thought.

Instead of a numeric value, I could display random vocabulary depending on the range. In addition to changing the color as discussed above, I’m thinking maybe something like the following.

Assuming a target of 100 apprentice items:

0 to 65 items (low): マジ？頑張れ退屈
66 to 134 items (normal): 平気普通並
135 or more items (high): 超圧危機疲れ
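The ranges above could be sketched roughly like this. Everything here is an assumption for illustration: the names `rangeFor` and `pickLabel` are made up, the ±34% cut-offs are scaled from the target, and the way the sample words are split into separate entries is a guess.

```javascript
// Hypothetical label lists; how the sample strings split into words is a guess.
const LABELS = {
  low: ["マジ？", "頑張れ", "退屈"],
  normal: ["平気", "普通", "並"],
  high: ["超", "圧", "危機", "疲れ"],
};

// Classify a count as low / normal / high relative to the target.
// With target = 100 this gives 0-65, 66-134, and 135+ as listed above.
function rangeFor(itemCount, target = 100) {
  const low = Math.round(target * 0.66);
  const high = Math.round(target * 1.34);
  if (itemCount < low) return "low";
  if (itemCount <= high) return "normal";
  return "high";
}

// Pick a random word for the range. The random source is injectable
// so the choice can be made deterministic when testing.
function pickLabel(itemCount, target = 100, random = Math.random) {
  const words = LABELS[rangeFor(itemCount, target)];
  return words[Math.floor(random() * words.length)];
}

console.log(rangeFor(65));  // low
console.log(rangeFor(100)); // normal
console.log(rangeFor(135)); // high
```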

Thoughts?

LupoMikti · December 29, 2021, 10:00pm

I guess the question for that approach becomes whether or not assuming normality for the distribution is the correct call. Maybe the difficulty looks more like a chi-squared distribution for example? (Actually, if you want to go with this approach, I think this distribution might be the most fitting, as it can be used as a test for standard deviation and goodness-of-fit. And in this case, the degrees of freedom would be the desired number of apprentice items, which if large enough would approximate a normal distribution anyway).

In any case I like the idea of going with a statistical distribution and percentile ranges if having a log-based approach for the display isn’t to your liking (although I’m still a little unsure of what you don’t like about it; to me it’s not changing the calculations fundamentally, it’s just a transformation for display purposes. But because of that it’s not necessary or groundbreaking either, so it’s fine to stick with linear).

wct · December 29, 2021, 10:53pm

Yeah, as I was re-reading my suggestion, I was thinking, “Does it really have to be logit/non-linear?” And, honestly, I think that was mainly me being hyper-focused on ‘the ideal system’ when just a ‘good enough’ system would be totally fine. Can always ‘upgrade’ to non-linear later if the need arises.

As a suggestion to keep it linear, you could literally just scale it between +/-MaxValue, so that 0% goes to -MaxValue, 50% goes to 0, and 100% goes to +MaxValue. Just choose MaxValue to be whatever you want. Given a p input between 0 and 1 (divide by 100 if actually using percentages):

z = f(p) = (p - 0.5) * 2 * MaxValue
or, equivalently,
z = f(p) = (2*p - 1) * MaxValue
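As a quick sketch, the same scaling in JavaScript (the function name `zeroCentered` is just illustrative):

```javascript
// Map p in [0, 1] onto [-maxValue, +maxValue], with p = 0.5 landing on 0.
function zeroCentered(p, maxValue) {
  return (2 * p - 1) * maxValue;
}

console.log(zeroCentered(0, 10));   // -10
console.log(zeroCentered(0.5, 10)); // 0
console.log(zeroCentered(1, 10));   // 10
```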

Rrwrex · December 30, 2021, 12:08am

I need to quit lily-gilding and get this thing finished, but this is the most important gauge, so I want to get it right.

Really, neither a statistical distribution of any shape nor a logarithmic approach feels right to me because the number of items in apprentice stages is directly under a user’s control (mostly).

Thinking out loud:

Fundamentals

1. The gauge has one intent: it informs you whether to slow down or speed up doing lessons (and ideally, to what degree).
2. It does this by analyzing upcoming assignments. It looks for subject assignments in the first four SRS stages (the “apprentice queue”) and displays a gauge based on the counts, types, and SRS stages of those items.
3. The number of items in your apprentice queue is primarily affected by how many lessons you’ve performed recently. Lessons are directly under a user’s control and normally done in chunks of five (it’s not at all a random distribution).
4. The apprentice queue is also affected by misses: items taking longer to guru, or previously guru’d items falling back down to lower levels. This is somewhat random, but there’s potentially a positive feedback loop: too many apprentice items make reviews harder, which causes more misses, which in turn prevents the apprentice queue from shrinking.
5. I want to analyze (and “weight”) the first half of the Apprentice queue specially for multiple reasons:
   - Normally, the first two stages are almost entirely from recent lessons. The latter two apprentice stages have a mix of recent lessons and older items.
   - The first two stages also have VERY short progression schedules (4 hours and 8 hours).
   - Because of the above, these items are normally harder. They’re new and not yet memorized: it’s normal to need several reviews to get them to stick.
6. Everyone is different. Some will be more comfortable with a wider range of apprentice items, others want to keep it very close to the target. Good defaults that work for the majority of users are important, though.
   - Logit or +/- 34% are just heuristic ways of saying a range of “middle-ish” values are acceptable, but I doubt different users will agree to the same range.
   - It reminds me of tuning the “Q” (quality factor) of a resonant circuit: Some people will want a tall, narrow shape with rapid falloff. Others will want a short, squat shape with gradual falloff.

Takeaways

1. The display should definitely NOT be a percentage. It made sense to me because I wrote the dang thing, but I can see how it confused people.
2. Zero-center makes sense, but neither positive/negative nor easy/hard designations for either side of zero make any sense based on fundamental #1. It should tell you whether to do more or fewer lessons, whether to speed up or to slow down.
3. Users need to specify an acceptable range of weighted values, not just a single target. I should have the users specify a minimum target and a maximum target? Below the minimum means “do more lessons”, above the maximum means “do fewer”, and in between means “you’re doing fine”.
   - For me personally, between 80 and 120 items feels about right. That means the needle should be straight up at 100 items. The question is how far to move the needle at 80 or 120 items. I think a fairly wide range of movement makes sense, say 20% of the range for values between 80 and 120 items (80 items moving the needle to 40% of the range, and 120 items to 60%). Not coincidentally, this is exactly how it behaves currently, but I currently only let you specify the middle value instead of the range.
   - Currently, with just a single target, I just double the target value to figure out what the needle at 100% means. Allowing the user to specify a range instead of a single target lets them indirectly specify their preferred “Q” value: the needle will always move from 40% to 60% for weighted values between the min and max targets.
   - If the weighted count is between the two values, I display 良 / “OK” / “satisfactory” or somesuch and leave the display a normal green.
   - The further away it gets outside of the two values, the redder the color gets (linearly as @wct suggests in his last message). If it’s below the minimum value it displays 遅 / “speed up” / “do more,” or whatever. If it’s above the max, it displays 速 / “slow down” / “do fewer” or something.
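One way the min/max idea could map onto the needle is a piecewise-linear function, sketched below. Only the 40%-to-60% band between the two targets comes from the discussion above. How the needle behaves outside that band is an assumption made for illustration: here it runs linearly from 0 at zero items up to 40% at the minimum, and from 60% at the maximum up to 100% at `min + max` (twice the midpoint). The name `needlePosition` is hypothetical.

```javascript
// Hypothetical piecewise-linear needle position in the range 0..1.
// Between targetMin and targetMax the needle moves from 0.4 to 0.6.
function needlePosition(weighted, targetMin, targetMax) {
  if (!(targetMin > 0 && targetMin < targetMax)) {
    throw new RangeError("expected 0 < targetMin < targetMax");
  }
  const fullScale = targetMin + targetMax; // assumption: pegs at twice the midpoint
  if (weighted <= 0) return 0;
  if (weighted < targetMin) {
    return 0.4 * (weighted / targetMin);
  }
  if (weighted <= targetMax) {
    return 0.4 + 0.2 * ((weighted - targetMin) / (targetMax - targetMin));
  }
  if (weighted >= fullScale) return 1;
  return 0.6 + 0.4 * ((weighted - targetMax) / (fullScale - targetMax));
}

// With 80/120 this reduces to the current linear behavior:
console.log(needlePosition(80, 80, 120).toFixed(2));  // 0.40
console.log(needlePosition(100, 80, 120).toFixed(2)); // 0.50
console.log(needlePosition(120, 80, 120).toFixed(2)); // 0.60
```

A narrower range (say 95 to 105) keeps the same 40% and 60% marks but makes the needle swing through them faster, which is the “Q” effect described above.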

Thanks to both of you for your feedback. This was helpful!

rikvg · December 30, 2021, 10:28pm

I love the ganbarometer! It looks cool on my dashboard, I love its name and I love your attitude towards it!

It hasn’t changed my behaviour though. I use the following formula:
I try to keep (apprentice + (guru/10)) under 150.

I saw this somewhere on the boards and stuck with it, and since then my review count has been quite feasible, and my level speed has actually increased. I am afraid I can’t for the life of me remember which old timer recommended the formula, but it works very well for me.

I have had quite a bit of trouble with leeches, and at times my number of guru’d items has risen to over 900. Apprentice at around 100 was just too much then.
Right now my guru items are around 650, and I try to keep my apprentice items at around 80-85 most of the time. This keeps me in control, and keeps a steady pace without drowning in leeches.
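For illustration, the rule of thumb above as a tiny sketch (the function names are made up, and the 150 limit is the one quoted in this post):

```javascript
// Combined load: apprentice items plus one tenth of the guru items.
function lessonLoad(apprentice, guru) {
  return apprentice + guru / 10;
}

// True when the load has reached the limit, i.e. time to ease off lessons.
function isOverLimit(apprentice, guru, limit = 150) {
  return lessonLoad(apprentice, guru) >= limit;
}

console.log(lessonLoad(100, 900));  // 190
console.log(isOverLimit(100, 900)); // true
console.log(lessonLoad(80, 650));   // 145
console.log(isOverLimit(80, 650));  // false
```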

I wonder if you would consider including the number of guru items in your ganbarometer, for people like me with less than average memories (at least in the sphere of learners of Japanese).

Either way, I love your work and I will keep the ganbarometer installed whether you include a factor for guru items or not.

Rrwrex · December 30, 2021, 10:50pm

Awesome to hear! Thanks.

Absolutely! That’s interesting, I’ve not seen that heuristic before but it seems reasonable and pretty close to what I’ve landed on in practice (I’ve currently got 83 apprentice and 513 guru items in my queue and just recently leveled up).

There’s a bit of synchronicity, too. I was actually refactoring the part of the code that deals with this logic when I saw your post.

I’m in the home stretch for improvements before releasing. The biggest remaining work is with the caching logic (always tricky). I really want to get it done before releasing, though.

Thanks for the kind words!

Rrwrex · December 31, 2021, 2:25am

Okay, here is how the Ganbarometer display value is calculated in the dev version:


target_min = 135;                 // user setting, default = 135
target_max = 165;                 // user setting, default = 165
target_count = 
  (target_min + target_max) / 2;  // default => 150

weighted_count = 
    new_radicals * new_r_weight   // default new_r_weight = 0.75
  + new_kanji * new_k_weight      // default new_k_weight = 3.0
  + new_vocab * new_v_weight      // default new_v_weight = 1.0
  + late_apprentice * late_weight // default late_weight = 1.0
  + guru * guru_weight            // default guru_weight = 0.1
  + master * master_weight        // default master_weight = 0
  + enlightened * enlight_weight; // default enlight_weight = 0

display_value =                   // A value between 0 and 1
  Math.min(1, weighted_count / (2 * target_count)); // clamped so the needle pegs at full scale

Basically, I’ve split the “apprentice” bucket into two parts: early (stages 1 and 2) and late (stages 3 and 4). I’ve further broken out the items in the early-apprentice bucket by type (radical, kanji, or vocabulary).

The idea is that items in the first two stages are almost guaranteed to be brand new, and will ordinarily be the most difficult to recall (you haven’t really learned them yet). But different users find some items more difficult than others.

I find new Kanji harder to memorize than new vocabulary because vocabulary is only introduced after you’ve guru’d the underlying kanji. The vocabulary items just reinforce kanji you’ve already learned (potentially introducing new readings and nuances of meaning). Radicals, on the other hand, are easiest of all because they only have a meaning (no reading).

I’ve also introduced support for @rikvg’s excellent suggestion. If there are no kanji or radicals in stages 1 or 2, the calculation devolves to apprentice + guru/10.

For completeness, I’ve also considered items in the remaining categories (master/stage7 and enlightened/stage8). By default the weight for those items is 0, so they are ignored.
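Here is a runnable sketch of that calculation with the default weights quoted above. The object shape, the camelCase key names, and the function names are assumptions made for this example and are not the script’s actual internals. The clamp to 1 is also an assumption, based on the gauge expecting a value between 0 and 1.

```javascript
// Default weights as listed in the post (key names are hypothetical).
const DEFAULT_WEIGHTS = {
  newRadicals: 0.75,
  newKanji: 3.0,
  newVocab: 1.0,
  lateApprentice: 1.0,
  guru: 0.1,
  master: 0,
  enlightened: 0,
};

// Sum each bucket's count times its weight; missing buckets count as 0.
function weightedCount(counts, weights = DEFAULT_WEIGHTS) {
  return Object.keys(weights).reduce(
    (sum, key) => sum + (counts[key] || 0) * weights[key],
    0
  );
}

// Gauge value in 0..1; the midpoint of the target range lands on 0.5.
function displayValue(counts, targetMin = 135, targetMax = 165, weights) {
  const targetCount = (targetMin + targetMax) / 2;
  return Math.min(1, weightedCount(counts, weights) / (2 * targetCount));
}

// No new radicals or kanji: the result matches apprentice + guru/10.
const counts = { newVocab: 20, lateApprentice: 60, guru: 500 };
console.log(weightedCount(counts).toFixed(2)); // 130.00
console.log(displayValue(counts).toFixed(2));  // 0.43
```

Ten new kanji alone would add 30 to the weighted count with these defaults, which is where the extra weighting for early-stage kanji shows up.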

I REALLY want the default settings to be useful for all users, especially newbies. But I’ve designed it primarily for my own use.

I think the default target of 150 and weighted_apprentice + guru/10 approach should work for almost everyone “out of the box”.

Interestingly, the new calculation allows enough flexibility to look at the entire assignment queue. I can easily imagine someone wanting to slow down if they have more than, say, 3500 total items under review at any one time (waiting until they burn more items to do more lessons).

Finally, I’m still pondering how to incorporate @Kumirei’s awesome little “Expected Number of Daily Reviews” script. I’ve decided NOT to use the value in the Ganbarometer display, but I do want to show it in the “Accuracy” bar chart (which shows the real number of reviews/day as well as the target).

My current thought is to just display the “expected daily” number in the data/table view (likely emphasized if the value is too far away from the real average number of daily reviews).

Now to find a wholesale supplier of cheap gold leaf…

Rrwrex · December 31, 2021, 7:14pm

Ooh. Sorry to reply to myself, but I think I figured out a good way to display this graphically, too.

Currently, the accuracy bar chart shows a bar with the actual number of reviews performed each day (with a darker portion showing the accuracy). It also displays a single dotted line with the target number of reviews per day.

The idea: Replace the dotted line with a low-opacity but solidly colored box showing the target range (high and low), then add a dotted line for the “expected daily” value.

This packs a LOT of information into that little bar chart!

How many reviews were performed each day?
What percentage of those reviews were answered correctly on the first try?
How does the number of reviews compare to the targets?
If I continue with the same rough accuracy and distribution across the stages, how many reviews can I expect to do each day in the future?
Is that “expected” number very far off (high or low) from my target range?

~~I’ll update this post with a screenshot after I’ve cobbled it together, but it looks great (and easy to understand) in my mind’s eye. <laugh>~~

Here’s what I meant:

The light green box is the target range, currently set to 120-180; the orange dotted line is the expected daily number of reviews (currently fixed at 150; I need to calculate this now).

Kumirei · December 31, 2021, 10:25pm

How much time have you put into this script?

whinette · December 31, 2021, 10:51pm

I’m not sure about the Difficulty weighting.
I have mostly guru 2, master and enlightened items left, which are (at least for me?) far more difficult than apprentice and guru 1 items.
The gauge displays 8% though, even though these items have a bad leech ratio.

Nice script tbh.

Rrwrex · December 31, 2021, 11:20pm

All of it!

Rrwrex · December 31, 2021, 11:27pm

I’m making major, wholesale changes to the script. The new version will have settings to let you emphasize whatever makes the most sense for where you are on the journey.

Most users are in earlier stages where the defaults (basically apprentice_items + guru_items/10 < 150) should make sense.

Since the primary objective of that gauge is to help you decide whether to slow down or speed up doing lessons, the defaults won’t make sense for level-60 users like yourself (no more lessons!).

HOWEVER, the new settings will allow you to weight items in master and enlightened. This may still make it useful for you.

The new version will be great if I can ever finish it!

Rrwrex · January 2, 2022, 12:21am

Other than caching reviews and getting the color/theme stuff wired in, everything is currently working.

More importantly: I’m out of crazy ideas on how to improve this thing!

Graph View

The ganbarometer gauge is now a zero-center needle with three ranges. A numeric value is calculated based on the stages and types of items in the assignment queue. That value is compared to lower and upper limits specified by the user (default 130-170). Values between the limits display a “good” label. Below the lower limit displays a “more effort needed” label. Above the upper limit displays a “take a break” label. (I couldn’t decide the best kanji text for the labels so I made it a user setting.)
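Roughly, the three-range labeling could look like the sketch below. The function name and the shape of the labels object are assumptions. The English label text and the 130-170 defaults come from the description above, and the labels are a parameter because the post says they are a user setting.

```javascript
// Hypothetical default labels, using the English wording from the post.
const DEFAULT_LABELS = {
  below: "more effort needed",
  within: "good",
  above: "take a break",
};

// Compare the calculated value with the user's lower and upper limits.
function gaugeLabel(value, lower = 130, upper = 170, labels = DEFAULT_LABELS) {
  if (value < lower) return labels.below;
  if (value > upper) return labels.above;
  return labels.within;
}

console.log(gaugeLabel(90));  // more effort needed
console.log(gaugeLabel(150)); // good
console.log(gaugeLabel(200)); // take a break
```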

Here is the settings screen for that widget:

The Reviews widget (was labeled “Accuracy” but I’ve gone back to “Reviews”) shows the number of reviews each day relative to an upper and lower target (the light green box in the background). It also shows a dashed, gold horizontal line for the “daily expected reviews”. That value is calculated the same way as @Kumirei’s script. Finally, the accuracy for each day is shown in darker green. Hovering over any bars displays the numeric value and accuracy percentage.

Data view

Poll

Looks fantastic! Hurry up and ship it already!
Please add/correct <whatever> (specify in a comment)
You know, I think I really like vanilla. (Apologies to G. Larson)


Beta-0.1.0

I just pushed a new beta version. I still haven’t fixed the theming/color-settings as well as a few other things. And it still isn’t caching reviews so it can take several seconds if you haven’t pulled reviews in a while. But it should be in good enough shape to try it out if you’re interested.

Here are the installation instructions for the beta.

Kumirei · January 2, 2022, 4:22am

I’m sure you’ll think of something

LupoMikti · January 2, 2022, 5:40am

It seems like for this latest beta version, the script isn’t properly getting data for the ganbarometer portion. Unless I’m missing something I believe the numbers for the “Guru’d” part should match the image below it, correct?

(Yes, I’m in a very weird spot where it feels like I’m starting at a zero point but hey, those numbers will rise soon enough.)

Rrwrex · January 2, 2022, 6:22am

Hmm. All zeros is weird. The “Apprentice” line should total 20, guru 13, master 23, and enlightened 454.

Is the spinner up top (the red snake-like thing) still going, or did it go away?

It may be because you still have localStorage data from a previous version of the script that’s incompatible.

All zeros indicates it hasn’t retrieved any assignment data from WKOF.

Things to try:

1. Open the settings, click the default button, click save, then click the red X. Then refresh the page. Then wait for the WKOF “loading data” (and my red spinner) to disappear, indicating everything has loaded.
2. You may also need to move the days-to-review slider to a different value (the format changed recently).
3. If that doesn’t work, right-click anywhere on the dashboard page and select “Inspect”. Then find the local storage (in Chrome it’s under the Application tab, then Local Storage on the left). If you twirl open Local Storage, you should see an entry for https://www.wanikani.com. Select that and you should see items for “reviewCounts”, “srsCounts”, “sessionSummaries”, “gbSettings”, and “daysToReview”. Right-click on each of those in turn and click Delete (it’s just cached data that will be rebuilt, but sometimes format changes cause things to get weird).

Please let me know if any of this works. If it’s due to a change in the data format as I think, it should start working correctly once it’s cleared. The production version will have a version key and delete all of these the first time it’s loaded so none of this will be necessary.
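For anyone who would rather not click through the inspector, the same cleanup can be sketched in code. The key names are the ones listed above and `removeItem`/`getItem`/`setItem` are the standard Web Storage methods. The function names and the `gbVersion` key are hypothetical; they only illustrate the version-key idea and are not what the production script actually uses.

```javascript
// Cached entries named in the steps above.
const CACHE_KEYS = [
  "reviewCounts",
  "srsCounts",
  "sessionSummaries",
  "gbSettings",
  "daysToReview",
];

// Remove every cached entry from the given Storage object.
function clearGanbarometerCache(storage) {
  CACHE_KEYS.forEach((key) => storage.removeItem(key));
}

// Sketch of the version-key idea: wipe the cache once whenever the stored
// version differs from the running one. "gbVersion" is a made-up key name.
function resetIfVersionChanged(storage, currentVersion, versionKey = "gbVersion") {
  if (storage.getItem(versionKey) === currentVersion) return false;
  clearGanbarometerCache(storage);
  storage.setItem(versionKey, currentVersion);
  return true;
}

// In the browser console on the dashboard page:
//   clearGanbarometerCache(window.localStorage);
```

After clearing, refresh the page so the data gets rebuilt.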

Right

LupoMikti · January 2, 2022, 6:30am

Tried all 3 things, and still no luck.

Rrwrex · January 2, 2022, 6:31am

Did the spinner go away?

LupoMikti · January 2, 2022, 6:32am

Yes. Sorry, forgot to mention that it wasn’t stuck in the first place; but to be clear, the spinner goes away almost immediately.

Rrwrex · January 2, 2022, 6:34am

Sorry to make you debug, but could you go into the inspector and see what’s in the srsCounts object?

Should look like this in Chrome (but with your values, of course):

Also, forgot to ask: are any errors logged in the console when you refresh?
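If it is easier than browsing the Application tab, the entry can also be dumped from the console. This is only a sketch: it assumes the value is stored as a JSON string (localStorage only holds strings), and the helper name `readCachedJson` is made up.

```javascript
// Read a localStorage entry and parse it as JSON when possible.
// Returns null if the key is missing, or the raw string if it is not JSON.
function readCachedJson(storage, key) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  try {
    return JSON.parse(raw);
  } catch (err) {
    return raw;
  }
}

// In the browser console on the dashboard page:
//   console.log(readCachedJson(window.localStorage, "srsCounts"));
```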