[Userscript] The GanbarOmeter

Yes, the problem is that localStorage does not offer any way to do a combined read and write as an atomic operation. I agree that IndexedDB transactions are probably the best way to do this. I was just confused by your wording that localStorage.setItem() would not be atomic.
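For anyone following along, here is a minimal sketch of the difference (the database, store, and key names are just placeholders, not anything either script actually uses): with localStorage the read and the write are two separate calls, while IndexedDB lets you do both inside a single readwrite transaction, which the browser serializes against other transactions on the same store.

// localStorage: the read and the write are separate calls -- another tab
// could write to the same key in between them.
const count = Number(localStorage.getItem("reviewCount") ?? "0");
localStorage.setItem("reviewCount", String(count + 1));

// IndexedDB: the read-modify-write happens inside one readwrite transaction,
// which is serialized against other transactions touching the same store.
const openReq = indexedDB.open("gb-demo", 1);
openReq.onupgradeneeded = () => openReq.result.createObjectStore("counters");
openReq.onsuccess = () => {
  const db = openReq.result;
  const tx = db.transaction("counters", "readwrite");
  const store = tx.objectStore("counters");
  const getReq = store.get("reviewCount");
  getReq.onsuccess = () => {
    const current = (getReq.result as number | undefined) ?? 0;
    store.put(current + 1, "reviewCount"); // still inside the same transaction
  };
  tx.oncomplete = () => db.close();
};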

1 Like

Ah! I’d actually started to write that as get/setItem originally but it seemed overly verbose. :slight_smile:

Good point. I didn’t think of this.

I forgot to write about an important coming change to the GanbarOmeter gauge itself:

The weighting calculation for “new” R/K/V items will (hopefully) be much more rational and understandable/intuitive.

In the settings form you specify the following (default values shown):

Desired number of Apprentice Items: 100
Extra weighting for 'new' items (items in stages 1 and 2):
         Kanji: 3.0
      Radicals: 0.75
    Vocabulary: 1.25

The default values mean that kanji in stages 1 and 2 are three times “heavier”/harder than normal apprentice items, radicals are slightly “lighter”/easier, and new vocabulary is only slightly harder. You could set any of the three weighting values to 1.0 if you wanted to treat those items normally.

The GanbarOmeter still displays a value between 0 and 100%. Even if you’ve got a crazy number of items in your apprentice queue, it will peg the display at 100%. The straight up (50%) value represents a “normal”/desired level of difficulty.

Say you had 125 items in your apprentice queue (stages 1-4). Without any weighting applied the GanbarOmeter would display 62.5%:

125 / (2 * desired) = 125 / 200 = 0.625

But let’s say that within the 125 apprentice items, 8 were new kanji, 4 were new radicals, and 16 were new vocabulary items (where “new” means in stages 1 or 2). The adjusted GanbarOmeter display would be 72% based on this calculation:

   items in stages 3 & 4:    97 * 1.0   =  97
+  kanji in stages 1 & 2:     8 * 3.0   =  24
+  radicals in stages 1 & 2:  4 * 0.75  =   3
+  vocab in stages 1 & 2:    16 * 1.25  =  20
                            ----        ----
          Apprentice items:  125
            Weighted items:              144


Displayed value = weighted items / (2 * desired items)
                = 144 / 200
                = 0.72
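
In code the calculation is nothing fancier than this (a rough sketch only; the function and field names are illustrative, not the script’s actual internals):

interface WeightSettings {
  desiredApprentice: number; // default 100
  kanjiWeight: number;       // default 3.0
  radicalWeight: number;     // default 0.75
  vocabWeight: number;       // default 1.25
}

interface ApprenticeCounts {
  stage34: number;     // items in stages 3 & 4
  newKanji: number;    // kanji in stages 1 & 2
  newRadicals: number; // radicals in stages 1 & 2
  newVocab: number;    // vocabulary in stages 1 & 2
}

function ganbarometer(c: ApprenticeCounts, s: WeightSettings): number {
  const weighted =
    c.stage34 * 1.0 +
    c.newKanji * s.kanjiWeight +
    c.newRadicals * s.radicalWeight +
    c.newVocab * s.vocabWeight;
  // Straight up (50%) is the desired number of apprentice items; peg at 100%.
  return Math.min(1, weighted / (2 * s.desiredApprentice));
}

// The example above: 97 "ordinary" items plus 8/4/16 new kanji/radicals/vocab
ganbarometer(
  { stage34: 97, newKanji: 8, newRadicals: 4, newVocab: 16 },
  { desiredApprentice: 100, kanjiWeight: 3.0, radicalWeight: 0.75, vocabWeight: 1.25 }
); // => 0.72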

Eventually, I plan to (optionally) modify the displayed value even further by looking at upcoming assignments for at least the next month. Since reviews you do today can impact the workload up to 4 months in the future, it makes sense to look for days in the future with a large number of reviews scheduled. The point of the GanbarOmeter is to tell you when to slow down or speed up doing lessons: looking at the apprentice queue is just a leading indicator; the better approach would be to peer into the future.

To keep myself sane, however, and to have some hope of releasing the new version within my lifetime, I’m going to push that feature to a subsequent release. [Note to self: I can use @rfindley 's AWESOME Ultimate Timeline script to figure out a reasonable weighting algorithm for this. Just set it to look forward by SRS level for 120 days, then grab regions at 2 weeks, 1 month, and 4 months.]

(Cheers to @daikirai for pushing me to consider new radicals as lighter-weight than kanji or vocab.)

1 Like

Will you be rounding up or down to the nearest integer for the number of weighted items after the calculations? Just curious.

Also, I’m very happy that radicals will be weighted separately, as I feel like I’m an odd case where new radicals are significantly harder for me than new kanji or vocabulary (doing the radicals for level 7, for example, saw me making mistakes on half of them regularly for an entire week; I just could not get them to guru. New kanji, however, only give me trouble once I get them to guru, then it’s constantly a process of “get to guru 1, make a mistake, go back to apprentice, do well and get to guru 1, repeat”). So I’ll be making use of that feature to weight them more heavily haha

No reason to round: I’ll just do floating point math since the result is a float.

FWIW, I think you’ll be amazed by the results of more frequent repetition for radicals and kanji in stages 1 & 2. Mnemonics and effort are only half of the equation: sheer repetition is a huge, HUGE part of the process, and I feel strongly that waiting 4 to 8 hours (or worse, 24 hours) for more repetitions of newly introduced (or recently returned) items is unnecessary (even counterproductive).

In my opinion, you just want to get in as many reviews of new items as you can until they stick (the 4h and 8h intervals are an attempt to enable this, but unless you consistently review multiple times per day this doesn’t work). If, like me, you only do reviews once per day I think it’s mandatory to force yourself to do more reviews of early-stage items.

I highly, HIGHLY recommend trying my normal daily routine for a while:

Every single day I use @rfindley 's Self-study Quiz immediately prior to starting my “real” reviews. Most people use this script for leeches, but I use it to go over the radicals and kanji in stages 1 and 2. Focusing on just those items repeatedly before starting my “real” review session pays serious dividends.

Don’t worry, I still miss them surprisingly often during the real review session when they are intermingled with other items, but the extra, focused reviews really help to stick things into my brain. The embarrassment of missing one of the new items when I’d just reviewed them a few minutes prior pretty much guarantees I’ll bear down and really cram it into my memory (creating a new mnemonic or whatever).

I use @prouleau 's awesome Item Inspector to launch the self-study quiz. Here’s how I configure and use it:

  1. In the settings for the Item Inspector, I created a table named “Current Apprentice 1-2”. I configured that table to filter on just radicals and kanji (ignoring vocabulary).

  2. After creating and saving that table in the settings for the Item Inspector, I then select the table and click on the “bullseye” icon until I get the tabular view. This view is sticky between sessions, thank heaven. The image below is what is currently displayed on my dashboard.

    (I only recently started seeing new items in level 40. That’s as many items as I normally ever see for my “pre-review”. Toward the end of a level, I often have several days in a row with no items whatsoever displayed for my “pre-reviews”.)

  3. All of the above is one-time setup stuff. Every morning I just scroll down to the Item Inspector and glance at those items. I then click the computer-screen icon in the upper right to start the self-study quiz for just those items.

  4. I then quiz myself on all of these items repeatedly until I can score 100% correct. Only then do I close the self-study quiz and start my review session proper. (Use the pull-downs to study all items from the item inspector, again this is sticky between sessions.)

  5. I then rapidly try to answer all the items, almost always missing at least a few. Whenever I answer incorrectly, I hit F1 to review the correct answer (one of these days I’ll submit a patch to @rfindley so that I can hit “f” instead of “F1”, so it uses the same keystroke as normal reviews).

  6. At the end of the first “pre-review session” I almost invariably have missed at least a few, so I just hit “enter” to repeat a quiz over all the items (not just the ones I missed):

  7. I usually improve with each iteration:

  8. Eventually, I get them all right. It rarely takes more than 2-4 iterations to get them all into my head. Once I do, I just hit escape and start my review proper.

Believe me, it’s much harder to write all that than it is to actually configure and do it. It rarely takes me more than 0-3 minutes of pre-review each morning before starting my review proper, but those extra repetitions of early-stage items make the whole process SO MUCH EASIER!

Seriously. Try it. You’ll like it. Don’t be surprised if you see a launcher for the self-study quiz with these settings show up within the Ganbarometer.

2 Likes

And everything else you took the time to write…

This was SO helpful, thank you so much for explaining your process. I do only do reviews once, maybe twice a day, so everything you said about why high repetition of new items is so important rings true for me. I’ll be incorporating Self Study Quiz from now on. I already have Item Inspector but I just wanted a way to see my leeches when I installed that; this will give it more purpose beyond that.

One small problem I’m facing right now is that I got too used to the Reorder script on desktop, and I really don’t like doing fully shuffled reviews; for me, being able to filter the type of items and order them for back-to-back mode is really important. I’ve been using the Jakeipuu app for quite some time, and a lot more recently, since the reordering it does still works after the recent changes (I don’t know if it has some sort of built-in compatibility mode though; I don’t want to turn on compatibility mode on desktop until my piled-up reviews are done). This means most of the time I’m working on my review pile I’m not actually at my computer, so it will be difficult to do that studying just before reviewing. But I’ll figure out how to work around that.

I’m very glad to hear it! I believe that extra reviews of early-stage (1 and 2) items are extremely helpful and aren’t cheating at all. People think that an SRS is only about testing you right before you forget something. That’s part of it for sure, but that mostly applies to the later stages after you’ve “memorized” it the first time. It can be argued that you’re cheating yourself if you review later-stage items before they are due, but items in stages 1 and 2 are fair game: you want to review them as often as possible until they move to at least stage 3.

Believe it or not, despite all the scripts I’ve used (and now written!), I’m a bit of a purist when it comes to how WK operates. As the kids say, “sounds pretty sus!” — I know! I just like to take things apart and understand how they work completely (often making incorrect assumptions along the way). I love scripts that let me analyze the data I’ve created after two years of usage, but I’m extremely leery of any scripts that actually change the behavior.

The more I’ve used WK the more I’ve come to respect the fundamental decisions they’ve made about the content and how the system operates. I’m very leery of anything like reordering scripts that change the behavior of the system. I don’t even like adding user synonyms any more unless I’m extremely confident that it makes sense.

WK was developed by some very clever people and the system has been refined with feedback from many amazing people that came before us. The deeper I go the more I realize how much thought has gone into almost everything.

FWIW, the thing I don’t like about the self-study quiz is that it seems to use “back-to-back” mode exclusively (it shows both reading/meaning questions for an item before moving on to another). I strongly prefer seeing other items mixed in sometimes in between the two questions for an item.

The brain is a funny thing. This happens to me surprisingly often during reviews:

I’ll miss a question, say a reading question, then view the correct reading which triggers my memory for the meaning (or vice versa). After reviewing several intervening items, the other half of the question pair will come up for the prior item and I will have forgotten the answer completely.

This is a sure indication that I haven’t really memorized the item and I SHOULD miss it to let the system know I need more reviews. If I’d reviewed the item’s questions “back-to-back” I’d absolutely have answered them both correctly and the system would have no way to know I needed more reviews.

Everyone learns differently and has different preferences. Things like how many reviews you want to perform per day, how many apprentice items you can handle, etc. depend on personal preferences (and how much time you have to devote to learning kanji). But I think always using back-to-back mode for reviews is one of those things that might feel right (it’s definitely easier because you’re doing less context switching) but that’s bad for you in the long run.

Just my opinion, of course.

[EDIT Below]

I gave up trying to do reviews on my phone around level 7 or 8, IIRC. I made too many typing mistakes and really wanted a full keyboard and a big screen so I could focus. This forced me to find a regular time to do my reviews, though, rather than doing them whenever was convenient multiple times throughout the day.

Addressing the back-to-back commentary, you’re right that it heavily depends on the brain of the individual.

I feel that the merits or demerits of using it mostly depend on the individual’s working memory (based on what I learned from my Human Memory course during my uni studies in cognitive science). The default way Wanikani does things is great under the assumption of many small review sessions. But when the individual does only 1 or 2 large review sessions in a day, I feel the default starts to break down: the pair of questions can be far enough apart that the individual misses the item because it wasn’t repeated enough while within the window of their working memory.

For those new items in Apprentice 1 and 2, lots of repetition is key, but the item only gets into long-term memory through repetition within working memory. That is, if you repeat the item many times but each time you see it it has been long enough to have left your working memory, then you aren’t getting it into your long-term memory (as well as you could be).

In particular for my case, my working memory is heavily affected by ADHD; I’m very good at recalling a sizable chunk of information, but only if queried shortly after (i.e., within 20 to 30 seconds of being told a lot of information I can recall and recount almost all of it with high accuracy, but once it gets beyond even just 30 seconds I have a tough time recalling most of it).

In the context of flashcard-like reviews of single items, this means that for new items I cannot tell whether I miss the paired question later in a longer review session due to a lack of previous repetition, or because the repetition wasn’t within my window of working memory and so never got into long-term memory in the first place. Thus back-to-back mode is pivotal to me getting items into my long-term memory. It’s not so important once the item reaches Apprentice 4 or so, to be clear, so I’ll turn it off at that point (and this is also why it’s important that I can filter/sort by SRS stage as well, so I can choose the mode to use based on the stage of the items).

That’s how I see things at least. It’s important for features like that to exist to accommodate the varying strength of working memory across individuals; it’s also important to have the self-discipline to recognize when the feature is hurting more than helping you, and I think that’s the harder part.

Interesting. It sounds like the Self-Study Quiz will be perfect for you: practice new items first (getting them into short-term memory with back-to-back reviews), then do the normal reviews (with intermixed questions) for the full review session.

Just a quick update on progress toward the next version. Not that anyone is pushing me for an update, but writing this stuff down as I go helps me to clarify the design and identify things that still need more thought. Still much to do, but at least the layout and the fundamental logic are coming along.

Instead of “salting”/weighting the ganbarometer value based on recent misses, I’ve broken out accuracy stats into a separate widget. The ganbarometer only looks at items in the apprentice queue to calculate a “percentage of max difficulty”:

The screenshot above is still using fake data, but I’m getting closer to having the real next version working.

I’m still iterating on the design as I build it (including major changes like what widgets to show) but I’m increasingly confident that these are the most meaningful things to display. These displays are starting to feel “right” to me:

  • Difficulty (the ganbarometer): how many and which type of items are in early stages
  • Speed: the average number of seconds spent answering each question (meaning or reading)
  • Accuracy: the number of items answered correctly the first time (meaning and reading)
  • Workload: the number of reviews performed each day

The primary “knob” WK users have under their control is the number of lessons performed each day. Pretty much everything else depends on how easily they are able to memorize the information (some people need more repetitions than others).

The script still provides many settings (but with reasonable defaults) because every user has different preferences with respect to how difficult a workload they can handle before it becomes burdensome, how fast they want to go, or how much time they have to spend on WK.

The heuristic used by many is simply to keep around 100 items in the Apprentice queue. Honestly, this probably suffices for most people, but I feel the four widgets above provide more nuanced information and allow anyone to make better decisions on how many lessons to perform each day. I think it may especially help newer users who haven’t spent enough time on the site to develop a feel for when they are making things too difficult for themselves.

Notes on the new design

The statistical MAD technique to break strings of reviews into sessions is working extremely well. No more “magic number” setting to find the start of new sessions. (I do plan to have an advanced setting to override the maximum MAD, but the default of 2.0 should work well for anyone).
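
(For the curious, the gist is something like the sketch below. I’m glossing over details: it assumes MAD here means the median absolute deviation of the gaps between consecutive review timestamps, with a gap starting a new session when it sits more than maxMad MADs above the median gap. The real script’s implementation may differ.)

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Split an ascending list of review timestamps (ms) into sessions.
// A gap is a session boundary when it lies more than maxMad MADs
// above the median gap.
function splitSessions(timestamps: number[], maxMad = 2.0): number[][] {
  if (timestamps.length < 3) return [timestamps];
  const gaps = timestamps.slice(1).map((t, i) => t - timestamps[i]);
  const med = median(gaps);
  const mad = median(gaps.map((g) => Math.abs(g - med)));
  const sessions: number[][] = [[timestamps[0]]];
  gaps.forEach((gap, i) => {
    if (mad > 0 && (gap - med) / mad > maxMad) sessions.push([]);
    sessions[sessions.length - 1].push(timestamps[i + 1]);
  });
  return sessions;
}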

I plan to always retrieve and cache one week (7 days) of reviews. I’ll always present stale information from the cache first, and asynchronously update the displays when any new data is retrieved (a small spinner in the upper left will appear whenever newer data is being retrieved).

The input in the upper left lets you control how many of those days to analyze and display below (between 1 and 7 days). Once the most recent data is retrieved, adjusting this value will not cause any round-trips to the server. I’m trying to focus on performance and responsiveness. (I hate waiting for API responses to return before showing something meaningful!)
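
(The pattern here is plain stale-while-revalidate. A minimal sketch, assuming a localStorage cache; the cache key, fetch helper, and render callback below are placeholders, not the script’s real internals.)

interface CachedReviews {
  fetchedAt: number;  // epoch ms of the last successful fetch
  reviews: unknown[]; // processed review data for the last 7 days
}

const CACHE_KEY = "ganbarometer-reviews"; // hypothetical key

async function loadAndRender(
  fetchReviews: (sinceMs: number) => Promise<unknown[]>,
  render: (reviews: unknown[], stale: boolean) => void
): Promise<void> {
  // 1. Render stale data from the cache immediately (if any).
  const raw = localStorage.getItem(CACHE_KEY);
  const cached: CachedReviews | null = raw ? JSON.parse(raw) : null;
  if (cached) render(cached.reviews, true);

  // 2. Asynchronously fetch the last 7 days, then re-cache and re-render.
  const sevenDaysAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
  const fresh = await fetchReviews(sevenDaysAgo);
  localStorage.setItem(
    CACHE_KEY,
    JSON.stringify({ fetchedAt: Date.now(), reviews: fresh })
  );
  render(fresh, false);
}

Since the full seven days are always cached, narrowing the display to fewer days is just a filter over the cached array with no new API calls.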

I’ll create both a light and a dark theme (with a toggle in the settings) as well as allowing users to override any of the colors used by any of the widgets. I’ll also allow users to insert the entire collection of widgets above or below any section currently in the DOM.

The ganbarometer will display green from 0 to 80%, yellow from 80% to 90%, and red above 90%. It will only look at items in the apprentice queue to calculate the value (with extra weighting for items in the first two stages, per my previous post).

Eventually, I do hope to look at upcoming assignments and fold that into the ganbarometer value as well, but the Wanikani Ultimate Timeline suffices for now.

By default, the speed widget will go from 0s to 16s at full scale. Values in the first and last quartile will turn the gauge yellow (values between 4s and 12s will display green, other values yellow).

By default, the accuracy widget will show green from 80%-100%, yellow from 70%-80%, and red below 70%.
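
(All three of those gauges boil down to simple range checks, roughly like this; the function names are made up for illustration.)

type GaugeColor = "green" | "yellow" | "red";

// Ganbarometer: green up to 80%, yellow up to 90%, red above
function difficultyColor(value: number): GaugeColor {
  return value <= 0.8 ? "green" : value <= 0.9 ? "yellow" : "red";
}

// Speed gauge: 0-16s full scale, green between 4s and 12s, yellow otherwise
function speedColor(secondsPerQuestion: number): GaugeColor {
  return secondsPerQuestion >= 4 && secondsPerQuestion <= 12 ? "green" : "yellow";
}

// Accuracy gauge: green 80-100%, yellow 70-80%, red below 70%
function accuracyColor(accuracy: number): GaugeColor {
  return accuracy >= 0.8 ? "green" : accuracy >= 0.7 ? "yellow" : "red";
}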

The reviews chart will show the number of reviews per day, with the most recent day on the right. As discussed previously, the time of day to delineate a “new day” is a user setting (the default is midnight). Right now all values are displayed in green, but I’m hoping to make days with values significantly above or below the daily target highlight in yellow (it may be easier to just display the target as a horizontal line, though).
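
(Bucketing reviews into days with a configurable day boundary is just an offset applied before truncating to a date. A sketch, with the parameter name invented:)

// Count reviews per day, where "a day" starts at dayStartHour (local time).
// dayStartHour = 0 is the default (midnight).
function reviewsPerDay(timestamps: number[], dayStartHour = 0): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of timestamps) {
    const shifted = new Date(t - dayStartHour * 60 * 60 * 1000);
    const key = `${shifted.getFullYear()}-${shifted.getMonth() + 1}-${shifted.getDate()}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}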

Clicking the “Data” nav element will reveal the “back” side of the display with tabular text:

Obviously, I’ve not gotten very far with the tabular layout yet, but the intent is to show the data that drives each graphical element:

  • apprentice items and weighting for the Ganbarometer,
  • stats for each session for the speed calculation (this will be a scrollable table as there may be many more sessions than days),
  • daily accuracy stats for the Accuracy widget,
  • and daily review stats for the workload chart (reviews/day).

As always, any feedback is more than welcome. I do think this version will be much more straightforward to use and understand. If you’ve been shy about mentioning any other desired features, now is the time to bring them up.

Onward!

3 Likes

Believe it or not, I’m ready for beta testers — just in time for Christmas!

I’ve still got quite a bit to do before releasing it for real, but it’s functional and I’d like to get more real-world usage before I publish the next version in a week or three. If you’re interested in helping me test (no programming expertise required) just send an email to rw@pobox.com .

MUCH has changed (and hopefully improved) since the currently published version. Here’s what it looks like currently with my data:

Graph view

Mouse hover over Monday’s data

Data view

Note that the Speed section gives question accuracy for each session (the percentage of questions, reading or meaning, that were answered correctly) while the Reviews section gives item accuracy (the percentage of review items where both reading and meaning were answered correctly the first time).
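
(In code, the distinction looks roughly like the sketch below. I’m assuming review records shaped like the WaniKani API’s incorrect_meaning_answers / incorrect_reading_answers counts, assuming a non-empty array, and ignoring the wrinkle that radicals only have a meaning question.)

interface ReviewRecord {
  incorrect_meaning_answers: number;
  incorrect_reading_answers: number;
}

// Question accuracy: each review contributes two questions (meaning + reading);
// a question counts as correct if it had zero incorrect answers.
function questionAccuracy(reviews: ReviewRecord[]): number {
  const correct = reviews.reduce(
    (n, r) =>
      n +
      (r.incorrect_meaning_answers === 0 ? 1 : 0) +
      (r.incorrect_reading_answers === 0 ? 1 : 0),
    0
  );
  return correct / (reviews.length * 2);
}

// Item accuracy: the whole item counts as correct only if BOTH the meaning
// and the reading were answered correctly the first time.
function itemAccuracy(reviews: ReviewRecord[]): number {
  const correct = reviews.filter(
    (r) => r.incorrect_meaning_answers === 0 && r.incorrect_reading_answers === 0
  ).length;
  return correct / reviews.length;
}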

Settings

TODO

  • Much CSS work (especially the settings form)
  • Much refactoring and code clean-up
  • More tests
  • Range (slider) for days to retrieve?
  • use CSS variables everywhere (use all settings values incl. tz offset)
  • add warn/error ranges to gauges
  • Link to Self-Study quiz for new items
  • Write a new version of review-cache

Merry Christmas!

3 Likes

If you’d like to try this beta version:

  1. Navigate to your dashboard, click on the tampermonkey settings, and disable the current ganbarometer script (there is no point in running both).

  2. Open the tampermonkey dashboard, click the “Utilities” tab. At the very bottom you will see “Install from URL”. Cut and paste this URL into the box, and click “Install”: https://raw.githubusercontent.com/wrex/ganbarometer-svelte/main/published/beta0/bundle.js

  3. Click “Install” on the next page. Then navigate back to your dashboard and refresh the page. You should see the new version shown in the screenshots in the previous post.

While this isn’t the final version, and there will be at least a few cosmetic changes coming as well as some functional changes, everything should basically “work”. Let me know if I’m mistaken!

Notes

  • The source code is at GitHub - wrex/ganbarometer-svelte if you’d like to review the code before installing. The raw bundle is minified so I can’t publish it on greasyfork (it’s against the rules). Because it’s compiled svelte code, the unminified version isn’t really any more readable, just a lot bigger. Ultimately, you’ll have to trust me that the code doesn’t do anything malicious (tbh, this is true with any script unless you read the source code VERY carefully).

  • This version only does very basic caching and it doesn’t cache the raw reviews at all (only the processed data). The very first time you run it you’ll have to wait for it to fetch reviews before showing anything interesting (this can sometimes take 30 seconds or longer, unfortunately). Subsequent times should show prior data immediately while it fetches new reviews (though it still throws up an annoying modal while it retrieves the new reviews).

Notes for programmers

If you want to examine, compile, and run the code yourself:

  1. Download the source from GitHub - wrex/ganbarometer-svelte

  2. Run npm install to install all the dependencies for compilation.

  3. Before compiling or running the code, you may want to type npm run test. All tests should pass.

  4. In one shell window, type tsc -w to run the typescript compiler in watch mode.

  5. In another shell window, type npm run dev to compile an (un-minified) dev version of the code and prepare for “live” updates.

  6. Copy the top several lines of the file ./dist/bundle.js. Just copy the header itself, everything through and including the // ==/UserScript== line. Don’t copy any actual code.

  7. In the tampermonkey dashboard, click the “+” tab and paste in the headers (again, just the headers) from step 6. Save the file. This will install the ganbarometer-svelte ->dev script and prepare it for “live” updates. If you browse to the WK dashboard, and enable this version of the script, any changes you make to the source code should show up when you refresh the page.

This is still BETA code: I plan a fair bit of clean-up and refactoring. Please be kind (I’m just an amateur) but any thoughts or comments are quite welcome. Hopefully, it isn’t too hard to figure out the current code organization. It’s definitely FAR better code than the currently published version of the script.

4 Likes

Just a quick note:

The beta version now uses a range-slider to set the number of days to retrieve and has a launcher for the self-study quiz with just the new kanji (I’ll have settings shortly for what to include in the self-study quiz). The icon in the upper right for launching looks pretty janky at the moment — I’ll make something that looks better eventually. Besides some look-and-feel/CSS stuff, the main remaining work is:

  1. The settings form layout, adding some things, and wiring everything up so it works.

  2. Caching reviews. I’ve just started working on this. My hope is to eliminate the annoying popup from loading reviews.

If anyone installed the beta version as soon as I posted it, I made one implementation change to how the number of days is stored that might have broken things for a minute. Just move the slider to a different value and refresh the page and it should clear up.

Current look:

Just checking in on what’s supposed to be working in the beta version. Currently changing the theme/colors does not work fully. The top bar changes but the rest does not. Also the icons for the settings and self-study quiz buttons do not change font color.

The settings dialog itself also does not change for the theme selection.

Another thing just to mention, I believe the settings should be launchable from the dropdown when you click your profile picture in the top right, under Scripts > Settings, but it is not there.

Lastly, and I do not know if this was meant to be worked on at all, changes in the settings still require a page refresh instead of being applied when you click Save. I’d also like to see the Save button disabled while no changes have been made if that’s at all possible, so that there is feedback to know when the settings have been applied.

Other than that I like the outward changes introduced in this a lot. I look forward to the finished product.

Yup. Almost none of the settings stuff is currently wired up or working (you’re stuck with defaults in the beta for the most part). Working on it right this minute, in fact.

Sneak peek at look and feel of the Settings dialog:

I’m writing my own settings dialogs (mostly for the sake of learning how) rather than using WKOF’s settings facility. That’s why there’s a separate settings icon rather than a link in the dropdown with other WKOF scripts.

This is definitely the plan. It will also close the modal when you save. I’ve (rather obviously) not styled the Save or Reset buttons yet (the latter will probably be named “Defaults” and will be styled as an obvious secondary button).

Good to hear. I’m rather happy with the functionality myself. Please let me know if you find any issues with the behavior using the default settings. Your 3% Ganbarometer makes me a little nervous! :slight_smile:

I’m avoiding using scripts for now, but you get a like and a reply just for the name of the script itself! :rofl::+1:

1 Like

FWIW, I’m biased against using scripts that change any WK behavior (their UX and overall design is so well thought out it blows me away).

BUT I’m a big fan of scripts that just provide extra information, potentially letting you know when you might want to alter your own behavior. Scripts in this category include this one, @Kumirei 's heatmap, @rfindley 's Ultimate Timeline, and @prouleau 's Item Inspector, for example.

The only part of this script that goes right up to the edge is allowing you to launch @rfindley 's Self-Study quiz to study recently introduced Kanji “out of band”. It’s optional, though: the icon only shows up if you install that script.

P.S.

My daughters enjoyed the name of the script, too. :slight_smile:

1 Like

I’m not sure how you calculate/measure the ‘difficulty’ setting, but I have a potential suggestion you may like to try out; perhaps it might be a bit more intuitive?

Instead of a ‘percent’ of difficulty, where (I’m guessing) the user would ‘typically’ see the gauge in the 40-60% range (which might seem a little ‘pessimistic’ if that’s more-or-less ‘the expected difficulty’ anyway), you could shift/transform the scale from 0-100% to -∞ to +∞, with the ‘middle’/average being 0. (Instead of +/-∞ you could just use a ‘large’-ish number, like maybe +/-10 or even +/-5.) You could use the ‘probability to log-odds’ (aka logit) transformation, where if p is a number from 0 to 1, then the logit z (because x and y are cliché!) is:

z = f(p) = log(p / (1-p)),
or, equivalently,
z = f(p) = log(p) - log(1-p)

You’d probably have to provide a cut-off for probabilities very close to 0 or 1, just to avoid dealing with actual ∞’s. E.g.:

if p < low_cutoff then return min_value
if (1-p) < low_cutoff then return max_value
else return log(p) - log(1-p)

And you’d adjust the scaling by multiplying or dividing by some suitable scaling constant.

You can imagine that it would work something like a decibel scale, and you could adjust the scaling (and min/max values) to make the sensitivity pretty much whatever feels ‘right’. For example, on a really simple scale you could have -2 representing ‘super easy’, -1 as ‘easier than normal’, 0 being ‘average/normal’, +1 being ‘harder than normal’, and +2 being ‘DANGER Will Robinson! DANGER!’

The main idea is to have 0 mean ‘normal’, so the value would normally sit in some small range around 0, plus or minus some reasonable-looking number. I’m guessing that using percentages is less intuitive: like, “Percentage of what, exactly?” :thinking:

[BTW: The inverse/reverse of the logit transformation is the probability (or just think of a ‘ratio’ or ‘percent’), p:
p = g(z) = 1 / (1 + exp(-z))
where exp() is the exponential function. Again, you may have to rescale your z to exactly undo your logit transform if you scaled that.

You can potentially avoid using this inverse if you just keep your original calculation as a ratio/probability/percent and only use the logit(p) for translating that p for a more natural display value for the UI.]
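
In code, the whole suggestion is only a few lines (a sketch; the scale and cutoff values are arbitrary):

// Map a probability-like value p in (0, 1) to a zero-centered "log-odds" scale.
// scale stretches/compresses the output; cutoff avoids +/- infinity near 0 and 1.
function logitScale(p: number, scale = 1.0, cutoff = 0.001): number {
  if (p < cutoff) return scale * Math.log(cutoff / (1 - cutoff));     // min value
  if (1 - p < cutoff) return scale * Math.log((1 - cutoff) / cutoff); // max value
  return scale * (Math.log(p) - Math.log(1 - p));
}

// Inverse: map a zero-centered value back to a probability in (0, 1).
function inverseLogit(z: number, scale = 1.0): number {
  return 1 / (1 + Math.exp(-z / scale));
}

logitScale(0.5);   // 0     -> "normal" difficulty sits at the center
logitScale(0.725); // ~0.97 -> a bit harder than normal
logitScale(0.9);   // ~2.2  -> "DANGER Will Robinson!" territory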

1 Like

My man! I’ve found another of my people! <laugh>

I really like (and will almost certainly adopt) the zero-center gauge idea. I need to ponder the final algorithm, but it does indeed remind me of a VU meter:

2 Likes