[Userscript] The GanbarOmeter

It now appears that any speed-up from removing the @require was purely imaginary on my part. I think I was fooled by server-side caching effects.

It still takes 20-30 seconds to retrieve 3 days of my reviews if the script hasn’t run in a day or so (it feels like an extremely l-o-o-o-n-g 30s, too). I’m not sure how often the server purges its cache, but the script definitely loads data much more quickly sometimes than others. Based on loose observation, I think the server expires the cache for an individual client after around 4 to 8 hours, but it may also opportunistically free up cache based on load, memory pressure, or whatever.

Regardless, as I mentioned earlier in the thread: I do plan to reintroduce client-side caching in the “NEXT MAJOR UPDATE.” I definitely want to share the cache with the heatmap script in particular (as well as any other scripts that cache the reviews endpoint) so I will use @Kumirei 's review-cache module if at all possible.

I’m toying with the idea of an asynchronous thread to conditionally check for new reviews periodically (say every 5-10 minutes, possibly even on review pages, not just the dashboard). That is, the script just throws up whatever is in the client cache to quickly display meaningful (though out-of-date) information and the asynchronous thread can then refresh the widgets whenever new data is found and retrieved.
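
A rough sketch of the shape I have in mind (renderWidgets() and fetchNewReviews() are hypothetical placeholders, not actual GanbarOmeter code):

    // Sketch: paint stale cached data immediately, then refresh in the background.
    const CHECK_INTERVAL_MS = 5 * 60 * 1000; // check every 5 minutes

    async function startDashboardWidgets() {
      const cached = JSON.parse(localStorage.getItem('gbReviewCache') || '[]');
      renderWidgets(cached); // possibly out-of-date, but instant

      setInterval(async () => {
        const newest = cached.length ? cached[cached.length - 1].started_at : null;
        const fresh = await fetchNewReviews(newest); // only reviews newer than the cache
        if (fresh.length) {
          cached.push(...fresh);
          localStorage.setItem('gbReviewCache', JSON.stringify(cached));
          renderWidgets(cached); // repaint with current data
        }
      }, CHECK_INTERVAL_MS);
    }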

If I understand correctly, you want to show 2 digits of precision only on the intra-session average printed next to the “Review Interval” heading, not on the bar labels, e.g. “14.37s” or whatever instead of just “14s” here:

[By the way: I don’t think toPrecision() does quite what you think. It would limit it to two “digits of precision” (i.e. 14s, not 14.37s which has four digits of precision). I think you mean something like Math.round(100 * varName) / 100 to get two digits after the decimal point.]
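
A quick illustration you can paste into the browser console:

    const t = 14.3718;
    t.toPrecision(2);          // "14"    (two significant digits in total)
    t.toPrecision(4);          // "14.37" (four significant digits)
    Math.round(t * 100) / 100; // 14.37   (two digits after the decimal point)
    t.toFixed(2);              // "14.37" (same idea, but returns a string)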

I’ll consider adding a user setting for this in the next major update, though if I do it will likely be yet another setting with the default kept at 1s granularity. I don’t think most users will care about discrepancies of at most 500ms, especially since the API does not provide “real” question-to-response times and I have to use question-start-to-question-start times as a proxy.

In other words, I don’t think it’s possible to get what you’re after anyway. The time between answering one question and moving on to the next can be significant even if you’re plowing through reviews quickly (easily on the order of 500ms once you account for network latency and rendering time).

Well, yes. The number of intervals between items in any sequence is always one less than the total number of items in that sequence. (I’m with Arthur C.: the current millennium started on 1/1/2001 and not 1/1/2000! :slight_smile: ) But intervals are conceptually different things than reviews or sessions.

I do agree that showing “Average 14s” in the same chart is misleading, though, since the average is intra-session and the Pareto is for all intervals including those between sessions.

I disagree about the solution, though: rather than changing what’s displayed in the Pareto, I think it’s better to move the display of the intra-session average. After moving from a gauge to a (pseudo) Pareto chart there wasn’t any obvious place to put the average, so I got lazy and just slapped it wherever I could find room.

The next major release will probably move it into the text at the bottom and label it something like “average intra-session response time”.

I think our disagreement boils down to this:

  • You want the Pareto to only show response times within sessions, not intervals between sessions. You will, however, accept intervals between reviews as a proxy as long as inter-session intervals aren’t displayed.

  • I want the Pareto to show intervals between all reviews for the past 72 (or whatever) hours so that I can visually decide on a reasonable value for the maximum intra-session delay between reviews.

I really, really wish it were possible to show true response times, but WK does not provide that information in the API (only review start times).

I could hide intervals greater than the user-specified intra-session max as you suggest, but I expressly choose not to as I find the information useful.

In the chart above, there are 3 reviews in the 1.5m bucket and 2 in the 2m bucket; these are the ones that interest me. There are several possibilities for each of these:

  • The 1.5-2 minutes might have been mostly consumed by my pondering before providing a response (possible for one or two of these, I think, but unlikely in my case).

  • I answered these much more quickly, but got the answer wrong and spent some time figuring out why (re-reading the mnemonics, adding notes, looking up other items I’d confused it with, or whatever). I suspect this was the case for some or all of these five items (but it isn’t easy to check).

  • I answered these much more quickly, but got distracted before going on to the next item: a phone call, refilling my coffee, noticing a reply on Wanikani Community (!), whatever. This is also quite likely for some or even all of these five items.

  • I explicitly ended my review session by clicking the “home” icon on the review screen to go do something else, but then had a change of heart and decided to complete my reviews within just a minute or two (again, possible but unlikely in my case).

The example above used the default intra-session max (10 minutes) and thus had just three sessions. I think a reasonable value for intra-session max might be 1.5m in this case, though. This would create five more sessions, for a total of 8, and would exclude 7 intervals from the “average intra-session response time” (decreasing the average, if only slightly).

If I followed your suggestion, all 7 items in the final 5 buckets would not be displayed. You would only know that there were 8 sessions over the past 72 hours. You would lose the information that 3 of the sessions were only separated by 1.5-2m, and 2 were separated by 2-5m. I find that unacceptable.

If true response times were available from the API there would be all sorts of interesting things I’d like to examine. I know I’m unlikely to allow an interrupt before answering a question, but I’ve no qualms at all about taking interrupts between questions. I’d really like to exclude the delays between answering and moving to the next question in my widgets, but it’s simply not possible without (major) changes on the WK end.

That’s great to hear! It seems to be working pretty well for me, too. I recently started having some difficult review sessions where I missed more items than usual and it was an easy decision not to do any more lessons despite only having around 76 Apprentice items (because my difficulty gauge was leaning to the right).

You’re quite welcome!

In the (admittedly, increasingly mythical) NEXT MAJOR VERSION, I plan to make these improvements:

  • Currently, I retrieve reviews with start times within the past 72 hours (by default). This means the first session will often be an incomplete one: the first review retrieved may very well be from the middle of a session. Instead, I plan to throw out the first (possibly incomplete) session if the first review’s start time is within intra-session-max of “72 hours ago” (see the sketch after this list). I may also increase the default from 72 to 84 hours (and eliminate the 24-hour increment requirement).

  • I will provide more statistics. In addition to the intra-session average, I will (optionally) display things like min, max, average, median, and standard deviation (at least) for:

    • intervals between individual reviews within a session,
    • session lengths,
    • and intervals between sessions.

    I may eventually decide to automatically determine the default intra-session max (maybe with a rule like, “any review interval more than 2σ from the prior review starts a new session”). I need to make the info available first to see if that’s a good idea.

  • I will make the buckets in the bar chart clickable to see which reviews (for which subjects) fell into that bucket.

  • [+Edit] This is some serious lily-gilding, but it would be extremely interesting to display the percentage of incorrect answers in each bucket.
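
Here is the rough logic behind the first bullet above, as a sketch (the names and the shape of the review objects are illustrative only):

    // Sketch: discard a possibly-incomplete first session.
    // `reviews` is sorted by started_at (Date objects); intraSessionMaxMs is the user setting.
    function dropPartialFirstSession(reviews, windowStart, intraSessionMaxMs) {
      if (!reviews.length) return reviews;
      // If the first retrieved review starts within intra-session-max of the window
      // boundary, it is probably the tail end of a session that began earlier.
      if (reviews[0].started_at - windowStart >= intraSessionMaxMs) return reviews;
      // Skip ahead to the first gap large enough to mark a new session.
      for (let i = 1; i < reviews.length; i++) {
        if (reviews[i].started_at - reviews[i - 1].started_at > intraSessionMaxMs) {
          return reviews.slice(i);
        }
      }
      return []; // everything retrieved belonged to that one truncated session
    }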

And now I’ve procrastinated enough. Time to do my reviews!

3 Likes

Amazing script!
As a suggestion, weighting for radical reviews would be great. Radicals take much less effort (at least from my perspective) than other items. It would be nice to count each of them as, for example, 0.7 of an item.

1 Like

I like this idea. I’m still working on the next version (the kitchen sink version). I will add it.

2 Likes

I’ve been thinking quite a bit about @Lupo_Mikti 's desires for using the bar graph to glean meaningful info about the intra-session response times.

Unfortunately, I don’t think we can draw any meaningful conclusions from the interval data OTHER than whether a review was within the same session or not. (I created a bit of false hope, I think, because of my fuzzy understanding and thus incorrect prior explanations about review records.)

Say I had just three kanji to review, 1:(だい), 2:(しょう), and 3:(すい). There would be at least six questions and responses for these three items. The review session might look like:

  1. I correctly answer the meaning for item 1: “big”.
  2. I correctly answer the meaning for item 2: “small”.
  3. I correctly answer the reading for item 1. => Creates a review record for item 1 (0 incorrect).
  4. I incorrectly answer the reading for item 2. (Doh! Forgot it’s しょう and not しょ!)
  5. I correctly answer the reading for item 3.
  6. I correctly answer the meaning for item 3. => Creates a review record for item 3 (0 incorrect).
  7. I correctly answer the reading for item 2. => Creates a review record for item 2 (1 incorrect reading).

That’s actually seven questions and answers, but only three records are created. I’d love to know how long it took to provide an answer to each question, but that information is (understandably) not provided by the API. Worse, the intervals between the review record creation times really don’t tell you much.

The shortest interval might be between steps 6 and 7, for example, even though item 2 was the one that took three Q&A cycles because of the mistake. It’s conceptually possible that, in a 150-item review session, the two halves of a single item’s review (meaning and reading) might be separated by ALL the intervening items and mistakes!

In the mythical next version, rather than displaying the bar graph and letting the user decide, I’m planning to just use the standard deviation of the intervals between review records to figure out which reviews are in the same session (say, anything greater than 2σ indicates a new session).

I’m still deciding what additional information I want to display (and how to visualize it), but I think it’s going to make MUCH more sense to focus on throughput (reviews-per-day and reviews-per-session) than on interval times.

All hope is not lost, however, as it’s conceptually possible to create a script that runs during reviews (not on the dashboard) to track and display actual response times (possibly for subsequent analysis). Such a script may already exist for all I know!

Tl;dr - The “pareto” of interval times will likely be removed from the next version of the script. It’s unnecessary and likely misleading.

[To be clear: I will still provide at least the mean and standard deviation for intra-session interval times, but likely not in a Pareto-like chart.]

Also, I just realized that we can actually calculate the number of questions within a session (or at least a close approximation) by summing the number of incorrect answers:

#questions = R + 2(K+V) + I

where R is the number of radical reviews, K the number of kanji reviews, V the number of vocabulary reviews, and I the total number of incorrect answers (meaning or reading).

Given the number of questions and the total time spent in a session, I can at least come up with a pretty good approximation of the average response time. Previously I was displaying the average interval between review entries within a session. That was a pretty meaningless value, but the average response time (roughly, how long it takes on average to answer a question and move on to the next) should be much more useful!
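
In code, the approximation is just a few lines (a sketch; it assumes each review has already been reduced to an object with an item type and a total incorrect-answer count):

    // Sketch: #questions = R + 2(K+V) + I, then seconds per question.
    // Each review: { type: 'radical'|'kanji'|'vocabulary', incorrect: meaning + reading misses }
    function averageResponseTime(sessionReviews, sessionDurationSeconds) {
      let questions = 0;
      for (const r of sessionReviews) {
        questions += (r.type === 'radical' ? 1 : 2) + r.incorrect;
      }
      return sessionDurationSeconds / questions; // approximate seconds per question
    }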

2 Likes

Believe it or not, I’m still making progress on the next version of this increasingly insane feature-rich user script. It’s taking forever because I’ve been forcing myself to learn a pretty ridiculous amount of stuff (svelte, typescript, jest, testing-library, rollup, a11y, figma, advanced CSS stuff, data modeling with mswjs/data and faker, wkof/APIv2, basic statistics, ad infinitum).

It will still be several more weeks (months?) before I complete the implementation, but because I’m making some fairly substantial changes I wanted to check in before I get too much further. I think the changes all make sense, but it would be good to get some feedback.

PLEASE COMMENT IF YOU HAVE THOUGHTS ON ANY OF THIS.

The new version will have two display modes: charts and data. Here are some lo-fi mockups (with made-up and mostly meaningless data values):

Planned user-visible changes:

  • Stale (cached) information is displayed immediately. Whenever new reviews are discovered, a spinner is displayed in the upper left-hand corner until the new data is retrieved and parsed, then the display is refreshed. My goal is to have a helper function that runs on the WK review and lesson pages as well as the dashboard page to check periodically for new reviews.

  • You can now display info for 1 to 4 days of reviews. By default a day starts at midnight: setting the time span to 1 day retrieves reviews since midnight local time, and setting it to 2 retrieves those same reviews plus the prior 24 hours. Night-owls whose review sessions frequently span midnight can choose what time counts as the start of a new day (see the sketch after this list for how the retrieval window would be computed).

  • What was the “Difficulty” gauge is now the GanbarOmeter proper. It still shows a percentage from 0 to 100%. “Straight up” (50%) still represents the desired level of difficulty. As before, the value is mostly based on the total number of items in Apprentice stages with user-definable weighting for items in the first two stages (positive or negative weighting). One major addition is that I’m also using future assignments. Too many future reviews-per-day will increase the difficulty displayed by the GanbarOmeter. (This gets to the heart of whether to slow down on lessons and is indirectly affected by incorrect answers, so I no longer consider “excess” recent misses.)

  • Reviews/day is now a bar chart rather than a dial gauge (one bar per day). The real version will show both the total number of reviews and the percentage answered correctly the first time each day (as a stacked bar graph).

  • As described above, the pareto-ish chart of intervals between review records is misleading at best so I’ve removed it. I’m now presenting a simple gauge of the number of seconds per question instead. I believe this is much more meaningful to track. The number of questions for each kanji or vocabulary item is two plus the number of incorrect answers (reading + meaning + repeats).

  • I’m using statistics to find the breakpoints between sessions. There is no longer a user setting for the maximum interval between reviews in the same session.

  • I now handle the settings form styling and validation directly instead of handing it off to wkof.
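
For the day-boundary handling mentioned above, the retrieval window would be computed along these lines (a sketch; dayStartHour is the hypothetical user setting, defaulting to midnight):

    // Sketch: earliest review start time to fetch, given a 1-4 day span
    // and a user-configurable start-of-day hour (local time).
    function reviewWindowStart(days, dayStartHour = 0) {
      const start = new Date();
      start.setHours(dayStartHour, 0, 0, 0);        // today's "day start"
      if (start > new Date()) {
        start.setDate(start.getDate() - 1);         // day start hasn't happened yet today
      }
      start.setDate(start.getDate() - (days - 1));  // back up (days - 1) additional days
      return start;
    }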

Developer-oriented changes:

  • My planned algorithm to find sessions uses the “median of absolute deviations” as described brilliantly in this article by Hause Lin that I finally found. I’m using a threshold of 2.0 deviations to define the maximum interval (see the sketch after this list).

  • I’m currently just using wkof.Apiv2.fetch_endpoint() to fetch reviews without caching. While I plan to add caching eventually, I’ve reluctantly decided not to use @Kumirei’s Review Cache for the following reasons:

    • I don’t need to retrieve reviews from more than a few days ago. Retrieving all cached reviews is somewhat inefficient in memory and file I/O since I only need a relatively small number of reviews at the very end of the cache.

    • Since Review Cache uses wkof.file_cache the reviews are stored locally as a single, fairly large blob. To retrieve even a few days of reviews necessitates reading the whole thing.

    • This is just a minor irritation, but get_reviews() returns an array of arrays rather than an array of objects. Since the browser itself compresses data stored in IndexedDB, I’m unsure if this really saves any space. (Admittedly, using symbolic constants as indexes is almost as readable.)

    • If @Kumirei isn’t opposed, I may eventually write a drop-in equivalent to Review Cache that uses IndexedDB directly rather than via wkof.file_cache. Additional functions beyond get_reviews and reload would allow indexing by creation-date/subject_id/item_type/srs_stage/etc as well as retrieving ranges of reviews from the cache instead of the whole thing.

  • Since wkof doesn’t cache the reviews endpoint, I looked into directly accessing the API and caching with IndexedDB (likely via localforage or Dexie), but there are still a number of advantages to using wkof (pagination, API key handling, consolidating requests from other scripts, etc.).

  • I was derailed for an afternoon learning about screen readers and accessibility. It’s a fascinating topic, but I eventually realized that there probably aren’t many sight-impaired Wanikani users! I was intrigued to read about Braille Kanji, however.
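
For the session-finding algorithm in the first bullet, the median-absolute-deviation approach looks roughly like this (a sketch, not the script’s final code):

    // Sketch: split review start times (ms, ascending) into sessions using MAD.
    function findSessions(starts, threshold = 2.0) {
      if (starts.length < 3) return [starts.slice()];
      const gaps = starts.slice(1).map((t, i) => t - starts[i]);
      const median = (xs) => {
        const s = [...xs].sort((a, b) => a - b);
        const mid = Math.floor(s.length / 2);
        return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
      };
      const med = median(gaps);
      const mad = median(gaps.map((g) => Math.abs(g - med)));
      // Any gap more than `threshold` deviations above the median starts a new session.
      const sessions = [[starts[0]]];
      gaps.forEach((g, i) => {
        if (g - med > threshold * mad) sessions.push([]);
        sessions[sessions.length - 1].push(starts[i + 1]);
      });
      return sessions;
    }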

Onward!

3 Likes

I like all of these changes! My primary concern was with the reviews/day metric and whether it would match the value given to us by Heatmap, especially if we change the start time of a new day; but you took care of that by also allowing us to set the time, so I am very pleased to see it. As long as the result stays consistent with Heatmap, I think it will be very nice to have in chart form. I also appreciate the ability to toggle between charts and data.

Once again, thank you for all of the time and effort you’ve put into this script! It still remains one of my favorites.

1 Like

I don’t mind at all. I made the review cache as simple as possible for my own needs with the Heatmap. I could have made it better, but publishing it as a separate script was more of an afterthought. I just thought that since I will be storing the data anyway I might as well share it.

If you do end up making your cache, I would consider switching to your version so that users don’t need duplicate caches if they use both scripts.

2 Likes

i like this script, and all the changes you’re implementing! ^^

what i would perhaps suggest is making the configuration for difficulty user accessible? you mentioned further up that one could configure that by editing the script, but some people (me) don’t like messing around with code much. i don’t know if it’d make sense, or be doable with a reasonable amount of work, but it might be nice ^^

1 Like

Yes this! The cache startup time adds several seconds of annoying latency to the dashboard startup. Duplicating the cache will likely double this latency.

1 Like

Great! No promises on timing (I keep volunteering myself for things), but I will make get_reviews behave identically by default (returning an array of arrays). Existing scripts like heatmap should only need to change the @require line in the header and nothing else.

I wouldn’t create an alternate cache unless you were willing to use it. A review cache is too big not to share!

I will let you know when I have a version ready to try (I will test it with heatmap myself beforehand).

De nada. This script has turned into a very fun learning experience for me.

Oh, absolutely! That little gear icon in the upper right of the mockup is a portal into a bunch of fun settings. Everything will be configurable, especially the “magic numbers” used in the ganbarometer calculation.

Agreed. This is precisely my motivation for proposing a new version of the review cache as well as for implementing a “show stale until refresh” strategy. The startup delay is getting a bit annoying, but mostly because of the “Loading…” modal that wkof displays (most people won’t use any part of the dashboard until that goes away).

I think most of the delay is due to occasionally slow responses from the server. The server appears to respond very slowly for the first request to the reviews after a long delay, but then it responds quickly to subsequent requests shortly thereafter. So my theory is that it’s due to server-side caching.

Unfortunately, an upgraded review cache won’t help with server-side delays. I need to look into it and discuss with @rfindley , but I suspect the global ‘Loading…’ progress bar that wkof.Apiv2.fetch_endpoint() throws up is the main problem. It would be better to just let the widget requesting the data refresh itself when the promise resolves rather than throwing up an annoying modal. It doesn’t really block the user from using the rest of the dashboard, but the “in your face” nature of the modal is distracting at the very least (I know that I impatiently wait for it to go away every time before even starting to use the dashboard).

I suspect it should be up to the widget making the request whether to put up a modal, rather than having wkof do it unilaterally (just as it’s the widget’s responsibility to decide whether or not to block). The reviews endpoint is kinda special, though, since it contains so many records (which never change but only get added to). So it may make more sense to call the reviews endpoint via the API directly rather than going through wkof at all. I’d rather not request behavioral changes to wkof itself unless absolutely necessary.
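
For reference, hitting the reviews endpoint directly is not much code, though it does mean giving up wkof’s pagination helpers, key handling, and request consolidation (a minimal sketch against APIv2; error and rate-limit handling omitted):

    // Sketch: fetch reviews updated after a given ISO-8601 timestamp, following pagination.
    async function fetchReviews(apiToken, updatedAfter) {
      let url = `https://api.wanikani.com/v2/reviews?updated_after=${updatedAfter}`;
      const reviews = [];
      while (url) {
        const resp = await fetch(url, {
          headers: {
            Authorization: `Bearer ${apiToken}`,
            'Wanikani-Revision': '20170710',
          },
        });
        const page = await resp.json();
        reviews.push(...page.data);
        url = page.pages.next_url; // null on the last page
      }
      return reviews;
    }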

On the other hand, the review cache currently reads and writes the entire “file” in IndexedDB in one swell foop (via wkof.file_cache.load()) and then compresses/decompresses the records in a loop; I suspect that also causes some delays that could be avoided with a modified version of the cache.

2 Likes

Please keep in mind that every @require directive causes a different copy of the script to be loaded in memory. These copies will execute independently and may clash when they access the IndexedDB cache.

Interesting. I think you’re referring to my comment about accessing the API directly rather than through the wkof.Apiv2 interface (which corrals requests from multiple scripts)?

Can localstorage be used for a semaphore?

[Or does [Review Cache] already present a concurrency issue since it’s intended to be @required?]

[Mostly a note to myself: It appears that localStorage’s setItem() is not atomic and does no locking. If I ever do implement a mechanism to cache reviews on the WK review pages as well as the dashboard, I’m unclear about the best way to handle concurrency, but listening for a storage event and sending channel messages appears to be necessary. Ugh. Not sure whether the lack of atomicity with setItem() is a real concern for this use case.]

Why do you think it is not?

Why should it run on the review and lesson pages if The GanbarOmeter is just on the dashboard? Or do you mean that you want to make sure that your review cache library script can theoretically also be used by scripts that run during lessons or reviews?

I had in mind any operation that could cause a race condition. In particular I was thinking of the initialization code that reads recent reviews to populate the cache. That could cause the same reviews to be populated multiple times.

I don’t know if the review_cache code has concurrency issues, but knowing how good @Kumirei is at coding I believe their code already handles concurrency properly. You may check this code to see how it is done.

For a semaphore, a test-and-set of a variable in the global namespace would do. JavaScript is always single-threaded, so code tends to run indivisibly unless you write promises. A variable in the global scope will be visible in every instance of your caching script. You test whether it is initialized. If not, this instance is the first and is allowed to proceed after initializing the variable. If so, you abort the function because you want only one instance to run.

You may also need to test whether the one true instance has finished initializing before allowing the other copies to proceed. You can use wkof.set_state and wkof.wait_state for this. The instance that initializes issues a wkof.set_state('ganbarometer', 'ready') call to signal it is done. The other instances issue a wkof.wait_state('ganbarometer', 'ready') call to wait for the initialization to finish before resuming their processing.
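
Putting those two suggestions together, the pattern would look something like this (a sketch; 'ganbarometer_cache' is just an illustrative state key, populateRecentReviews() is a hypothetical placeholder, and I’m assuming wkof.wait_state() returns a promise when no callback is given):

    // Sketch: only the first loaded copy initializes; the others wait for it.
    async function initCacheOnce() {
      if (window.gbCacheInitStarted) {
        // Another copy on this page got here first: wait until it signals completion.
        await wkof.wait_state('ganbarometer_cache', 'ready');
        return;
      }
      window.gbCacheInitStarted = true;   // test-and-set (safe: JS on a page is single-threaded)
      await populateRecentReviews();      // hypothetical initialization work
      wkof.set_state('ganbarometer_cache', 'ready');
    }
    // Note: the window global only coordinates copies loaded on the SAME page;
    // it does nothing for separate tabs or page refreshes.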

1 Like

I don’t remember why, but I decided to only fetch reviews when they were asked for, so I don’t think there would be a race at initialization. Also, review_cache is a global variable that only gets replaced if there’s a newer version of review_cache being loaded. Although, now that I think about it, I don’t think that is working as it should, because I probably used GM_info and that doesn’t work when it’s being required, I think.

I was informed by professor Google. :slight_smile:

It appears that IndexedDB does provide atomic transactions, but localStorage does not.

My (often flawed) logic was that it would prevent overly stale data being displayed on the dashboard if I periodically retrieved newly created review records as they were created. In my specific use case, I’m curious to see an updated ganbarometer immediately after completing a review session.

This is probably a case of evil early optimization, but it would be nice if only a few reviews were missing from the cache rather than the entire session.

My thought was to have the script run on both the dashboard and review pages, but only update the DOM on the dashboard and update the cache periodically while on the review pages.

This is precisely why I’m looking for atomic transactions (I was a filesystem guy in olden times). Since reviews are “never” overwritten and only appended, I think the main concurrency concern is ensuring that the check for the latest cached entry and the appending of new ones are wrapped in a lock (a mutex/semaphore).

If the script only ran on a single page this would work, but would it across multiple page refreshes?

I’d forgotten about these! I need to see what they are doing under the covers to understand if it will work while navigating between pages.

As it stands, it appears an IndexedDB transaction may be the simplest locking mechanism (whether using wkof.file_cache or not).
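
To make that concrete, the check-then-append could live inside a single readwrite transaction, which the browser serializes against any other transaction touching the same store, even from another tab (a sketch using an illustrative 'reviews' object store keyed by review id; not the actual cache code):

    // Sketch: atomically append only reviews newer than what's already cached.
    function appendNewReviews(db, freshReviews) {
      return new Promise((resolve, reject) => {
        const tx = db.transaction('reviews', 'readwrite');
        const store = tx.objectStore('reviews');
        // Find the newest cached review, then add anything newer inside the same
        // transaction so a concurrent writer can't interleave with us.
        const newest = store.openCursor(null, 'prev');
        newest.onsuccess = () => {
          const latestId = newest.result ? newest.result.value.id : 0;
          freshReviews
            .filter((r) => r.id > latestId)
            .forEach((r) => store.put(r));
        };
        tx.oncomplete = resolve;
        tx.onerror = () => reject(tx.error);
      });
    }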

1 Like

Yes, the problem is that localStorage does not offer any way to do a combined read and write as an atomic operation. I agree that IndexedDB transactions are probably the best way to do this. I was just confused by your wording that localStorage.setItem() would not be atomic.

1 Like

Ah! I’d actually started to write that as get/setItem originally but it seemed overly verbose. :slight_smile:

Good point. I didn’t think of this.

I forgot to write about an important coming change to the GanbarOmeter gauge itself:

The weighting calculation for “new” R/K/V items will (hopefully) be much more rational and understandable/intuitive.

In the settings form you specify the following (default values shown):

Desired number of Apprentice Items: 100
Extra weighting for 'new' items (items in stages 1 and 2):
         Kanji: 3.0
      Radicals: 0.75
    Vocabulary: 1.25

The default values mean that kanji in stages 1 and 2 are three times “heavier”/harder than normal apprentice items, radicals are slightly “lighter”/easier, and new vocabulary is only slightly harder. You could set any of the three weighting values to 1.0 if you wanted to treat those items normally.

The GanbarOmeter still displays a value between 0 and 100%. Even if you’ve got a crazy number of items in your apprentice queue, it will peg the display at 100%. The straight up (50%) value represents a “normal”/desired level of difficulty.

Say you had 125 items in your apprentice queue (stages 1-4). Without any weighting applied the GanbarOmeter would display 62.5%:

125 / (2 * desired) = 125 / 200 = 0.625

But let’s say that within the 125 apprentice items, 8 were new kanji, 4 were new radicals, and 16 were new vocabulary items (where “new” means in stages 1 or 2). The adjusted GanbarOmeter display would be 72% based on this calculation:

   items in stages 3 & 4:     97 * 1.00  =  97
+  kanji in stages 1 & 2:      8 * 3.00  =  24
+  radicals in stages 1 & 2:   4 * 0.75  =   3
+  vocab in stages 1 & 2:     16 * 1.25  =  20
                             ---            ---
           Apprentice items: 125   Weighted: 144

Displayed value = weighted items / (2 * desired items)
                = 144 / 200
                = 0.72
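
In code, the whole calculation boils down to a few lines (a sketch; the item counts are assumed to be tallied elsewhere):

    // Sketch: GanbarOmeter display value in the range 0.0 - 1.0.
    function ganbarometer(apprentice, newKanji, newRadicals, newVocab, s) {
      const normal = apprentice - newKanji - newRadicals - newVocab; // stages 3 & 4
      const weighted = normal
        + newKanji * s.kanjiWeight        // default 3.0
        + newRadicals * s.radicalWeight   // default 0.75
        + newVocab * s.vocabWeight;       // default 1.25
      return Math.min(weighted / (2 * s.desiredApprentice), 1.0); // peg at 100%
    }

    // The example above: 97 + 24 + 3 + 20 = 144, and 144 / 200 = 0.72
    ganbarometer(125, 8, 4, 16, {
      kanjiWeight: 3.0, radicalWeight: 0.75, vocabWeight: 1.25, desiredApprentice: 100,
    }); // => 0.72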

Eventually, I plan to (optionally) modify the displayed value even further by looking at upcoming assignments for at least the next month. Since reviews you do today can impact the workload up to 4 months in the future, it makes sense to look for days in the future with a large number of reviews scheduled. The point of the GanbarOmeter is to tell you when to slow down or speed up on lessons: looking at the apprentice queue is just a leading indicator; the best way to do it would be to peer into the future.

To keep myself sane, however, and to have some hope of releasing the new version within my lifetime, I’m going to push that feature to a subsequent release. [Note to self: I can use @rfindley 's AWESOME Ultimate Timeline script to figure out a reasonable weighting algorithm for this. Just set it to look forward by SRS level for 120 days, then grab regions at 2 weeks, 1 month, and 4 months.]

(Cheers to @daikirai for pushing me to consider new radicals as lighter-weight than kanji or vocab.)

1 Like