Upcoming WaniKani downtime

Hello everyone,

This is a heads-up that we’ll be putting WaniKani into maintenance mode from 2023-01-20T08:00:00Z to 2023-01-20T12:00:00Z, for up to four hours.

We are hoping the maintenance process will be done in less than two hours, given our results from performing a few dry runs. We are giving ourselves a buffer just in case the maintenance does not go as planned. During this period the website and the public API will be inaccessible.

We apologize for any inconvenience this may cause and thank you for your understanding.

We don’t take long downtimes lightly and do our best to avoid them. Fortunately, this type of downtime has not been a common occurrence for us. Unfortunately, this upcoming maintenance and its resulting downtime are unavoidable.

The WaniKani Community boards will still be up and running during the downtime period. If you wish to participate in the community boards (for example, access the private boards or post), please ensure you are logged into the boards prior to the start of the maintenance period. Otherwise, the public parts of the boards will still be accessible for reading. The boards can be accessed at https://community.wanikani.com.

Again we apologize for any trouble this maintenance period may cause you.

50 Likes

I do my reviews almost every hour; I keep a fairly strict pattern to avoid huge review piles in any one hour. With the fast levels I’m on, it’s sometimes 400 reviews per day, so I need to spread them out.

Let’s say WK is down from 9am to 12pm and I do reviews in Flaming Durtles at 9am, 10am, and 11am. At 12pm WK is back online.

When FD then syncs with WK, will it sync the hours I actually did the reviews at, or will WK record all the reviews as done at 12pm?

3 Likes

Your reviews aren’t going to be frozen during the downtime.

For the scenario you shared, the reviews that open up between 9am and 11am will be available at 12pm, alongside the reviews that open up at 12pm.

4 Likes

In my experience, reviews done on FD while WK is down or you have no network are synced with the actual time of completion.

WK trusts the API client to provide the correct time, since the only person you’d cheat by lying is yourself. This is what allows FD to work offline (I haven’t tested it, but you should be able to go a full day without coverage and still have your morning lessons reviewed at the same times as if the sync had happened).

6 Likes

They will be delayed, I believe; FD can’t tell WK that the reviews happened “in the past”, so they’ll only be taken into account when sent to the server.

It’s mildly annoying for me as well because my radicals happen to reach guru just at the beginning of this window…

Are you sure? I have the opposite experience. I sync my progress at the end of the review sessions (because I like to undo like a scrub) and if I start a review session at the end of one hour and finish it at the beginning of the next, WK considers that all the reviews were done in the 2nd hour.

That being said the API does seem to let the client specify the review time so maybe there’s something else going on.

1 Like

I think that FD uses the time you end each review session for all reviews in the batch if you activate “Lesson/review settings” > “Delay processing quiz results” to be able to undo as much as you want.
You could have smaller review batches (I like doing 10 at a time personally) to have more items validated during the first hour.

3 Likes

Thanks, that’s what I was hoping for.

Aaah, that would explain it. I do batches of 20 so it usually takes me a little while to go through.

That’s good news for Friday then.

The API lets you set the completion time of a review, and that completion time is used as the base for calculating the next review time per the SRS timing. The assignment.available_at is then updated with the calculated next review time.

There is some validation of the review completion time, if I remember correctly (not an exhaustive list):

  • Completion time must be in the past
  • Completion time must be greater than the subject’s current assignment.available_at.

For example, the API will accept a review completed a week ago, so long as the POST body of the review meets the above criteria. And the next review time will be calculated using the “week ago” completed at timestamp as the base.

Based on the above example, if the calculated SRS interval is one day from the base time, then the subject would immediately appear available for review again, since more than a day has already passed. The assignment.available_at will be set to the base time plus one day.
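A minimal sketch of that validation and timestamp math, using the “week ago” example above. The function and variable names here are illustrative assumptions, not WaniKani’s actual server-side implementation:

```python
from datetime import datetime, timedelta, timezone

def validate_created_at(created_at, available_at, now):
    """A review's completion time must be in the past and later than the
    assignment's current available_at (hypothetical re-creation of the
    checks described above)."""
    if created_at > now:
        raise ValueError("created_at must be in the past")
    if created_at <= available_at:
        raise ValueError("created_at must be after assignment.available_at")

def next_available_at(created_at, srs_interval):
    """The next review opens one SRS interval after the *submitted*
    completion time, not after the time of the sync."""
    return created_at + srs_interval

# Worked example: a review completed a week ago, on an assignment that
# became available eight days ago.
now = datetime(2023, 1, 20, 12, 0, tzinfo=timezone.utc)
week_ago = now - timedelta(days=7)
available_at = now - timedelta(days=8)

validate_created_at(week_ago, available_at, now)      # accepted
nxt = next_available_at(week_ago, timedelta(days=1))  # one-day SRS stage
assert nxt < now  # already in the past, so the item is reviewable right away
```

With a one-day interval on a week-old base time, the computed available_at lands six days in the past, which is why the item shows up for review immediately after the sync.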

I don’t know how FD submits reviews to the API or how FD’s offline mode works, so I can not comment on that.

14 Likes

Former sysadmin is curious: watcha doin’?

7 Likes

Fun major version upgrade of the database times. Got to do it before our PaaS provider’s EOL date else everyone here will be real sad :frowning:

6 Likes

Backup, backup, backup.
Please, for me, have a redundant backup.

Thank you

3 Likes

We have backups. And the upgrade is being done on a follower. We’ll have the primary to fall back to if things go sideways :slight_smile:

11 Likes

Sounds good to me.

But then why don’t you do the upgrade on the follower, resync the db, and do a switchover?

1 Like

We are following the recommended and available options/process our PaaS has for a database upgrade.

If you are interested in reading about it, here is the link: Upgrading the Version of a Heroku Postgres Database | Heroku Dev Center

If this were a more common occurrence, we would spend resources on engineering a more controlled system with a zero-downtime option. But this event occurs roughly every four years.

11 Likes

I did not expect Heroku, haha. (I love self-hosting)

But yes, that makes sense. For future reference:

  1. Provision a new replica DB (DB_A) with the old version of Postgres.
  2. heroku maintenance:on
  3. Change DB hosts’ IP addresses in “/etc/hosts” of online servers to use read-only replica DB (not DB_A). By this moment, all write operations will fail.
  4. Run pg_upgrade (with “--link”) on DB_A to upgrade to the new version of Postgres, and promote DB_A to be a primary.
  5. Replace all DB hosts’ IP addresses in “/etc/hosts” of online servers to use DB_A. By this moment, write operations would resume.
  6. Re-provision new replica nodes with the new version of Postgres.
  7. heroku maintenance:off

The above is just off the top of my head; please correct me on any mistakes.

By the way, Heroku also has some docs for a follower switchover:

PS: Of course, don’t switch procedures now when you’ve already done dry runs without problems.

Anyway, good luck. :hearts:

Our changeover process follows what Heroku suggests.

The roadblock to doing this with zero downtime is their upgrade-in-place option: it requires stopping writes to the primary database, which means putting the app in maintenance mode while the upgrade process runs on the follower.

Heroku’s Postgres upgrade process takes a while to finish (which is where all the downtime is happening), and when it’s executed on the follower, the follower detaches from the primary.

As far as I know Heroku doesn’t have a convenient process to be able to sync any new changes from the primary to the detached follower. So having the primary actively receiving new updates while the follower upgrades is not an available option.

There may be an option working with Heroku’s process to truly achieve zero downtime; we just haven’t figured it out yet.

But yeah… All boring details. Knock on wood the upgrade goes well and we won’t have to revisit this for another few years.

3 Likes

Well, as I understood it from the docs, it would be as follows:

  1. Create new Follower Database
  2. Enter Maintenance Mode
  3. Update Follower Database
  4. Promote Follower Database
  5. Exit Maintenance Mode
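Assuming the Heroku CLI and an app named `app`, the five steps above would look roughly like this. This is a sketch based on Heroku’s follower-upgrade docs; the plan name and the `HEROKU_POSTGRESQL_PINK` attachment name are placeholders:

```shell
# 1. Create a follower of the current primary and wait for it to catch up.
heroku addons:create heroku-postgresql:standard-2 --follow DATABASE_URL -a app
heroku pg:wait -a app

# 2. Stop writes so the follower doesn't fall behind mid-upgrade.
heroku maintenance:on -a app

# 3. Upgrade the follower in place. This detaches it from the primary,
#    and the wait here is where most of the downtime goes.
heroku pg:upgrade HEROKU_POSTGRESQL_PINK -a app
heroku pg:wait -a app

# 4. Promote the upgraded follower to be the new primary.
heroku pg:promote HEROKU_POSTGRESQL_PINK -a app

# 5. Bring the app back.
heroku maintenance:off -a app
```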

Disclaimer: never used Heroku, but this would be my way of doing it based on the docs.

Yep. That’s exactly the process we are following. It is step 3 where the downtime is happening. Can’t do step 4 until step 3 completes :slight_smile:

3 Likes

Yes, but I don’t think the downtime would be 2+ hours for that, right?