API changes - Get All Reviews

If you split the data into a separate table per user, ActiveRecord (Ruby on Rails’s ORM) will fight you tooth and nail. One model = one table (partitioned or not). Anything beyond that and you’re completely on your own.

This is enough to have a User model that maps to a “users” table:

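```ruby
class User < ActiveRecord::Base
end

# now you can do fancy stuff
User.first                     # first row of the "users" table
User.find_by(email: ...)       # look up a row by column value
some_user.update!(email: ...)  # update a row (some_user is any User instance)
```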

You lose all of that if you decide to go with your solution.

I don’t even know what you mean by that.

SQL databases are not time series databases - maybe that’s where the confusion arises?

It’s really not a good idea and a lot of people agree:

Okay, I see where the confusion is. What you and the links refer to are classical relational tables. I agree: having more or less static user data in separate tables with known, easy-to-anticipate dimensions just for the sake of it makes no sense.

SQL databases can be used to store time-series data, however. The TimescaleDB extension for PostgreSQL, for instance, helps with that, and behind the scenes it partitions tables into blocks by time interval. Ironically, that is where my bad experience with table partitioning came from, BUT in that case it was partly because the tables were partitioned by time-interval ranges rather than by specific columns (for instance, user IDs). In the end we had a much better experience with vanilla PostgreSQL time-series tables (indexed by timestamp).
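To illustrate the difference between the two partitioning schemes, here is a toy Ruby sketch. The function names, the 7-day interval, the 2000-01-01 epoch and the 16-partition count are all hypothetical, and the modulo is a simplified stand-in for real hash partitioning. It shows that one user's year of daily reviews is spread over dozens of time partitions but lands in a single user partition.

```ruby
require "date"

# Hypothetical router for range partitioning by time: one partition per
# fixed-length block of days, counted from an arbitrary epoch.
def time_partition(date, interval_days: 7, epoch: Date.new(2000, 1, 1))
  (date - epoch).to_i / interval_days
end

# Hypothetical router for partitioning by a column: user ID modulo the
# partition count (a simplified stand-in for hash partitioning).
def user_partition(user_id, partitions: 16)
  user_id % partitions
end

# One user reviewing every day for a year.
user_id = 42
year = Date.new(2023, 1, 1)..Date.new(2023, 12, 31)

time_parts = year.map { |d| time_partition(d) }.uniq.size
user_parts = year.map { |_d| user_partition(user_id) }.uniq.size

puts "time-interval partitions touched: #{time_parts}" # 53
puts "user-ID partitions touched: #{user_parts}"       # 1
```

A query for "all of this user's reviews" has to visit every partition in the first case and only one in the second.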

Going back to what I originally claimed: I was wrong, because user review data isn’t strictly time-series data. The only time-series part of a record would be the initial/modified/etc. timestamp columns, if those exist at all, and they don’t have to, since WaniKani review intervals are static.

So yeah, for a table that contains mostly static records, partitioning makes more sense, I agree. Where I could see separate tables making sense is with very high data volumes or privacy concerns, though I’m not sure how relevant either is here.

If this is MySQL / MariaDB based then partitioning would most likely solve the problem here. The issue may well be just one of scale as I’m guessing that the table involved (all historical reviews) could well be their largest data set.

Challenge 1 would be the amount of downtime needed to partition the table when you don’t even know 100% upfront that it will solve the problem. If they’re on cloud resources, this might be a hard one to pull the trigger on.

Challenge 2 would be working out why this problematic data is being kept when it doesn’t seem to be needed to support the main application.

Since the review data is historical write-once data they could also offload all this to something like Elastic. But maybe that wouldn’t fit into their API design since all the API data is probably coming from the same database currently.

(Not necessarily responding just to Iinchou. I just hit this particular reply button while reading the conversation.)

I think the main issue is that a single Reviews table query can return 100k to 1M records. It’s not just returning a single page of the paginated results, it’s often returning the whole result set. No matter how well indexed or partitioned it is, those records are still spread across the disk, mostly non-consecutively, which means it’s a very I/O intensive query (it’s not very cache-friendly).
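The locality point can be made concrete with a toy model, assuming fixed-size pages that each hold a hypothetical 100 rows and a made-up workload of 50 users whose reviews arrive interleaved. It counts the distinct pages one user's query must read when rows sit in arrival order versus grouped by user. Real storage engines, page sizes and caches differ; this only illustrates the argument.

```ruby
# Toy model: rows live in fixed-size pages in insertion order; count how many
# distinct pages a query for one user's rows has to read.
ROWS_PER_PAGE = 100 # hypothetical page capacity, in rows

def pages_touched(rows, user_id, rows_per_page: ROWS_PER_PAGE)
  rows.each_with_index
      .select { |row, _i| row[:user_id] == user_id }
      .map { |_row, i| i / rows_per_page }
      .uniq
      .size
end

# 50 users reviewing at the same time: their rows are interleaved on disk.
interleaved = (0...10_000).map { |i| { user_id: i % 50 } }
# The same rows, but stored grouped by user.
clustered = interleaved.sort_by { |row| row[:user_id] }

puts pages_touched(interleaved, 0) # 100 pages for 200 rows
puts pages_touched(clustered, 0)   # 2 pages for the same 200 rows
```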

That’s where Timescale (or similar) can add efficiency, if used well, by storing time-sequenced data together in chunks that are likely to be returned together in the same query. For example, instead of having to store a review session of 300 reviews across up to 300 disk sectors, it would get lumped together into an ‘array’ data type containing (for example) a day’s worth of the user’s reviews in a single record, which is stored consecutively on the disk. Each time a review is submitted, it would be appended to the day’s array record, rather than being stored separately. So, a year’s worth of data gets reduced from 100k disk accesses to 365 disk accesses, and the processor deconstructs the arrays into the exact results requested.

That also reduces the index size by several orders of magnitude, by the way.
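A rough Ruby sketch of the per-day array idea, assuming a hypothetical `DayBucketStore` class with an in-memory Hash standing in for the table, and a made-up workload of 300 reviews a day for a year. It follows the description above rather than TimescaleDB's actual storage format. A year of reviews ends up in 365 records (and 365 index entries) instead of 109,500 rows, while reads still unpack into individual reviews.

```ruby
require "date"

# Hypothetical stand-in for a table with one record per user per day, where
# each record holds an array of that day's reviews.
class DayBucketStore
  def initialize
    @buckets = Hash.new { |hash, key| hash[key] = [] }
  end

  # Append a review to the (user, day) record instead of inserting a new row.
  def append(user_id, date, review)
    @buckets[[user_id, date]] << review
  end

  # Records (and therefore index entries) stored for one user.
  def record_count(user_id)
    @buckets.keys.count { |uid, _date| uid == user_id }
  end

  # Read a user's records in a date range and unpack them into reviews.
  def reviews(user_id, from, to)
    @buckets.select { |(uid, date), _reviews| uid == user_id && date.between?(from, to) }
            .values
            .flatten(1)
  end
end

store = DayBucketStore.new
start = Date.new(2023, 1, 1)
365.times do |day|
  300.times { |n| store.append(1, start + day, { subject_id: n }) }
end

puts store.record_count(1)                     # 365 records
puts store.reviews(1, start, start + 364).size # 109500 reviews
```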

(Side note: Heroku doesn’t support Timescale last I checked, but Timescale is primarily just a layer for constructing/deconstructing the array data types, so the necessary functions could be implemented manually at the backend application layer without a ton of effort. Of course, I say that as someone who mostly works on and enjoys low-level code, so I don’t mind getting into the weeds when it makes sense to.)

If you really care about your company and your product, you would make sure that your senior developers properly mentor the junior developers so they can potentially take over when the time comes.

Heyo, I’ve really been struggling to keep motivated now that I can’t watch my progress in the WK History Web app. Can we get an update on how fixing this is coming along?

I’m sure you don’t mean any harm by it, but using “master/slave” phrasing is more than kinda problematic lol. You can use primary/secondary or controller/worker instead.

@tofugu-scott Do you happen to have any update on progress for solving the API problem? Any hopium for us statistics addicts? Or is it abandoned and this change is now permanent? Thank you

Just cleared my browser data and lost ~9 months of review data stored in the heatmap script. I assumed it would be redownloadable over API but it appears that is not the case :frowning: . What about limiting the time frame returned by the API? Allow the API caller to ask for at most 31 days of data, would that help at all? Or maybe some sort of special batch procedure where I can visit a special page, press a button, and have my review data emailed to me in a zip file by a long running batch process overnight?
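A minimal Ruby sketch of the 31-day cap, assuming the API accepted hypothetical `from`/`to` dates (the parameter and method names are made up, not WaniKani's API). The server rejects longer windows, and a script that wants a full year simply makes twelve smaller requests.

```ruby
require "date"

MAX_WINDOW_DAYS = 31 # hypothetical cap taken from the suggestion above

# Server side: reject requests whose date range is longer than the cap.
def validate_window(from, to, max_days: MAX_WINDOW_DAYS)
  raise ArgumentError, "from must not be after to" if from > to

  days = (to - from).to_i + 1
  raise ArgumentError, "#{days}-day window exceeds #{max_days} days" if days > max_days

  from..to
end

# Client side: split a long range into consecutive windows that each pass the check.
def windows(from, to, max_days: MAX_WINDOW_DAYS)
  result = []
  start = from
  while start <= to
    finish = [start + (max_days - 1), to].min
    result << validate_window(start, finish, max_days: max_days)
    start = finish + 1
  end
  result
end

chunks = windows(Date.new(2023, 1, 1), Date.new(2023, 12, 31))
puts chunks.size  # 12 requests for a full year
puts chunks.first # 2023-01-01..2023-01-31
```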

As you say, it seems like there should be plenty of potential solutions (even just severely rate-limiting the API for example) but I guess this is just very low down their priority list at the moment as they apparently have such limited dev-hours.

I seem to recall that Scott said it was on-the-minute requests that were the problem so it was a bursty problem.

Another potentially easier solution would be to respond with 429 Too Many Requests when the server is busy with too many API requests, and use the Retry-After header to tell scripts how long they need to wait. That way, they could fine-tune the database load coming from scripts themselves.
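One way to sketch this in Ruby, assuming a hypothetical `BusyLimiter` that counts in-flight API requests and returns a Rack-style `[status, headers, body]` triple. When it is over capacity it answers 429 with a `Retry-After` header, and the hypothetical `retry_delay` helper shows how a script could honour it. Only the delta-seconds form of `Retry-After` is parsed; the HTTP-date form falls back to a default.

```ruby
# Hypothetical server-side guard: answer 429 while too many requests are in flight.
class BusyLimiter
  def initialize(max_in_flight:, retry_after_seconds:)
    @max_in_flight = max_in_flight
    @retry_after = retry_after_seconds
    @in_flight = 0
    @lock = Mutex.new
  end

  # Runs the block (the real request handler) if there is capacity;
  # otherwise returns a 429 response telling the client when to retry.
  def call
    admitted = @lock.synchronize do
      next false if @in_flight >= @max_in_flight

      @in_flight += 1
      true
    end
    return [429, { "Retry-After" => @retry_after.to_s }, ["Too Many Requests"]] unless admitted

    begin
      yield
    ensure
      @lock.synchronize { @in_flight -= 1 }
    end
  end
end

# Client side: seconds a script should wait before retrying.
# Only delta-seconds are parsed; anything else falls back to the default.
def retry_delay(status, headers, default: 60)
  return 0 unless status == 429

  Integer(headers.fetch("Retry-After", ""), exception: false) || default
end

limiter = BusyLimiter.new(max_in_flight: 1, retry_after_seconds: 30)
inner = nil
outer = limiter.call do
  inner = limiter.call { [200, {}, ["reviews"]] } # arrives while the slot is taken
  [200, {}, ["reviews"]]
end

puts outer[0]                        # 200
puts inner[0]                        # 429
puts retry_delay(inner[0], inner[1]) # 30
```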

Replace “very low down their priority” with “not on their” and I agree with you.

They have created “Recent Mistakes” as a replacement for the summary page, and if we’re not happy with that, tough luck.

Just came back from a several month hiatus to find this and heatmap no longer working. My sub was still active, giving WK free money based on my slow learning and procrastination.

Throw a Redis cache or something in the middle if it’s such an issue ffs. Put old reviews in a separate database used just for this, so it doesn’t matter if it’s slow. I can’t believe this has been an issue for months.
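The caching suggestion is the classic cache-aside pattern. Below is a minimal Ruby sketch, assuming a plain Hash stands in for Redis and the block stands in for the slow reviews query; `CacheAside`, the key format and the 60-second TTL are hypothetical. Repeated requests within the TTL never reach the database.

```ruby
# Minimal cache-aside sketch: a Hash stands in for Redis, and the block passed
# to #fetch stands in for the slow database query.
class CacheAside
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Return the cached value if it is still fresh; otherwise run the block,
  # cache its result, and return it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && now - entry[:cached_at] < @ttl

    value = yield
    @store[key] = { value: value, cached_at: now }
    value
  end
end

now = 0
cache = CacheAside.new(ttl_seconds: 60, clock: -> { now })
db_queries = 0
load_reviews = -> { db_queries += 1; "reviews for user 1 (query ##{db_queries})" }

cache.fetch("reviews:user:1") { load_reviews.call } # hits the database
cache.fetch("reviews:user:1") { load_reviews.call } # served from the cache
now = 61
cache.fetch("reviews:user:1") { load_reviews.call } # expired, hits the database again
puts db_queries # 2
```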

I’m unsubscribing at this point. I’ll use Anki lol

edit: Oh, turns out I had a yearly sub. Well, y’all won’t be seeing my renewal come December :triumph:

edit2: It gets more ridiculous the more I think about it. All of WK’s users are paying $9/mo, and all WK needs to do is provide a simple API attached to a DB. Compared to services with a ton of free users and much larger datasets, this should be trivial. It’s not an insurmountable technical difficulty.

As someone who initially mourned the loss of the session summary page, I think its removal was a good idea in retrospect. Under the new system, starting a review session is much faster, and you no longer have to worry about session timeouts at all. Nowadays, I routinely begin a review session and then finish it the following morning with no problems.

That being said, removing the reviews API was a huge mistake and I really wish they would bring it back. Additionally, it’s annoying that new features like “recent mistakes” are not exposed to the API at all.

Lastly, as a heavy user of both WaniKani and JPDB, I think that WK is much better. But of course lots of other people on the internet have wrong- I mean different, opinions. Heck, some people even swear by Anki!

I was an idiot and cleared my Google cache in order to free up some space on my hard drive. I deleted 10GB worth of data, which I assume included my Heatmap data.

Very bummed about it. I hope there is a solution, because it was nice to be able to look at the progress I made in the Heatmap. I’m only 3 months in, so 11 levels’ worth of completed reviews isn’t that much, but it’s going to bother me if I get to 60 and the data is inaccurate.

While many people might disagree, the fact that many of us continue to use WaniKani shows it’s at least doing something right.

232 days later and still nothing?? I really, really wish we would get an update on this. I have an API application that I really want to make that requires access to this data. I’ve been waiting for months and months now and still you guys won’t give us any information? Every single person in this thread pays you $9 a month, and you won’t even speak to them? I really think we deserve an update. You can’t just revoke people’s access to their own data with zero warning and then ignore them when they ask questions about it. I was okay giving you a few days, I was okay giving you a couple months even. But it’s been almost 8 months now. I’m tired of waiting and I’m tired of being ignored. We really deserve an update and it’s rude to ignore us this long.

We’re not just here complaining for the sake of complaining - we’re all here because every one of us lost something that we were paying for. We deserve to have that thing back.

“Technical difficulties” is a fine excuse for a few days. But after 8 months it’s just a lie. I would really appreciate an update. A lot. I’ve been asking for this for months now and I’ve been very consistent. Every week I’m reminded of how much I miss this feature and every week I wonder when it will come back. Please just update us. I am just going to keep asking until you guys do something.

Still no update?

Any estimation to when this will be fixed?

Seeing your heat map show that you did 80 reviews a day after you did close to 400 really lowers one’s motivation. Same with days when you “did not do any reviews”, but actually did hundreds on your mobile app.

I love WaniKani and just bought a yearly subscription after starting in April and going at maximum speed (one level per week), but this kind of thing really makes you wonder what you paid hundreds of dollars for…
