Manga Kotoba: Manga Frequency Lists and Stats

ChristopherFritz · February 25, 2024, 4:15pm

Programming

Part of me feels like such jobs would require a vast background in SQL.

Then I realize that most people who apply for jobs requiring SQL are going to take a college course so they can show a piece paper stating they know SQL, without actually utilizing it and getting to know it well.

So once you’ve worked with it on a personal project, you’re probably going to be more qualified at SQL than some competing applicants who would otherwise get these jobs!

Yes.

JPDB does this, so you can sort the frequency list in order of appearance:

The only reason I don’t is because the frequency lists I generate for are manga.

Mokuro has no way to get the word balloons in the correct order.

I have considered adding an option to sort frequency lists by page, but then I’d want to add an option to only show words with a frequency higher than n (“show me high frequency words in page order of appearance”).

For book club vocabulary sheets, I rapidly go through each page of the volume and assign the order of the dialogue in Mokuro’s output.

I end up with tab-delimited output like this, with a sequential word number for the whole volume, and the page it’s from:

最近さいきん [n,adj-no,adv] recently; lately; these days; nowadays; right now 1 4

気付くきづく [vi,v5k] to notice; to recognize; to recognise; to become aware of; to perceive; to realize; to realise 2 4

ことがあることがある [exp,v5r-i] (for something) to have occurred; to have done (something) 3 4

お茶おちゃ [n] tea (usu. green) 4 5

やっぱりやっぱり [adv] as expected; sure enough; just as one thought 6 5

どくだみどくだみ [n] chameleon plant (Houttuynia cordata) 7 5

かっぺかっぺ [n] hick; country bumpkin; yokel 10 5

おばあちゃんおばあちゃん [n] granny; grandma; gran 15 5

だよねだよね [exp] it is, isn’t it?; I know, right?; innit? 16 5

I can’t do this for all of my manga frequency lists because it would be a huge time waste for no benefit to do for a thousand volumes and counting, but it’s worth it for ABBC.

Particles, too. I feel a frequency list isn’t the place to learn particles off of (although I do leave some conjunctions in).

My admin status gives me buttons to easily add the JMDict ID and series IDs to the blocked words table, but I also have a button that adds a JMDict ID without a series ID to apply that block to all series (site-wide).

Then in PostgreSQL, every time I call a frequency list, I need to exclude all the blocked words. If I ever add a new page viewing frequency lists, if I forgot to filter them through the blocked words table, then blocked words get through and ruin the experience.

Or at least, that would be the case if views didn’t exist in PostgreSQL.

I created a view that queries the vocabulary lists and filters out the blocked works for me, so any time I want to pull data from a frequency list, rather than doing a query from the original table, I do a query on the view that has blocked items already excluded:

=> \d+ unblocked_volume_dictionary 

View definition:
 SELECT volumes.series_id,
    volumes.id AS volume_id,
    volume_dictionary.dictionary_id,
    dictionary.data,
    volume_dictionary.reading,
    volume_dictionary.page_number,
    volume_dictionary.line_number
   FROM volumes
     JOIN volume_dictionary ON volumes.id = volume_dictionary.volume_id
     LEFT JOIN blocked_words all_blocks ON volume_dictionary.dictionary_id = all_blocks.dictionary_id
     LEFT JOIN blocked_words series_blocks ON volume_dictionary.dictionary_id = series_blocks.dictionary_id AND volumes.series_id = series_blocks.series_id
     JOIN dictionary ON dictionary.id = volume_dictionary.dictionary_id
  WHERE all_blocks.dictionary_id IS NULL AND series_blocks.dictionary_id IS NULL;

After writing the SQL for a view, I like to run it by ChatGPT (I’m using 3.5) to ask if there are any obvious optimizations that can be done to the SQL. Sometime it gives nonsense, but I’d say at least nine times out of time it makes little changes that either improve the performance, or I cannot tell if there’s a change but it doesn’t break the result.

What’s most important is that if ChatGPT gives a supposed improvement that I’m aware of what was changed and I look into why. That way I slowly learn over time as I see the same things come up over and over so I can start writing better queries from the start.

If I could sync my manga reading progress in Mokuro to the site, such as via an API that likely no one other than me would ever use, and have the frequency lists adapt to showing me only for pages I haven’t read yet, I’d implement it right away. (Well, that’s an うそ, because I could literally implement that today. But I won’t, as it’s not a priority.)

I’ve thought about this as well, although I haven’t looked into it. (I am interested in at least trying out the non-HTTP request docker you mentioned, but I don’t want to risk the new docker install not working while also breaking the existing docker I’m using.)

I imagine the bottleneck is the design of the code for processing lines of text, and not available CPU. (I haven’t looked at the source code to confirm this.) In that case, I would very much be interested in running multiple volumes through at the same time if I had a way.

But since I normally let it run overnight when I’m running many volumes through, it’s not that big of an issue for me.

You know how up above I shows a screenshot of my block buttons?

For my old site, I blocked words by putting them in a JSON file to have them be excluded when I generated a frequency list from the cached Ichiran (and before that Juman++) output.

This meant if I found a frequency list item that was a single kanji misparsed from a character name, I’d add the word to the JSON file, then I’d rerun the process to convert the cached Ichiran/Juman++ output into Markdown pages, then rerun Jekyll to convert the Markdown pages into static HTML pages, then re-run rsync to copy all the updated pages to the web server.

But for the new site, I just click a button.

Well, I still need to manually generate/run a few queries to update the site/volume word count caches.

Eventually, I’ll automate that as well.

All my posts about first-time reading of native material being about deciphering and learning as you go certainly apply to learning new skills via using them in projects. I’ve spent hours upon hours reading documentation and watching YouTube videos that show using AdonisJS to build a site from scratch. The latter is like joining ABBC where you’re being hand-held each step of the way, so you have guided learning.

Phryne · February 26, 2024, 2:36pm

More programming

I always assume that any requirement in a job ad is negotiable Where I live there’s a huge shortage of developers, so while they all say they want a senior-level programming rockstar, at the end of the day beggars can’t be choosers

I had a look at some online courses on Edx and I wasn’t too impressed… I agree that working on projects and ploughing through docs will likely teach you a lot more than sitting through a course. But I guess it’s harder to prove that you really know your stuff (especially if you’re dealing with non-technical recruiters who only look for relevant degrees and work experience on your CV before binning it).

Ahhhh of course! It’d still work for novels, but as the intended audience for these sequential word lists is the ABBC and they tend to stick with manga… oh well! I guess doing some manual work is the best solution in that case. I bet the ABBC folks really appreciate it

This sounds pretty awesome. I’ll have to read up about this! It reminds me a bit of Notion, where you can display loads of different versions of the same table. I manage my non-Japanese reading in Notion (cause I really hated the Goodreads redesign lol) and it’s pretty cool how flexible it is. I can only imagine the upgrade that SQL would be I am thinking up new projects faster than I can realise the old ideas!

Brilliant! I have been using ChatGPT to write my shell scripts for me (cause every time I realise I need to learn it, I am in a hurry to just get it done and then I lose the urgency to learn again lol) and I’ve noticed it’s pretty good at explaining what it’s doing and why. Same for Japanese… it’s surprisingly good at picking sentence grammar apart!

Yeah, that’d be a stretch goal of mine as well. It would be nice to smush together the reading shelf, the Mokuro viewer and the frequency lists. Just a one-stop shop for everything to do with reading Japanese. But if you want to store the manga images centrally you’ll run into copyright issues as soon as it goes beyond ‘personal project for one person’

Yeah I think this might be the way. I got the processing speed of a 140k character novel down to 35 minutes (I am feeding it ~2500 char chunks at a time) and that’ll just have to do for now. There always tends to be a bit of time between me buying a book and me reading it anyway

I played around with this yesterday, but I found it more trouble than it’s worth… Running ichiran-cli through docker exec slows it down significantly, so I’d have to figure out a way to get the information into Docker and then run the ichiran-cli command from within the container. I had a look at volumes, but I couldn’t get it to work. I need to learn Docker a bit more comprehensively, cause yesterday I was just chucking things at it to see what would work lol

This is what I’ve been doing and I’m getting fed up with it! Thankfully it’s not too often anymore. I ran my whole known Anki deck through Ichiran to get the seq numbers so that dealt with a lot of them!

One weird thing that I’ve noticed is that Ichiran comes up with a bunch of JMDict IDs that I cannot find in the simplified JMDict JSON you shared. I found a Github discussion about this and I guess I could request the original seq from each conjugated form, but that seems kind of cumbersome. I’ll see if I can reverse engineer the original seq with this table… How did you deal with this?

And as for blocking the seq of names… that works well enough when it’s pre-existing words that also function as names, but there are a lot of names that Ichiran parses as two individual kanji (e.g. 尾田 is parsed as 尾【お】and 田【た】). I can remove those names from the strings that I feed to ichiran-cli, but if you don’t catch all of them ahead of time you end up with a JSON containing tonnes of random one-character ‘words’ that also end up quite high in the frequency ranking. So far I have been compiling a list of names that do not double as actual words (cause you never know when a story will heavily feature unbarked lumber (黒木【くろき】) lol), so hopefully this will become less of a problem over time. I guess in the shorter term the only solution is to exclude those particular one-character ‘words’ that ended up in the results for that particular work

ChristopherFritz · February 27, 2024, 1:42am

I noticed this, too, but hadn’t looked into it yet.

I filter out anything where the sequence number isn’t in the JMDict JSON file, so I may be leaving out some words that I shouldn’t be…

Just avoid reading series where characters are named after words, and you should be set.

Phryne · February 27, 2024, 7:57am

There’s a reference list in the Postgres database Ichiran does its lookups in! I uploaded it, hope it’s useful to you

ChristopherFritz · March 10, 2024, 9:56pm

I’ve added early volumes for random series, although I haven’t kept track of titles. The site’s up to 1,211 volumes spanning 341 series.

Aside from little tweaks here and there, the main new additions are:

Exporting User Data

From the dashboard, it’s possible to download user data, including known and tracking vocabulary lists, and information on series marked as reading or another status.

Currently only JSON format is supported.

Kanji Frequency Lists

I’ve added an initial concept for kanji frequency lists. These can be accessed from a series page:

This gives frequency for the kanji in the series, and the words it appears in within that series:

I’m still thinking about how I want to implement tracking.

The old site tracks reading and meaning separately, which I’ll likely do here, so it’s just a matter of working out the interface. That, and pulling kanji reading and meaning data into the database.

It’d be nice if I could split the kanji by reading for one frequency list, and meaning in another. But I’m not certain how viable that will be to extract kanji readings from words and group them by reading.

mitrac · March 11, 2024, 11:57am

wow! I was playing with last week and noticed a huge uptick. It’s getting to be quite comprehensive!

exciting new feature, I’ll have a play!

Are you looking for any feedback or detailed use reports? I’m really loving your creations, thanks for sharing

ChristopherFritz · March 11, 2024, 11:59pm

I just wish it didn’t take so long to run things through Mokuro (my NVIDIA driver isn’t working right, so the this process runs on CPU for now) and then Ichiran. I basically set things up before bed and before I head out to work and let me computer do its thing.

As soon as I develop the split mentioned in my study log (so kanji is broken out by reading used in the words in the manga), I have a ton of ideas of how to utilize it from here.

Edit: It was worth being up half an hour past my bedtime to get it this far.

A bit more work, and it’ll be ready to move to live.

Looking at my long to-do list, I’d say now is the perfect time for anything you may write, whether about things that work well, things you’ve seen issues with (there are plenty!), things you wish existed (maybe something I didn’t think of yet?), and so on.

mitrac · March 12, 2024, 11:46am

ouch!

Aaaah I bet, that is such a cool feature. I love the workup. This is totally game changing and makes a lot of sense! Could it group same kanji together as an option? 1. Global frequency of the kanji. 2. Frequency of reading. (So in your screenshot, that would pull up the second 事) 3. Within entry, as you’ve done, by frequency of words.

If it’s helpful! So far I’ve just been exploring your site periodically and just clicking around. I’ll properly stress test the vocab lists and more features for the first time with Frieren. And if I find it gives me an edge, I’d like to try it with my first no furigana manga after that. That being said…

These aren’t prioritised at all… number 3 would be the one I would find most useful. My overall comment is that this is amazing and I already like it as it is better than jpdb

feedback

Is the idea to not have the srs on the site anymore? It’s not a big deal, but it was cool to have it on kurifuri. I’m pretty lazy about import/ export, but it’s probably trivial to get it into Anki and I could try that.
After opening a series to view, it relegates it to the back of the list (sorted by vocab known? ). So if I don’t give it a category or I give it an uninterested mark, I’m unable to distinguish them unless I look in again. It would be nice to have a way to filter the browser list, ie, to just show “uncategorised”? The other categories, aside from uninterested can be found on the dashboard,.
Suggested vocab to learn by global frequency. Super cool feature. I’d like to be able to toggle between the current all volumes/ all saved series and “next unread volume in all saved series” although that would mean having to create functionality by volume to mark read status. The reason is, some series have loads of volumes imported like conan and the chef, and so it swamps the global vocab frequency if that makes sense.
What does “tracking” do? For me nothing so far?
My new account started with 10 pages of known vocab is that right? It looked like a bunch of katakana but I didn’t look through them all. And then after adding a bunch of known words the preset ones went away? I’m not sure what that was. Maybe a blip in the updates
When looking at the suggested words, I saw what looked like the same entry for ore (me) in there 5 times (same kanji, kana, meanings). I realise some entries are duplicates on purpose due to different use of kana/ kanji, but this felt like a strange blip as I couldn’t figure out why the program would think they were different entries.
The usage button is very cool!
It seems when new volumes are imported, you have to click into the series to get an updated known word percentage. Could that be updated automatically without clicking on it? Or, how about make a note to periodically do that as I only noticed that Frieren suddenly plunged for too a bunch of new volumes at once.

ChristopherFritz · March 12, 2024, 12:45pm

While trying to not have too many different pages, my initial thought it:

kanji + meanings page. Displays all the kanji grouped by kanji with meaning.
on-reading page. Displays only kanji for on-reading.
kun-reading page. Displays only kanji for kun-reading.

I’m undecided whether to separate the on-reading and kun-reading kanji onto separate pages or keep them in one page as in my screenshot.

I plan to also have these available for:

Single volume
Whole series
Series with certain statuses (reading, owned, wishlist)
Volumes with certain statuses (pending volume status implementation)

Ultimately, I want a way to be able to say, these are the volumes I am reading and will be reading, and I want to know the highest-frequency vocabulary and kanji details I don’t yet know so I can focus on those.

Ultimately yes.

Initially I’ll remove the frequency lists from there once Manga Kotoba has reached data parity, since the frequency lists are what bogs down the build process for the site.

The Javascript layer for the SRS can stay around for a while for existing cards being reviewed, but support for creating new cards goes away.

And I’ve avoided added SRS to the new site because I feel SRS can be handled better elsewhere, such as with Anki, Migaku, Kitsun and so on. I like the idea of SRS integrated right into the site, but unless there’s a Node.js SRS+FSRS implementation I can just drop in and occasionally run an update command for, I want to spend more time on improving the frequency lists rather than implementing subpar SRS.

This is on my to-do list, but I haven’t gotten started on it yet.

As you say, for logged-in users, the series list sorts by known total vocabulary words percentage.

I plan to add filtering out series marked as uninterested, as well as optionally filter out series you are reading, etc. from the browse page (where the dashboard would be the primary place to access those).

I’ve gone back and forth on whether I want the “browse” page to be renamed “discover” as a page to discover series to read based on difficulty, but that wouldn’t make sense until series one’s already reading/read can be filtered out.

This is a bit more complex than one might think, but it’s a high priority for me.

Consider this:

You set a series as reading. Then you have volume one as finished and volume two as currently reading.

You set the series to hiatus. Should volume two be set as hiatus? You mark volume two as read. Does that mean series should be back to reading status?

You have a series set as wishlist. You mark volume one as reading. Should the series status change to reading automatically?

Some of these have obvious-sounding answers, and maybe I should spend less time thinking and more time implementing.

Mostly, I’ve just been lazy on getting this one done because after implementing the vocabulary status, I have to implement how changing a series or vocabulary status impacts the other.

But it’s high on my “want to implement” list, because if there’s a volume that takes about different species of insects, and it results in a high frequency of insect vocabulary, then I read through that volume, and those insect names never come up again in the series, I want those words to fall off of my recommended vocabulary list as soon as that volume is marked as finished reading.

One day I may add a page that tells what everything is!

I implemented the tracking feature so when I create an SRS card in an external system, I know I created a card for this word. That way I know I’m in the process of learning the word.

At a glance, I can see I have SRS cards in Migaku for most of my highest frequency words.

I plan to add a feature to show/hide tracked words on the vocabulary lists. That way I won’t need to constantly scroll through all my tracked words to find one that’s not tracked and thus is a candidate to create an SRS card from in Migaku.

There was a bug until about a week ago where it showed you known words for all users. You couldn’t remove them from other users, but their (well, my) words were included on your known words list.

I noticed this about a week ago when I switched between my personal and testing accounts.

That’d be when I discovered and fixed the issue. My apologies for the confusion with that. I was quite confused myself when going between my accounts and seeing known words that shouldn’t be there!

(But it’d be an amazing feature if by signing up, you automatically knew 1,000 vocabulary.)

If you see this again and can provide a screenshot, I can look into it.

And I just noticed a possible bug there. My recommended words list shows 人類 as frequency 122, but the usage page shows it’s frequency 122 in the top series, with other usage amounts in other series. I’ll need to look into why the discrepancy.

Screenshots for me to remember by.

Also, I need to look into adding icons for each reading status so the usage list can also list whether a series is reading, owned, wishlist, etc.

I need to look into the ultimate source of the issue, but this is what’s happening:

When new data is loaded in, it marks the series updated time.
When a user navigates to one of certain pages (such as the series page, or seeing it on the browse page), after the page loads it checks to see if the series data has been updated.
If so, then in the background, the stats are recalculated for the user.
On the next page load, the updated stats show.

I’m thinking I may want to use server-side events to update the values without a page refresh. There are packages for this for AdonisJS, the NodeJS framework I’m using, but they are still in development (getting closer to release) so I’m waiting a bit before I add them to the site. (I have been testing with the development versions, but I’m completely new to server-side events and have a lot to learn.)

It used to be that the stats wouldn’t update until you marked as word as known. That was even worse! Every time I would add a series or volumes, I’d have to comb through vocabulary lists to find a word I know to mark as known to get all my stats to update.

Ask the mangaka for Frieren to end the series so there are no new volumes, and this problem goes away (Well, for one series.)

Here's a snapshot of my currently to-do list, exclduing any new to-dos I added while writing this reply.

To-Do List

Tracked Words

Have a page that shows all tracked words.
Have an option to hide tracked words from series/volume/recommended vocabulary lists.
Add “Created On” column to tracked words table.

Series

Hide series upon creation and then unhide later. Can hiding for non-admins be handled in the model?
Add “Updated On” date to series status changes.
Have option to hide series with statuses select from the “Browse” page.
- If a series has a status, show a div on the browse page asking whether to hide series with status from the browser page, stating that this option can be changed in the dashboard configuration screen. Once the user selects yes or not (or dismisses the div), do not show the message again. Use a boolean in the config to suppress this message. Only check for a status if this “show one time” config is not set.
Have a sort option on the series list, including “date added” to easily see the latest-added volumes.

Collections

Move Twilight Princess to a Twilight Princess series page.
Create a collection page for all “The Legend of Zelda” series.
A collection page may exclude showing the “new unique” stats.
There must be a collection vocabulary page.
Note on series page any collections that the series is part of.

Mark as Known

Auto-update total words progress bar based on frequency number.
Cannot update sentences progress bar.

Server-Side Events

When loading a series list or series page (volume list), run SSE and check for if stats are outdated. Run this once every X seconds for Y times, then stop SSE. If there is outdated stats in this time, server sends updated stats.
Also when switching tabs, check server (not SSE) for updates via Javascript.
In both cases above, store in Javascript the timestamp that the page was loaded or progress bars last updated and use this to see if stats were updated in the database.

Block Pages

Have an admin view that shows words per page, and pages can be marked as hidden.
Blocked pages are excluded from the series_dictionary and volume_dictionary views.
The page block table would have columns for: volume ID, page number. The page number is based on what is in the database.

Block Words

Have a link to an admin view from each series/volume page that shows blocked works for that series. This can be useful to unblock anything blocked by mistake.

Misc

Add pagination the “review imported known words list” page.
FIX “MARK KNOWN” NOT WORKING ON WORD IMPORT PAGE. Also, the wordlist should take in a variable on whether to show certain columns.
Create a table that based on CSS can replicate the “series list” and “volume list” view. Then have a checkbox to click to toggle between table view. Mobile devices get the not-table view regardless, so the toggle would be hidden on mobile.
Include hiragana title in series search.
Implement keyboard navigation for search results (arrows, tab, enter).
Implement clearing search when clicking away (text box loses focus).
Allow adding series to custom combined frequency lists.

mitrac · March 12, 2024, 2:54pm

Wow thanks for the super detailed response! I love the project and am so excited about the updates you have planned

However you do it will be massively helpful as long as its possible to see most common kanji and then in turn their most common words. The different readings are a pain for learners and programmers it seems lol. Somehow I can’t get my brain to care to much about on vs kun. If only it were so binary! There’s often multiple of one or none of the other, compounds with either, and exceptional readings.

Your master plan is just so great, love it

I see, that makes a lot of sense

Strong disagree about being lazy, lol, but wow, I see what you mean about one change trickling down to a lot of decisions

Ah cool now I get it

That’s pretty funny, yes please to signing up and already knowing a bunch of words

I’ll keep a look out!

That’s right, I forgot about that but I remember now noticing that and sorting through as well, like hmm… I think I didn’t get around to page 6 words…

Amazing, makes me want to be a programmer to be honest

ChristopherFritz · March 12, 2024, 11:02pm

And that brings up my other consideration:

This design dispenses with whether it’s “on” or “kun” altogether.

When I’m learning kanji, I don’t care whether it’s “on” or “kun”, I just want to recognize it in a word and be able to determine how it’s pronounced and get an idea of what meaning the word may have.

Pro: Only requires one page rather than two to display all kanji based on reading.

Con: Some people might prefer to know if it’s “on” or “kun”.

So, maybe add an option to hide/display the “on”/“kun” information, like how vocabulary definitions can be shown short or long.

mitrac · March 13, 2024, 9:59am

I like that mockup, and removing the split of kanji on two pages, which is a pretty big pro from the learning angle. I like the show/hide for on/kun, as it would be a much cleaner look for people not worried about that. And to be fair, your tool (I would think) is more likely to attract people interested in overall reading frequency more than the on/kun distinction. For someone much more interested in on/kun, then I can imagine they would use a different kanji learning strategy anyway.

When I think about it from the learning side, for reading a certain manga, the first consideration is - which kanji am I going to see the most? Regardless of the reading. Because if a kanji is moderately frequent with lots of readings, then I’d like to focus on that kanji more and learn the most common vocab with the most common readings before a kanji that might show up higher on the list for just one reading. Does that make sense? That’s what I meant by this (but I meant that I wanted to see it on one page, not across several:

My attempt to describe how that would sort on one page

What I imagine is a combination of your mockup above, sorting by frequency, but at a level above that, sorting the kanji according to it’s overall order

like you showed here:

So first it sorts by kanji globally, then as a sub order, for example under 人, there would be the entries in the style of your previous post, ordered by reading frequency, e.g., roughly in this order:
ひと
り
にん

It sounds complex, even without knowing what steps would have to go into programming it. Maybe I should also say then, if you have one page with the global kanji frequency, and another page with the reading frequency (previous mockup), then that’s great! My idea above is more of a nice-to-have.

Another question, how will you handle showing reading frequency and at what point is it “known”? For example, in your example with 手 (て), what if 3 out of 5 of those words are marked as known, which takes care of 10 occurrences? The total frequency is 15, but of the unique 2 unknown words, the unknown occurrences are now 5. But even if it shows it in the list at the lower frequency, it would still be interesting and useful to see the total occurrences (e.g., known occurrences greyed out maybe?). I like to draw connections between words with the same kanji reading, but I don’t always remember them myself at that moment when I see a new word with that reading.

ChristopherFritz · March 13, 2024, 11:11pm

As you can see, there’s a lot that can go into displaying kanji!

Things that make is difficult:

Sorting by reading (ungrouped kanji) versus by kanji then reading (grouped kanji).
Managing known status for meanings and readings within one page and still hiding known items.

There are ways to make it work, but it’s currently not a priority with as long as my to-do list is. (I haven’t even implemented marking meanings/readings as known…)

ChristopherFritz · March 16, 2024, 10:01pm

Initial support for marking kanji meanings and reading as known is now available.

There are no stats or progress bars yet, but the primary feature of discovering the highest frequency kanji to learn the meanings and readings for is there.

The layout of the kanji pages is still experimental, but the underlying database tables are set, so even if the pages change in their design later, one’s progress will remain.

Currently, links to the kanji pages are on the series page above the volume list:

And on each volume page above the vocabulary list:

For the meanings list, hovering the mouse over a row reveals the meaning:

For the readings list, hovering the mouse over a row reveals the reading:

I haven’t yet added in proper mobile support for the tables, but the basic functionality should be at least usable on mobile.

Jintor · March 17, 2024, 2:46am

this is an interesting use of the fuckers that seems like it could be borderline sensicle; instead of just eating the output as is, querying why it continues to come up over and over again.

mitrac · March 17, 2024, 7:36am

Great, will give it a spin

ChristopherFritz · March 19, 2024, 2:36am

…and for a little more visibility, any items marked as tracked now contribute to the progress bar:

With this updated progress bar, I can see that I’m at 54% of words in the series as known, but I have another 5% in what I have tracked (which for me means I’m learning them via SRS).

(Existing users will need to initially mark a word as known, or as tracked, or as untracked one time to prod the server into populating the new tracked words fields for user series/volume data.)

mitrac · March 19, 2024, 2:43am

Very nice, that is a great feature to capture the, “I’m on it!” sentiment!

mitrac · March 21, 2024, 11:47am

Ok, I’ve been playing with this more and really like what you’ve done so far. It’s already highly useful and I’m excited! I think I’ve seen a couple of features roll out while I’m using it as well. The tracking and marking known kanji or readings is so helpful, and with the almost certain upcoming IMC pick とんがり帽子のアトリエ without furigana, is going to be so helpful to pick out the top kanji, readings, and most common words I’ll see there, to study before the club starts.

So far I’ve been studying kanji roughly in line with global frequency and for words I already know, and that’s going fine, but I’ve never had fantastic motivation to keep it up. The two lists you have give me something tangible to reach for in terms of which kanji and readings are most important now and linked to media I know I’ll see them in. I’m really excited, thanks so much!

The only missing feature so far (that I’m sure is on your list), is a page showing what I’ve marked as known. Not a big deal, but once I accidentally hit known and thought, oops, that’s a goner! And a progress bar would be nice, but isn’t a must since there are obviously the progress bars for vocab, and that is what really matters at the end of the day.

Speaking of progress bars, currently the vocab progress can be seen for a series, but not by volume. I can’t tell if I should be concerned about my low stats on Frieren relative to other manga, or if that is just an artifact of it having the whole series on there and most of my saved series are just 1-2 volumes.

ChristopherFritz · March 21, 2024, 12:11pm

I’m hoping to find the same is true for me. I’m still at the stage where each day I’ll scroll through and mark various items that I already know, while focusing on learning new vocabulary. But I’m also starting to get to learning new vocabulary with kanji I don’t know, so I’ll want to figure out how I want to craft flash cards for kanji soon.

Yup, I did the same thing yesterday on hitting something by accident. (I also want to come up with some way to “undo” a click immediately after, but I’m not certain the best way.)

I should make a “your kanji readings/meanings” page my next thing to get added, like I have for words, so at the very least you can navigate there to undo. (It has been on the to-do list, but I’ve just not gotten to it yet.)

This one I want to implement as well. I just need to add columns to the user_series and user_volume database tables to cache that information, then code to recalculate the stats in the cache, then drop the stats into progress bars.

Do you mean there is no progress bar on the individual volume page?

I have plans for other ways to see volume-level progress for whole series and for volumes with statuses (once I get that implemented), but for now you should see progress bars on individual volume pages. If you don’t, then that’s something I’ll need to investigate.

That’ll certainly make a difference, so do let me know if the volume vocabulary page doesn’t show vocabulary progress bars.

For me, the volume vocabulary page shows my progress as 9% higher, and my tracked words as 4% higher, versus the series. If I actually learn all the tracked words I’m SRS’ing, that’s a 13% difference between my progress for volume one and the overall series-wide available volumes.

Somehow, my known words percent for this first volume is 7% higher (read: easier) than for Frieren’s first volume. Since I use Mokuro for reading, the lack of furigana won’t get in my way, so I’m interested in seeing how the two series (and their respective reading paces) compare for me.

I look forward to the day when I add filtering series by whether they have furigana, so I can easily compare my stats for series without furigana. (It’s on the to-do list…)

Topic		Replies	Views
(Back up) Floflo.moe - A WK-friendly website for reading Japanese Language	4248	118180	July 1, 2020
Weow! Koohi.cafe - A WK friendly SRS [300 vocabulary lists!] Japanese Language	758	32079	April 18, 2026
Tracking Known Vocabulary and Kanji in Manga Resources	7	1561	August 9, 2023
Manga Wordlist Wiki Resources	19	1304	December 3, 2021
👻 Autumn's High Tech Study Log ✨ Learning Japanese at the speed of light with tech 🫨 Study Logs (Public)	130	4071	September 3, 2025


最近	さいきん	[n,adj-no,adv]	recently; lately; these days; nowadays; right now	1	4
気付く	きづく	[vi,v5k]	to notice; to recognize; to recognise; to become aware of; to perceive; to realize; to realise	2	4
ことがある	ことがある	[exp,v5r-i]	(for something) to have occurred; to have done (something)	3	4
お茶	おちゃ	[n]	tea (usu. green)	4	5
やっぱり	やっぱり	[adv]	as expected; sure enough; just as one thought	6	5
どくだみ	どくだみ	[n]	chameleon plant (Houttuynia cordata)	7	5
かっぺ	かっぺ	[n]	hick; country bumpkin; yokel	10	5
おばあちゃん	おばあちゃん	[n]	granny; grandma; gran	15	5
だよね	だよね	[exp]	it is, isn’t it?; I know, right?; innit?	16	5