Manga Kotoba: Manga Frequency Lists and Stats

Being able to filter works by author or publisher is really great. I really appreciate all the improvements you’ve been making to the site!

I have a question, however. I’ll use the series “Orange”, which I’ve been reading and following on the site, as an example. As I read each volume, I study the vocab some days in advance by adding the words to my JPDB SRS and marking them as “I know this” in Manga Kotoba. The first five of the 7 volumes have all words marked as “I know this”. However, if I go back and look at the stats page for, say, Volume 1, it says that I know 100% of sentences but only 96% of words, even though no other words are shown in the word list anymore.

Now, I’m pretty sure that I accidentally marked some words as “I don’t know this” during my word-marking clicking spree, which removed them from the word list (and, I presume, from other works too). Could those words account for the missing percentage? Where can I see the words I’ve marked? I vaguely remember an SRS option to study words on the website, but I can’t find it. I only see an option to see the list of all Known words.

It’s a really minor thing, but I admit that not seeing that nice, satisfying “100%” at the end of a volume bugs me a bit haha, especially if there are words I’ve accidentally marked wrong that are now not going to show up in any other series on the site.

A couple of things factor into this:

When a volume is added to the site, its word count stats are calculated and cached. This prevents needing to calculate them every time a page is viewed.

Due to the nature of performing OCR on manga, there will be misparsed words that make it onto the site. Sometimes I’ll catch a misparsing that clearly is something that would never show up in a manga, and I’ll mark it to be blocked (blacklisted) site-wide.

Blocking a word requires recalculating the stats to avoid the percentage being slightly off. However, recalculating the stats for 3,000+ volumes and 1,000+ series is a bit resource-intensive, so I haven’t run a sitewide recalculation in a while. (I need to find a way to improve the resource usage for this process.)

If I’m blocking a word for one specific series (such as a name or a misparsing of a name), or if I notice a misparsing in a series and block it site-wide, I’ll often recalculate stats for just that one series.
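
As a rough illustration of how a cached total can leave the percentage short of 100%, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the sample words and counts, and the idea that percentages are based on word occurrences. It is not the site's actual code.

```python
from collections import Counter


def word_stats(word_counts, known, blocked):
    """Return (known_occurrences, total_occurrences), skipping blocked words."""
    total = sum(n for w, n in word_counts.items() if w not in blocked)
    known_total = sum(
        n for w, n in word_counts.items() if w in known and w not in blocked
    )
    return known_total, total


def percent(known_total, total):
    """Known share as a percentage; 0.0 for an empty volume."""
    return 100.0 * known_total / total if total else 0.0


# Hypothetical volume: "xx" stands in for an OCR misparse.
volume = Counter({"友達": 12, "手紙": 8, "未来": 4, "xx": 1})
known = {"友達", "手紙", "未来"}

# Total cached when the volume was added, before anything was blocked.
_, cached_total = word_stats(volume, known, blocked=set())

# The misparse is later blocked, so it no longer appears in word lists.
blocked = {"xx"}
known_total, fresh_total = word_stats(volume, known, blocked)

stale = percent(known_total, cached_total)  # still divides by the old total
fresh = percent(known_total, fresh_total)   # after recalculating the stats
print(stale, fresh)
```

In this made-up case every visible word is known, yet the stale figure stays below 100% until the stats for the volume are recalculated.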

On the settings page, you can check the box “Show Known Words” to show known words on vocabulary frequency lists.

The old site had an SRS option, but it was very basic. I chose not to carry it over to Manga Kotoba because other sites/software handle SRS better than I would implement it.

I see, that’s good to know. So once I finish studying all the words in a series, I can pretty much consider it complete anyway, because the rest are probably those blacklisted words. Perfect!

Just for clarity, what does marking a word as “I don’t know this” do now? Or is it just a remnant from the SRS feature?

It removes the word from your known words list, causing it to show again on frequency lists, and it no longer contributes to known word/sentence percentages.

Going to give this a spin today. Thanks so much for all of your work.

Is there any way for folks to contribute? For instance, I’m a web designer with an interest in development, and would love to help in any way possible.

Currently my only planned method of contribution (although not set up yet) will be to have a way to upload Mokuro output for me to add to the site.

Since I’m bad at the actual visual design side, and I’m learning the past decade of CSS, as well as JavaScript, as I go, there’s a lot of room for improvement.

Are there any specific areas that you see where you feel “I can improve upon this”? (Don’t hesitate to be completely honest!)

My current ongoing tasks

Some things I’m working on, or looking to focus on, some being long-term projects, are:

1) Update the HTML and CSS to be mobile-first in design, then support larger screen devices on top of that.

Since I don’t browse the web on a smartphone, small-screen support is mostly hacked in at the moment.

It wasn’t that long ago that I learned Chromium has a tool to view a website at the size of various mobile devices and at portrait and landscape orientations:
image

There will be a lot of work to get that improved. Suggestions and mock-ups are always welcome!

2) Update JavaScript function/variable names to be consistent.

As I jump between writing code in VB.NET, JavaScript, TypeScript, Python, and Ruby on different projects, I’ve only ever solidified my naming format in VB.NET. For Manga Kotoba, this has resulted in a mess:

image

3) Clean up the CSS organization.

As I try out different things, my CSS is all over the place. I’ve also moved some from the external CSS files to the HTML template as per Google’s recommendation (which helps avoid things jumping around on the screen when the CSS file first loads). I also have CSS that’s carried over from the old site where I had frequency lists before, but isn’t used by Manga Kotoba, that I need to locate and clear out.

4) Improve database queries so I can reuse more queries and have less copy and pasting.

(Not much to say here without getting technical.)

5) And of course I’m always adding more series to the site:

I have over 50 volumes I need to run through Mokuro and Ichiran.

I have 42 volumes that ran through Mokuro yesterday and are running through Ichiran today.

I have probably about 100 post-Ichiran volumes waiting to be added to the site.

6) I’m working on a “per page” method of viewing vocabulary. (It will replace the current setting to toggle between frequency and paged vocabulary lists.)

This one needs a bit of work to look decent before moving to live.

7) I’m trying to find a way to utilize kanji lists:

I don’t have any concrete plans for what that will actually look like, but I really like what Migaku’s Kanji GOD add-on for Anki has for learning status:

image

I’d love to have something similar for Manga Kotoba, but specifically giving percentages of known meanings and readings.

The difficult parts here are:

a) Kanji “meanings” are a poor thing to measure knowing. This is because oftentimes, the meaning is an arbitrary word intended for use in mnemonics, and that gives it a degree of separation from the kanji.

b) Measuring known readings fails when you consider some readings are more common than others. Even if I could identify more common and less common readings, for each user that will depend on the kind of material they read.
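
To make difficulty (b) concrete, here is a small Python sketch comparing a plain count of known readings with a count weighted by how often each reading occurs. The function name and the occurrence counts are invented for the example, and this is only an illustration of the problem, not a plan for the site.

```python
def reading_coverage(reading_counts, known_readings):
    """Return (unweighted %, frequency-weighted %) of known readings."""
    total_readings = len(reading_counts)
    total_occurrences = sum(reading_counts.values())
    if total_readings == 0 or total_occurrences == 0:
        return 0.0, 0.0
    known = [r for r in reading_counts if r in known_readings]
    unweighted = 100.0 * len(known) / total_readings
    weighted = 100.0 * sum(reading_counts[r] for r in known) / total_occurrences
    return unweighted, weighted


# Hypothetical occurrence counts for readings of 生 in one reader's material.
counts = {"せい": 60, "い": 30, "なま": 9, "う": 1}
unweighted, weighted = reading_coverage(counts, {"せい", "い"})
print(unweighted, weighted)
```

With these made-up numbers, knowing two of four readings is 50% by plain count but 90% by occurrence, and the weighted figure would shift for a user whose reading material uses the kanji differently.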

8) Website colors.

Manga Kotoba’s layout started out based on an old site of mine:

But Google has some recommendations, such as darkening the orange color for better contrast with the white text.

It’s easy enough for me to say, “If someone can’t see white on orange, they probably can’t see well enough that they are reading manga,” but I’m still waffling on whether to darken the orange a bit.
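
For reference, contrast recommendations of this kind are typically based on the WCAG contrast ratio, which can be computed as in the Python sketch below. The two orange hex values are hypothetical stand-ins, not the site's actual colors, and I'm assuming the recommendation in question uses this formula.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical oranges: a lighter one and a darker candidate.
light_orange = "#ff9900"
dark_orange = "#cc6600"
print(round(contrast_ratio("#ffffff", light_orange), 2))
print(round(contrast_ratio("#ffffff", dark_orange), 2))
```

WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text, so a darker background raises the ratio for white text.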

There’s also my inconsistent progress bar colors:

image

image

9) Various other things not ready to mention yet as they may not go anywhere.

I’ve moved to live a few changes and updates:

Color scheme update.

I went with the orange, green, beige, and white color scheme because I used it on a website over a decade ago, so it’s a bit nostalgic for me.

However, after trying out colors with more contrast, the white-on-orange text does feel more difficult on the eyes.

I decided to stay with a similar color, but darker, and have settled upon brown:

Before:

After:

Layout at different sizes.

I’ve been working on improving the layout at different screen sizes.

Widescreen:

Less-wide screen:

Since a normal table doesn’t fit on a small screen, the appearance changes at this small width:

Aside from working better on mobile devices, another goal was for content to be centered at larger screen sizes.

Old layout with off-center content:

New layout with centered content:

This does shift the location of the sidebar to be off-center:

Progress bars.

The appearance of progress bars has changed a little.

Old style:
image

New style:
image

Changes include:

  • Text is now black for better contrast.
  • A border has been added around the progress bars.
  • Percent numbers are included for unknown portions of the progress bar.
  • Smaller numbers fit better, removing overlapping.
  • Segments for values below 1% are no longer displayed.
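
The last rule in the list above (hiding segments below 1%) could be sketched in Python like this. The function name, the segment labels, and the rounding are assumptions for illustration; the site's real implementation may differ.

```python
def visible_segments(counts, min_percent=1.0):
    """Return (label, rounded percent) pairs, dropping segments under min_percent."""
    total = sum(counts.values())
    if total == 0:
        return []
    segments = []
    for label, count in counts.items():
        pct = 100.0 * count / total
        if pct >= min_percent:  # segments below the threshold are not drawn
            segments.append((label, round(pct)))
    return segments


# Hypothetical word counts per status for one volume.
segments = visible_segments({"known": 900, "unknown": 92, "other": 8})
print(segments)
```

Here the 0.8% segment is dropped entirely, so the displayed numbers do not have to squeeze into a sliver too narrow to hold them.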

Series/volume navigation menu.

Navigation menus have been added for the currently-viewed series and volume.

Series:

Volume:

There’s still more room for improvement.

Per page vocabulary.

The setting for per-page vocabulary has been removed, and a link to view per-page vocabulary has been added to the navigation.

This version shows words across ten pages rather than 100 words.

I’m still working out improvements, but for now, it should work at least as well as the old version.

Search on Immersion Kit.

Word pages now include a “Search on Immersion Kit” link:
image

This opens Immersion Kit search results for the word in a new tab:

I may add more links in the future, such as Forvo.

Known issues.

The ☰ menu overlaps content at narrower screen widths rather than nudging the content to the side. I need to improve menu handling.

nice, lots of visual improvements there!

It’s more mobile friendly now (especially vertical orientation), thanks for that. There’s more dead space in a horizontal orientation, but even then, overall your updates made a huge improvement on mobile, to fit the word table into the window better (I don’t really know the technical terms, but before the window was bigger than the table so on mobile it required some awkward zooming in and scrolling to work, that’s better now)

Now that I’m learning kanji a lot quicker I’m hoping in a month or two I’ll get back to the frequency based learning again.

That’s on my to-do list, although I don’t yet have many thoughts on how I’ll improve upon it.

It doesn't help that my attention gets distracted easily by trying out implementing other things.

image

This is a very handy resource! Thank you.

One minor feature request – it’d be nice to have a “compact” option for the various list displays. On my desktop machine (Firefox/Linux), each row is about 3x the height of the text it contains, making a full page on a 32" display contain 18 word entries.

That’s a great idea.

The next move to live (don’t yet know when it will occur) will include an initial implementation that compacts vocabulary lists (maybe not kanji lists just yet):

I’ll keep in mind this compact option as I work on cleaning/improving the site’s stylesheet.

no rush, the current batch of improvements is a great bonus and already improved the experience a lot.

Minor Updates

Volume Reading Status

The reading status for volumes can now be updated directly from the series page.

image

This resolves my least favorite thing: going into each volume separately to set the status for a series I’m already in the middle of.

Back to Top

When scrolling down a frequency list, a “Back to Top” link now appears:

This resolves my least favorite thing: being halfway down a frequency list on a smartphone with no way to quickly and easily return to the top of the page.

(This hasn’t been added to the kanji pages yet, as they need a lot of clean-up.)

AniList Links

Almost all series pages now include a link to AniList’s page for the series.

Features added to Manga Kotoba have primarily been what I’d like to use myself, which then become available for anyone using the site.

But there is one feature that’s been exclusive to me: adding series/volumes to the site.

This has worked rather well for me. I always end up with frequency lists and known word tracking in any series I read.

And for everyone else? If you’re reading Naruto, it’s not on Manga Kotoba. If you’re reading Angelic Layer, you’re on your own past volume 2. Unless a user decides only to read what I read, this limits the site’s usability.

While there won’t be an option for users to directly add series/volumes, I’ve been working on something that gets Manga Kotoba one step closer:

image

Right now, it’s not linked to anywhere on the site, but logged-in users can access the contribute and submissions pages.

Caveats:

  1. Currently, only uploading Mokuro JSON files from a Kobo EPUB is supported. (EPUBs from other sites may also work, but I’ve only tested with Kobo.)
    • Yes, this is a fairly arbitrary limitation, but I’m starting with what my scripts are built for.
  2. Providing the Mokuro JSON files isn’t enough to add a volume to the site. I need to extract the text, manually run it through Ichiran, and then load the results into the site’s database, which adds an unpredictable delay.
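
To give an idea of what the "extract the text" step involves, here is a minimal Python sketch. It assumes each Mokuro page JSON has a top-level `blocks` list whose entries carry a `lines` list of strings; the function name and the sample data are hypothetical, and this is not the site's actual script.

```python
import json


def extract_page_text(page_json: str) -> list[str]:
    """Return one string per text block, joining the block's OCR lines."""
    page = json.loads(page_json)
    texts = []
    for block in page.get("blocks", []):
        text = "".join(block.get("lines", []))
        if text:  # skip blocks with no recognized text
            texts.append(text)
    return texts


# Hypothetical page with two speech bubbles and one empty block.
sample_page = json.dumps({
    "blocks": [
        {"lines": ["今日は", "いい天気ですね"]},
        {"lines": ["そうだね"]},
        {"lines": []},
    ]
})
page_text = extract_page_text(sample_page)
print(page_text)
```

The resulting strings are what would then be passed on for parsing, which is the manual Ichiran step described above.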

Disclaimer: If you’re considering contributing, make your best value judgment on copyright concerns. Copies of Mokuro files are stored on the server until I can run them through Ichiran, and then they are discarded.

Note: Only the text content from OPF and JSON files gets extracted and stored. Files with other extensions will not upload.

I have a small feature request. It would be nice for book clubs to have query params for the “vocab by page” section to set the start and stop pages to be shown. This would allow linking to specific week’s vocab.

Coincidentally, that’s a feature I’ve played around with.

I’ll work on it a bit today and see what I come up with. I may still want to limit the number of pages of vocabulary loaded/displayed on one web page, but I should be able to come up with a limit that is still more than enough for ABBC and BBC at least.

Sample URL for pages 7 to 22:

https://manga-kotoba.com/volume/カードキャプターさくら-1/page/7-22

It’s currently capped at showing a max of 30 pages at a time.
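
For anyone building such links by script, the page segment at the end of the URL could be interpreted roughly as in this Python sketch. The function name is hypothetical, and trimming an over-long range down to the cap is my assumption; the site may handle over-long ranges differently.

```python
import re

MAX_PAGES = 30  # cap mentioned above


def parse_page_range(segment, max_pages=MAX_PAGES):
    """Parse '7-22' (or a single page like '7') into (start, end), capped in length."""
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", segment)
    if not match:
        raise ValueError(f"not a page range: {segment!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {segment!r}")
    # Assumption: trim the range to at most max_pages pages instead of rejecting it.
    end = min(end, start + max_pages - 1)
    return start, end


print(parse_page_range("7-22"))
print(parse_page_range("1-100"))
```

A book club could then link each week's pages with its own start and end values, as long as the range stays within the cap.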
