Manga Kotoba: Manga Frequency Lists and Stats

ChristopherFritz · May 30, 2024, 4:20am

I’m experimenting with a way to show the distribution of a word within a volume with the “Spread” column:

The X axis represents the length of the volume. Dots on the left of the grid signify word appearances earlier in the volume. Those in the center of the grid, the middle of the volume. And those on the right of the grid, the end of the volume.

The Y axis conveys the frequency of the word in that section of the volume. The bottom line represents one use, the second line up two uses, and so on. Anything that appears six or more times sits at the very top of the chart (although this still needs some adjusting).
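As a rough illustration of the idea (not the site's actual code), the chart values could be computed like this. The function name `spread`, the number of sections, and the use of page indexes as positions are all assumptions made for the sketch; only the cap of six comes from the description above.

```python
def spread(positions, volume_length, sections=10, cap=6):
    """Count a word's appearances per section of a volume, capped at `cap`.

    positions: 0-based positions (for example, page indexes) where the word appears.
    volume_length: total number of positions in the volume.
    """
    counts = [0] * sections
    for pos in positions:
        # Map the position to a section along the X axis (left = start of volume).
        section = min(pos * sections // volume_length, sections - 1)
        counts[section] += 1
    # Anything at or above the cap sits on the top line of the chart (Y axis).
    return [min(count, cap) for count in counts]


# A word used three times early on, twice past the middle, and once at the end.
print(spread([0, 1, 2, 50, 51, 99], volume_length=100, sections=4))  # [3, 0, 2, 1]
```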

I don’t know how useful it is, but I figure I can try it out for a bit and see.

(Plus, the site’s up to 2,333 volumes across 771 series.)

mitrac · May 30, 2024, 1:18pm

Ooh I like that!! And I really like that it shows up properly on the “view by page” mode so if I’m looking at vocab chronologically I can tell which ones are worth learning

ChristopherFritz · June 16, 2024, 5:04am

I wondered today, what would one’s stats look like if they learned vocabulary entirely from WaniKani?

After learning all the words introduced in the first 10 levels of WaniKani, that's enough for 25% vocabulary coverage for some manga volumes.

From the ABBC series on Manga Kotoba, only 「ちいさな森のオオカミちゃん」 reaches 25% at this point.

Learning all the words introduced in the first 15 levels of WaniKani brings some volumes up to the mid-30% range.

And for ABBC, some series begin to reach the 30% mark.

These numbers aren’t the best representation as WaniKani overall focuses on words with kanji, and there are many common words usually written without kanji that learners will learn quickly.

If we add in a decent handful of common non-kanji vocabulary, that changes the numbers quite a bit while still at 15 levels of WaniKani.

With this extra vocabulary, that gets one up to the 50% range.

Likewise for ABBC volumes.

But even at this point, all the vocabulary look-ups make reading manga a mind-numbing experience for some.

And finally, getting through level 20 on top of the above gets some manga up to the 60% range.

(It only adds a few percent to ABBC reads, though.)

There are still look-ups every other sentence or more.

Supplementing one’s vocabulary by learning the highest frequency vocabulary in a volume before starting reading should boost those numbers considerably.

mitrac · June 16, 2024, 10:47am

I really like these thought / data experiments you do

A couple of months ago I counted that for a novel passage… I don’t remember exactly, but it was somewhere between 1/3 and 1/2 of the words. Kanji-related words were only 1/3 of all words, so I’m not surprised that doesn’t appear to give a huge benefit right off the bat. But getting 25% coverage from the first 10 levels is actually pretty good if the max possible is 35%. It would be interesting to know which WK level achieves what coverage of kanji-related words.

The most common kana words are so easy to learn, and the rest are grammar constructs (particles, conjunctions etc…), mostly from N5 to N3.

Otoko68 · June 22, 2024, 6:38am

So jumping to this thread as I will be using manga kotoba for my reading of the Dungeon Meshi series.

It’s really awesome to go through the list of words and be able to add the known ones with a click. It’s also super motivating to see the overlap between volumes - so learning for volume 1 seems to really help for volume 2 as well (hope future volumes will keep being added).

The bar showing how much of the manga is known is great, but I was surprised to see that the number is based on the word frequency count rather than the number of unique words. So the first batch of words I added brought me to 20%+ of vocab known, even though they might only represent 5% of the total number of unique words.
Any reason for the choice?

RebBlue · June 22, 2024, 8:22am

It’s easier to tell how often you’ll need to look things up. With 20% coverage you’ll be looking up 4 words out of every 5, and as you increase that coverage the amount of looking up will go down. You can’t really guesstimate that based on a percentage of the unique vocabulary.

ChristopherFritz · June 22, 2024, 12:04pm

(RebBlue’s response covers it, but I like writing about these stats, so:)

Eventually I want to include both total and unique known word counts. I just need to determine the best way to fit everything into the interface.

With only one of the two shown currently, consider the following stats. For a manga volume, learning the top 100 highest frequency words may give you coverage such as:

Unique words known: 6%
Total words known: 40%

If instead you had opted to learn the 100 lowest frequency words, not the highest, you would be looking at:

Unique words known: 6%
Total words known: 2%

In both of these situations, the unique known word count is 6% of the volume’s unique words, but one scenario gives you 40% of the total word coverage and the other gives you 2% of the total word coverage.

Here, the total word coverage gives insight into how readable material will be, whereas the unique word count tells no useful information for reading. (Other stats, such as the percent of sentences you can read without lookups, supplement this in either case.)
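To make the difference between the two numbers concrete, here is a minimal sketch of how both could be calculated from a frequency list. The function name `coverage` and the tiny word list are invented for illustration; they are not real data from the site.

```python
def coverage(word_counts, known):
    """Return (unique coverage, total coverage) as fractions between 0 and 1.

    word_counts: dict mapping each word in a volume to how often it appears.
    known: set of words the reader already knows.
    """
    total = sum(word_counts.values())
    known_total = sum(count for word, count in word_counts.items() if word in known)
    known_unique = sum(1 for word in word_counts if word in known)
    return known_unique / len(word_counts), known_total / total


# Made-up frequency list: 6 unique words, 100 words in total.
counts = {"の": 40, "する": 30, "犬": 20, "猫": 5, "森": 3, "狼": 2}

# Knowing the two highest frequency words.
unique, total = coverage(counts, {"の", "する"})
print(f"unique {unique:.0%}, total {total:.0%}")  # unique 33%, total 70%

# Knowing the two lowest frequency words instead.
unique, total = coverage(counts, {"森", "狼"})
print(f"unique {unique:.0%}, total {total:.0%}")  # unique 33%, total 5%
```

The unique percentage is identical in both cases, while the total percentage is what reflects how often a lookup would be needed.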

That’s for individual volumes.

When it comes to a series where multiple volumes are on Manga Kotoba, the stats become even more interesting.

Consider the first 30 volumes of Detective Conan.

Here, to learn 75% of the total words in volume one, you need to learn 56% of the unique words (848 words).

But for those 30 volumes collectively, knowing 75% of the total words requires learning the top 23% of unique words (which, across 30 volumes, is 3,758 words).

At the series level, the unique word counts become unreliable the more volumes of a series are added to the site.
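The same effect can be reproduced on a toy scale. The sketch below counts how many of the highest frequency words are needed to reach a target total coverage, first for one volume and then for two volumes combined. The function name `words_needed` and the word counts are made up for the example.

```python
from collections import Counter


def words_needed(word_counts, target=0.75):
    """Return (number of top words, fraction of unique words) needed to
    reach `target` total word coverage, learning highest frequency first."""
    total = sum(word_counts.values())
    running = 0
    for n, count in enumerate(sorted(word_counts.values(), reverse=True), start=1):
        running += count
        if running >= target * total:
            return n, n / len(word_counts)
    return len(word_counts), 1.0


# Two made-up volumes sharing their most common words.
volume_1 = Counter({"の": 6, "する": 3, "犬": 2, "森": 1})
volume_2 = Counter({"の": 6, "する": 3, "猫": 2, "狼": 1})

print(words_needed(volume_1))             # (2, 0.5)
print(words_needed(volume_1 + volume_2))  # 2 words, about a third of the unique words
```

Because the common words repeat across volumes while the rare ones pile up, the share of unique words needed for the same total coverage shrinks as volumes are added.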

Otoko68 · June 23, 2024, 9:16am

Makes sense, thanks for the detailed explanations.

I have now used it for a couple of days, and really like how scrolling over the word list unmasks the reading and meaning. It is so fast and fluid to go through the words!

One request I may have: I can see the charts for each word, which seem to show which part of the volume they appear in. Would it be possible to display the words by (roughly) order of appearance?

This would really help when preparing to read a volume to focus on words in the incoming chapters.

ChristopherFritz · June 23, 2024, 1:13pm

Order of appearance is a bit more difficult because: What if you’re two chapters in, and you didn’t learn some words from earlier in the volume that are in the next chapter? Sorting by order of appearance means words that show up earlier and later in a volume show only on their first appearance on the list. (This isn’t an issue before you start reading, of course.)

I actually do plan to support sorting that way, but in the meantime you may be interested in this option from the settings page:

This splits the word list into words-per-page, and may be considered randomly ordered per page.

If a word appears on multiple pages in a volume, it will appear on each page’s table when listed this way.

There is a drawback that the table numbers shown here won’t necessarily line up with the page numbers in the manga volume, but they should be close, so if you can match up a page table on the site with a page in the volume, you should see correlation, such as the table number always being two higher than the volume’s page number (for example).

ChristopherFritz · June 23, 2024, 5:26pm

+1 and +n Sentence Stats

After working on this now and then for a while, I’ve finally added +1 and +n sentence information:

The display can use some work (this is one of my weak points), but the progress bar shows:

Green: Percent of sentences that can be read without vocabulary lookups.
Orange: Percent of sentences that can be read with only one vocabulary lookup.
Gray: Percent of sentences that can be read with two or more vocabulary lookups.
- This one doesn’t show the percentage unless you hover over it with a mouse.
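For anyone curious how the three segments relate, here is a minimal sketch of the bucketing. It assumes each sentence is already split into words and that repeated unknown words within a sentence count once; the function name `sentence_stats` and the sample sentences are invented for the example and may not match how the site does it.

```python
def sentence_stats(sentences, known):
    """Return the percent of sentences needing 0, 1, or 2+ vocabulary lookups.

    sentences: list of sentences, each a list of words.
    known: set of words the reader already knows.
    """
    buckets = {"+0": 0, "+1": 0, "+n": 0}
    for words in sentences:
        # Assumption: a word repeated within a sentence is one lookup.
        unknown = {word for word in words if word not in known}
        if not unknown:
            buckets["+0"] += 1  # green segment
        elif len(unknown) == 1:
            buckets["+1"] += 1  # orange segment
        else:
            buckets["+n"] += 1  # gray segment
    count = len(sentences)
    return {key: 100 * value / count if count else 0.0 for key, value in buckets.items()}


sentences = [
    ["これ", "は", "犬"],
    ["猫", "は", "森"],
    ["狼", "と", "森"],
    ["これ", "は", "猫"],
]
known = {"これ", "は", "犬", "猫"}
print(sentence_stats(sentences, known))  # {'+0': 50.0, '+1': 25.0, '+n': 25.0}
```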

This information displays for both series and volume pages.

Eventually I want to add a “FAQ”-like section, then I can include a small link beside the progress bars to a page that explains them in detail.

Names Disclaimer

A disclaimer now appears when names have not been excluded from a vocabulary list:

Removing names is a manual process where I check to see if any high-frequency words are either a character name or a misparsing of a character name. This typically means heading over to Manga Pedia to see if 1) they have a page for the series and 2) it includes a list of names for me to refer to.

Aside from Manga Pedia not having a page for a series, it’s also common for me to not check for names when adding volumes for tens of series in a weekend.

Otoko68 · June 23, 2024, 7:55pm

That definitely will suit my needs - thanks!

ChristopherFritz · June 30, 2024, 7:24pm

Furigana Indicator

A visual indicator now shows whether a series uses furigana or not:

Publisher and Label

Most series now list their publisher and label:

Follow the links to view other series released by the same publisher or under the same label:

Eventually I plan to add English title translations for publisher/label names.

Reidejong · July 1, 2024, 12:53am

Thank you for your work. I wanted to ask: isn’t there a faster way to mark all words in a volume as known?

Gorbit99 · July 1, 2024, 1:37am

I personally measure your posts in meters rather than words.

ChristopherFritz · July 1, 2024, 1:49am

At this time there isn’t.

Marking all words from a volume as known would come with risks, as there may be some words you don’t know among them.

What I did when I first started out marking words as known was to focus on the highest frequency words in a volume (or series), and mark what felt to me like a reasonable amount any time I happened to be viewing a frequency list. (But I’m also viewing a lot of frequency lists as I add more volumes to the site.)

mitrac · July 1, 2024, 6:38am

Such a great idea, thanks!!

ChristopherFritz · July 6, 2024, 10:37pm

Manga Kotoba has reached nearly 3,000 volumes across over 1,000 series.

While most of these series will regrettably never catch anyone’s interest (just the nature of things), the more series added to the site, the greater chance of users finding a variety of series that are favorable to their individual known words.

This week’s experimental feature (which may stay or may go) is listing series that currently have (time-limited) free volumes on BookWalker.

If a series has one or more (time-limited) free volumes on BookWalker, this will show on a series card:

Assuming I actually keep this up to date (or get an automated process for it written up), an up-to-date list of all series on Manga Kotoba that have free trials can be seen here (URL subject to change):

https://manga-kotoba.com/bookwalker/freebies

(Note: This page doesn’t currently follow a user’s series-exclusion settings like the series browse page does.)

When viewing a series page, there is now a “Search BookWalker” link, making it somewhat easy to access the time-limited free volume(s).

Credit to @rodan for their BookWalker query in the BookWalker freebies thread. I’m using this to get a decent list of what’s a freebie.

mitrac · July 7, 2024, 7:04am

I like that, it’s nice to browse the freebies with the cover image plus frequency info. If you haven’t already, set up affiliate links to BookWalker so you can get more manga for yourself or to feed your project. As far as I understand, you don’t get cash; you get BookWalker points. There are instructions on the freebies thread. You deserve the commissions for sure. Your project is so generous and helpful I’d go out of my way to click them before making a purchase.

When I went to that page I saw a manga I don’t think I’ve clicked on, and instead of NaN it showed 101% but the circle is at zero. I couldn’t get a screenshot in here, sorry, I’ll see if I can replicate it on desktop later. The manga was Papa told me, if that helps.

ChristopherFritz · July 7, 2024, 2:56pm

I should look into that, but sourcing manga from BookWalker to extract data from is inconvenient. It’s probably worth it for the sales, though.

This is on my to-do list to fix. (I’ve just been lazy about it.)

The site stores a cache of how many words you know for a series, updating it only when your known words change. When I add volumes to an existing series, or split out two series, such as splitting out “Papa told me” and “Papa told me Cocohana ver.”, the cached known value is no longer in sync with the total value.

For example, if a series has 1,000 words and you know 750 of them, that’s 75%. If I split out one of the volumes to a separate series (because it’s actually part of a spin-off), now it might say you know 750 out of 600 words, putting the percentage at 125%.

I just need to figure out specifically which value(s) I need to update, and when.
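The out-of-sync situation can be reproduced with a small sketch. The class `SeriesStats` and its methods are hypothetical stand-ins, not the site's real data model, and integers stand in for words to keep the numbers easy to follow.

```python
class SeriesStats:
    """Toy model of a series with a cached known-word count."""

    def __init__(self, volumes):
        self.volumes = dict(volumes)  # volume name -> set of words
        self.cached_known = 0         # only updated by refresh_cache()

    def words(self):
        return set().union(*self.volumes.values())

    def refresh_cache(self, known_words):
        self.cached_known = len(self.words() & known_words)

    def split_out(self, volume_name):
        # Moves a volume to another series WITHOUT touching the cache.
        return self.volumes.pop(volume_name)

    def percent_known(self):
        return 100 * self.cached_known / len(self.words())


# 1,000 words in the series, 750 of them known.
series = SeriesStats({"main": set(range(600)), "spin-off": set(range(600, 1000))})
known = set(range(450)) | set(range(600, 900))
series.refresh_cache(known)
print(series.percent_known())  # 75.0

series.split_out("spin-off")
print(series.percent_known())  # 125.0 (stale cache: 750 known out of 600 words)

series.refresh_cache(known)
print(series.percent_known())  # 75.0 (450 known out of 600 words)
```

In this sketch, recalculating the cached value whenever a series gains or loses a volume brings the percentage back in line.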

ChristopherFritz · August 12, 2024, 1:28am

The next clear addition has been to add mangaka information.

I took a while on this one because I was considering how to mark if the person was the sole mangaka or if it was a duo where one person wrote the scenario/story and the other did art. Sometimes, you also have a third person who did character design, and so on.

In the end, I decided to take the Natively approach and use the names alone, without identifying their roles:

Following a link on a name brings up all the series they’ve worked on that are available on Manga Kotoba:

Note: Currently, user stats are not displayed on this page.

Out of 1,172 series on the site, I have added mangaka information for 338 series (mainly focusing on series that users are reading, own, or want to read). I’ll get mangaka information added to the other 834 series over time.
