Oops, I misread that as skip ahead and learn kanji that comes later.
I can “delete” a card (gone for good), but not “skip” one.
Regardless, having found I can skip ahead should help out!
It’s nice to have my writings split into groups now, so I can write about something programming-related (while being Japanese-study-related), and it’s easy for anyone not interested to skip over completely.
I use limited-time free full volumes to generate manga frequency lists for my website, Manga Kotoba.
I prefer to source these from Kobo for various reasons, but Kobo also seems to miss out on a lot of freebies available to BookWalker.
I previously wrote a script that downloads pages from BookWalker. The downside to this method is that BookWalker loads each image into an HTML canvas element, so rather than downloading the original image, I get a screenshot of the canvas’s contents. That means I end up with a PNG whose size depends on the browser window size, and it might have padding on the sides:
I like to crop the whitespace before running the images through Mokuro, but the padding size is inconsistent from one manga to the next. That means I need to modify the crop setting in my automated cropping script for every series, sometimes across multiple volumes in one series.
This extra step adds friction to the process, so I rarely run any BookWalker downloads through Mokuro to generate frequency lists.
But at the same time, I’m constantly seeing freebies on BookWalker that are not on Kobo or are delayed before arriving on Kobo.
Once I add a series to Manga Kotoba, I source a genre list from AniList. And if AniList doesn’t have the manga? I look it up on シーモア.
And recently, I noticed that シーモア has the freebies that Kobo doesn’t.
I wondered: can I download the freebies from there?
シーモア’s images are transmitted scrambled, such as:
However, the reader reassembles this into three unscrambled images that form a single page.
The <div>s holding the pages are added to and removed from a container div as one scrolls through, so an auto-downloader needs to navigate the pages properly, downloading three images per page without skipping any.
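The general idea is to watch the container for newly added page <div>s and capture each page’s canvas slices as they appear. Here’s a minimal sketch of that idea (the selectors and data attribute are guesses for illustration, not the reader’s actual markup):

```javascript
// Minimal sketch only: selector names and the data attribute are made up.
const seen = new Set();

const observer = new MutationObserver(() => {
  // The reader only keeps a few page <div>s in the container at a time,
  // so capture each page's canvases the moment its <div> shows up.
  for (const page of document.querySelectorAll('#pageContainer .page')) {
    const pageId = page.dataset.pageIndex;           // hypothetical attribute
    if (seen.has(pageId)) continue;

    const slices = page.querySelectorAll('canvas');  // three slices per page
    if (slices.length < 3) continue;                 // not fully rendered yet

    slices.forEach((canvas, i) => {
      const a = document.createElement('a');
      a.href = canvas.toDataURL('image/png');        // screenshot of the canvas contents
      a.download = `page${pageId}_slice${i}.png`;
      a.click();
    });
    seen.add(pageId);
  }
});

observer.observe(document.querySelector('#pageContainer'), {
  childList: true,
  subtree: true,
});
```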
It took three iterations of code before I came up with a solution that works reliably (bundled into a UserScript), and I’m satisfied with the result. This downloads all the pages as sliced images, such as:
From there, I pass the folder into a Python script (which I let Grok write for me, with no manual corrections on my part), and the result is combined images:
Then I can run that through Mokuro and Ichiran, and it’s ready for the site.
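My actual stitching script is Python (courtesy of Grok), but the idea is just stacking each page’s three slices vertically. Roughly, if it were done in Node with the sharp library, it might look like this (filenames are placeholders):

```javascript
// Sketch only, not my actual (Python) script: stack a page's three slices vertically.
const sharp = require('sharp');

async function combineSlices(slicePaths, outPath) {
  const metas = await Promise.all(slicePaths.map((p) => sharp(p).metadata()));
  const width = Math.max(...metas.map((m) => m.width));
  const height = metas.reduce((sum, m) => sum + m.height, 0);

  // Paste each slice below the previous one on a blank page-sized canvas.
  let top = 0;
  const layers = slicePaths.map((input, i) => {
    const layer = { input, top, left: 0 };
    top += metas[i].height;
    return layer;
  });

  await sharp({ create: { width, height, channels: 3, background: '#ffffff' } })
    .composite(layers)
    .png()
    .toFile(outPath);
}

// combineSlices(['page001_0.png', 'page001_1.png', 'page001_2.png'], 'page001.png');
```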
At first glance I thought, cool cover, I want to read that… wait…
Things got busy for me for a week, and I slowed kanji progress.
My current process I’m trying out is:
I’ve been working on overhauling my WaniKani Book Club Manager userscript.
In programming, you can store complex data in an “object variable”.
Separately, you can have a user interface with text boxes to display editable values.
When these two are joined together, so that the interface shows the values from an object and a user can view and modify those values, it’s known as “data binding”: the data from the object is bound to the user interface, and the object can be read and modified through it.
For the book club manager, I wrote my own code to handle binding. I didn’t know at the time that Javascript actually has built-in support for data binding. With that in mind, I decided to export my book club manager data and write up a whole new userscript that uses Javascript’s native binding.
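I’m glossing over the details here, but a Proxy-based sketch gives the flavor of doing the binding with built-in Javascript features (this isn’t the manager’s actual code; the data-bind attribute and field names are made up):

```javascript
// Illustration only: tiny two-way binding using a Proxy.
// The data-bind attribute and the field names are made up.
const clubData = { title: '', startDate: '' };

const bound = new Proxy(clubData, {
  set(target, key, value) {
    target[key] = value;
    const input = document.querySelector(`[data-bind="${key}"]`);
    if (input && input.value !== value) input.value = value; // object -> UI
    return true;
  },
});

// UI -> object: any <input data-bind="title"> etc. writes back on edit.
document.querySelectorAll('[data-bind]').forEach((input) => {
  input.addEventListener('input', () => {
    bound[input.dataset.bind] = input.value;
  });
});

bound.title = 'Example Book Club'; // also updates the matching text box
```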
The new version of the userscript is almost up to feature parity with the old script.
That is a good decrease in file size!
The new script isn’t ready to release yet, but I made a lot of progress Saturday.
Aside from adding several new series, I’ve been putting time into optimizing database queries.
One that still eludes me: accessing a series vocabulary frequency page is instant, but a volume vocabulary frequency page takes a few to several seconds to load.
Running queries for both directly in the database shows no speed difference. The volume query takes about 15 to 16 ms to run.
This suggests the problem is the code that calls the query and generates a page with it.
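One way to confirm that would be timing the two halves separately. A sketch of what I mean (assuming an Express-style route; the query and render functions are trivial stand-ins, not the site’s real code):

```javascript
// Sketch only: time the query and the page generation separately to see
// which half is eating the seconds. The two helpers are trivial stand-ins.
const express = require('express');
const app = express();

const getVolumeFrequencies = async (volumeId) => [];        // stand-in for the real query
const renderPage = (rows) => `<p>${rows.length} rows</p>`;  // stand-in for page generation

app.get('/volume/:id/frequency', async (req, res) => {
  console.time('query');
  const rows = await getVolumeFrequencies(req.params.id);
  console.timeEnd('query');   // should be ~15-16 ms if the database isn't at fault

  console.time('render');
  const html = renderPage(rows);
  console.timeEnd('render');  // if the seconds show up here, the page code is the culprit

  res.send(html);
});

app.listen(3000);
```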
With that in mind, I spent some time doing code clean-up.
As part of that, I finally fixed 9 long-standing errors that I had a temporary workaround for. That resulted in more readable and better designed code, but it didn’t impact performance any. (That’s part of why I kept ignoring those errors. I knew fixing them and removing my workaround wouldn’t have any short-term impact.)
Having made no substantial progress in locating the source for the slowness, my bedtime has arrived.
I am left with no option but to accept defeat for the night.
The slowness issue must be challenged again another day.
Off to bed I go.
After massive code redesigning, the volume frequency list pages now load instantly!
Time for bed.
I’ve reached the point where my kanji time is:
Most of my Japanese time now is spent on SRS reviews. I may need to rework things a little bit. I’ll think about that maybe next week.
If I’m falling behind on kanji and reading, I should put more time into it at home. And that was the plan. Then I thought, “Adding new content to Manga Kotoba is done manually by running a series of scripts I’ve written in Ruby, Python, and Bash, running through Mokuro and Ichiran, and doing various steps for extracting and moving files. Why don’t I create one script to do everything?”
And just like that, I have no evenings and probably won’t have a weekend.
Over 1,000 lines of code later, I’ve made decent progress.
The new code runs on the same Node.js project as Manga Kotoba does (except it will be used with my local dev copy).
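The rough shape of it, with every command standing in for whichever Ruby/Python/Bash step it replaces (a sketch, not the actual pipeline):

```javascript
// Sketch of the "one script to do everything" idea. Each command string is a
// placeholder for one of the existing Ruby/Python/Bash steps.
const { execSync } = require('node:child_process');

function addVolume(volumeDir) {
  const run = (cmd) => execSync(cmd, { stdio: 'inherit' });

  run(`mokuro "${volumeDir}"`);                   // OCR pass (Mokuro)
  run(`./extract_text.sh "${volumeDir}"`);        // placeholder: pull the text out of Mokuro's output
  run(`./parse_with_ichiran.sh "${volumeDir}"`);  // placeholder: run the text through Ichiran
  run(`node import_volume.js "${volumeDir}"`);    // placeholder: load the results into the dev database
}

addVolume(process.argv[2]);
```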
The current process is:
At this point, I get a basic interface that I will improve later:
This certainly gives me something to do each evening.
Due to various things, I fell quite a bit behind in kanji progress in the latter part of February. I’m aiming to reverse the trend in March.
While I’m nowhere near halfway through grade 2 kanji, I have worked my way through 52 out of 160, or 32.5%. Slow and steady? I’m getting a nice mix of:
The more steps I streamline, the more steps I think about streamlining.
From a screenshot perspective, this doesn’t look much different from my prior screenshot:
But there’s a lot more going on under the hood.
For example, I’m finally retiring the Ruby script that creates thumbnail- and preview-sized images of covers, as that’s built into the website’s preprocessing code now. (I took the easy way out and had Grok write that code for me. I’m still on the fence about signing up for a GitHub Copilot subscription, though I’ll be surprised if I don’t within the next few months.)
It’s so great to see your kanji knowledge solidifying and self-reinforcing now.
Huh, I didn’t know Grok was the coding AI name, but I sure recognized Microsoft’s AI TM “Copilot” with the GitHub… A friend says AI is the future and that job security lies in mastering the writing of AI prompts …
It’s always fun to read your blog and catch a glimpse into your very smart brain
Grok is the AI for X (Twitter). It’s not coding specific, but I find it gets me better results than ChatGPT for many things now.
The way I tell it to co-workers, and I’ve seen others online say the same, is to the effect of: “AI won’t replace people. People who use AI will replace people who don’t use AI.”
ChatGPT:
Grok:
Copilot I use from within a code editor, so I don’t have a history of topics. But here’s an instance where it had given me some code with !! in it, and I asked what that was, as I hadn’t seen it before:
About a year ago, I needed to do something that it turns out Excel’s PowerQuery is designed for. I had heard of PowerQuery before but hadn’t used it or learned about it. But a conversation with my employer’s local version of ChatGPT mentioned it when I asked for options to do a certain thing.
I looked online for information on PowerQuery and how to use it, but what I read was confusing and not quite relevant to what I had. So I went back to the AI, asked for info, and found it to be understandable. I asked it to guide me through using PowerQuery in my specific situation, and it did so. It made it very quick and easy for me to pick up what PowerQueries are and how to use them.
And as it turns out, there were many areas where I could have used a PowerQuery. I introduced a co-worker today to using a PowerQuery to load a JSON file exported from a third-party website.
If I never used AI, I wouldn’t have learned about PowerQueries. If I didn’t use AI, I likely would have given up trying to figure out PowerQueries.
And if I wasn’t using AI (mostly ChatGPT) to help me improve my database queries, my website Manga Kotoba probably wouldn’t be faster than I believed was even possible (yet it is).
As for “the art of writing prompts”, there’s a lot to be said about properly wording one’s prompt. For text-based AI where you present a clear question with a (relatively) clear answer, I don’t think one needs to be too skilled at it. A little back-and-forth conversation with the AI can help push the user toward the outcome they want.
But if you’re using AI to generate images, video, etc., then mastering prompts is (currently) a big deal.
I have a strong anti-AI bias, but I don’t think it’s quite so much anti-AI per se as an anti-capitalism bias, which AI (at the moment) enables, accelerates, pumps full of fuel, etc.
If we could get around those issues, and around the fact that AI has no ability to just say “hey, I’m just making shit up here!” (so you avoid the “it’ll be right 95% of the time, but if you’re not knowledgeable enough in the field, the other 5% is completely undetectable” problem), I’m sure I’d be less wary of it.
That said, it seems like it’s actually useful for teaching coding, to some extent, because if the code doesn’t work you’ll know about it. Well, unless it introduces some bug that takes ages to trigger, or weird edge cases or something, I guess, but that’s also a key element of coding…
Thank you for the interesting discussion, you two!!
Time spent reading manga:
Time spent learning new kanji:
Time spent reviewing SRS cards for kanji and words using kanji:
It doesn’t help that I’ve also been working on various (unrelated) projects…
Warning: Everything below will read as nonsense to most people.
Speaking of projects, I’ve been meaning to get back to updating my book club manager.
I originally wrote it all in Javascript with my own data binding implementation.
Then I learned that Javascript has its own built-in support for binding, so I had AI take a sample JSON file and write all the book club manager HTML, CSS, and JS using the JSON file as a data template and Javascript’s built-in data binding.
It gave me something decent, but not quite there. I’ve been using it, but haven’t uploaded it.
I’ve been wanting to check out Svelte for a while. I planned to implement another project in Svelte, but as I started looking over the beginner tutorial, I immediately changed course to re-implementing the book club manager.
Thus went my afternoon:
The original book club manager took me months to write (slowly improving upon it over time).
The updated version took AI less than a minute, plus me prompting it for small changes.
The Svelte version, with combined effort between AI writing the basics and then me looking over it to learn and understand, followed by me making changes and additions, is going well enough.
I’m only posting this to pull myself away from the project long enough that I can go to bed.
Yes, but the manga have crucial context so I feel caught up now
Although I intended to complete grade two in two months, for various reasons (nothing bad) that hasn’t happened.
Grade 2 covers 160 kanji, which is a small number. In two and a half months I’ve gone through about half of that. That’s a fairly bad pace for still being on the “easy” kanji.
Various things have slowed progress, but the main thing is that I haven’t been intentionally keeping up a schedule of reading from the Chibi Maru-chan kanji book and creating cards.
It doesn’t help when I’m met with a minimum of 100 cards to review each day. (Today was 262 cards.)
The main reason for the volume is that I’m adding cards for each sample word in the Chibi Maru-chan kanji book. But over time, I’ll delete any card whose kanji reading and word meaning I know well, to free up review time. If I don’t, reviews for those cards will stretch out anyway, but deleting just gets them out of the way entirely.
Another thing I’ve been doing to reduce the SRS review load is deleting leeches. These are often words that use a less common reading for a kanji. I figure if they’re common enough, I’ll encounter them in manga, and I can intentionally try to learn them at that point.
I hope to push for faster progress through the rest of grade 2, as grade 3 is where I’ll get to kanji I actually need to give attention to, and where I expect to see the first real gains from reviewing.
I’ve made some adjustments to increase Manga Kotoba’s visibility in search engines, and in less than a day saw a massive boost in traffic from Google.
Unfortunately, it’s mostly from Japan with searches such as:
I don’t think they’re finding what they’re looking for…
Soon after my last post, I discovered a significant bug in Manga Kotoba that was masking my progress (in the wrong direction).
When a reading for a kanji is marked as known, the site is supposed to no longer show that reading on kanji frequency lists. The problem? It was hiding all readings for that kanji when only one reading was marked as known.
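In other words (illustrative code only, not the site’s actual logic):

```javascript
// Illustration only: the bug boils down to filtering on the kanji alone
// rather than on the (kanji, reading) pair.
const rows = [
  { kanji: '雨', reading: 'あめ', count: 120 },
  { kanji: '雨', reading: 'ウ', count: 5 },
];
const known = new Set(['雨|あめ']); // only this reading is marked as known

// Buggy: one known reading of 雨 hides every 雨 row.
const knownKanji = new Set([...known].map((k) => k.split('|')[0]));
const buggy = rows.filter((r) => !knownKanji.has(r.kanji));              // [] -- the ウ row disappears too

// Fixed: hide only the exact (kanji, reading) pair that is known.
const fixed = rows.filter((r) => !known.has(`${r.kanji}|${r.reading}`)); // keeps 雨 read as ウ
```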
I thought I’d learned almost every reading for grades one and two kanji, but there’s a decent number I haven’t learned yet!
That means creating more flash cards to cover more readings to ensure I’m learning them.
Which leads me to…
Up until a few months ago, my daily Japanese schedule was broken down as:
These days, my daily Japanese schedule has become:
I’ve been bumping the manga-reading percentage up by reducing time on non-Japanese things (including development on Manga Kotoba).
All in all, I feel like I’m not making much progress on kanji-learning. There seems to be a clear split between “I know these readings” and “these readings instantly became leech cards”.
I’ll still push forward with what I’m currently doing for now, though. See how it goes a bit longer.
Search stats from Google are still not looking hopeful. Are there just not many people who want manga frequency lists? Or do people assume they don’t exist, so they don’t search for them? Or do I need to make some improvements in the keyword/description area?
It’s still mostly Japanese visitors looking for free manga. If there is anyone who doesn’t need a frequency list for manga, it’s (most) people in Japan. I mean, it’d be kind of cool if some people in Japan were using the frequency list to learn English words, but this would be a terrible format for it.
Even though these aren’t the visitors I’m looking for, I probably should put more effort into improving the mobile experience:
Although I’ve been working on how I can use Svelte to improve some aspects of the design of Manga Kotoba (especially for paginated data), I’m fairly proud of the progress I made earlier in the year to decrease page load times and improve general accessibility. There’s still a lot to do, of course.
Chromium-based browsers have something called Lighthouse that tests for various things. This helps highlight issues that can be improved upon. While getting a score of 100% is sort of like “teaching to the test”, and it doesn’t mean all issues are resolved, it still means many common issues are taken care of.
Lighthouse results for the site’s main browse page:
Eventually I’ll investigate the Javascript error that took my 100% in “Best Practices” away from me.
A few of my targets:
Managing properly-sized images is a bit difficult when I have a toggle to change the size of the manga series tiles (although that’s a bit broken at the moment and needs fixing), as that can’t change which image is used in the <img> element. One option is to switch from using <img> to using a CSS background image, allowing CSS to switch to a different-sized image. This also means saving copies of the series cover images in different sizes, which I can add to my automation, but it would also require running it on existing images. I haven’t gotten to it yet, but one day.
Reducing unused Javascript is one where I’m wondering if Svelte can help. With Svelte, CSS is defined alongside the components that use it, but I haven’t looked at where that CSS ends up being loaded. Does it load only on pages that use that component? If so, the same should be true for a component’s Javascript, and in that case, only the Javascript (and CSS) actually used on a page would need to load on that page. Something to worry about in the future.
The excessive DOM size is one that I don’t know if I’ll be able to improve upon unless I want to reduce the number of series tiles shown on the browse pages. If I do the conversion to using Svelte, I’d probably implement “infinite scrolling”, but using a technique that stores data in memory and creates/destroys entries on the page as one scrolls up/down. This allows for a smaller number of elements on the page, while still being able to seamlessly scroll through them. Maybe that can reduce the DOM size a bit.
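A bare-bones sketch of that create/destroy-as-you-scroll idea, in plain Javascript rather than Svelte (all the sizes, selectors, and data names are made up):

```javascript
// Bare-bones windowed list: only the visible slice of series tiles exists
// in the DOM at any time. Names and sizes are illustrative only.
const TILE_HEIGHT = 220;                // px per tile row (made up)
const allSeries = window.seriesData;    // assume the full list is already in memory

const viewport = document.querySelector('#browse');  // scrollable container
const spacer = document.querySelector('#spacer');    // gives the scrollbar its full height
spacer.style.height = `${allSeries.length * TILE_HEIGHT}px`;

function renderVisible() {
  const first = Math.floor(viewport.scrollTop / TILE_HEIGHT);
  const count = Math.ceil(viewport.clientHeight / TILE_HEIGHT) + 2; // small overscan

  viewport.querySelectorAll('.tile').forEach((el) => el.remove());  // destroy off-screen tiles
  allSeries.slice(first, first + count).forEach((series, i) => {
    const tile = document.createElement('div');
    tile.className = 'tile';                                        // absolutely positioned via CSS (omitted)
    tile.style.transform = `translateY(${(first + i) * TILE_HEIGHT}px)`;
    tile.textContent = series.title;
    viewport.appendChild(tile);                                     // create only what's on screen
  });
}

viewport.addEventListener('scroll', renderVisible);
renderVisible();
```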
I’m no expert, but I doubt Google counts the people who access it from here or its thread, or who never actually closed the tab, as those people will be using the site without googling it.
I think to want manga frequency lists (and especially to type it into Google) you have to have been exposed to the idea that they are a thing which might exist. I wonder if some SEO (apologies for the swear) towards “learn Japanese with manga” and a page outlining how that might be accomplished might help?
Also having the romaji or English titles in addition to the Japanese titles might help with getting more traffic from outside Japan. (edit: you do, it’s just the specific manga page I checked doesn’t)
True.
I don’t have anything in place that lets me track actual usage on the site, so at the moment I can’t see how “active” direct site usage may be. Once I finally get e-mail validation implemented for new accounts and then remove all the obvious bot accounts, it may be easier to get direct information.
What Google’s Search Console shows more than anything is discoverability.
That’s the tough one. Over the past five years, I’ve found maybe at most three or four people asking for “jpdb for manga”.
Although, now that Manga Kotoba isn’t the only site with manga frequency lists, maybe awareness of the concept will spread a little bit more?
I do plan/need to get more “FAQ”/“article” type pages up, which will provide both more information on utilizing a frequency list, and increase searchability.
Currently Manga Kotoba’s database design supports only two titles per series: the original Japanese title, and one English title. (Also, kana-only, although that’s only utilized for the useless “sort alphabetically” feature.)
This design does have a clear issue in that I tend to use the official English title. So a series such as 名探偵コナン lists “Case Closed” but not “Detective Conan”.
Eventually I plan to pull in titles from AniList:
But I need to see if there’s a way to pull in only English titles, unless I wanted to add non-English dictionaries to the site for anyone who searches for “Salapoliisi Conan”.
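From what I can tell, AniList’s GraphQL API keeps the English title separate from the romaji and native titles, though the synonyms list isn’t tagged by language. A sketch of the query (the search term is just an example):

```javascript
// Sketch of pulling a series' titles from AniList's GraphQL API.
// title has separate english/romaji/native fields; synonyms is a plain list
// of strings with no language tag, so it can't be filtered to English only.
const query = `
  query ($search: String) {
    Media(search: $search, type: MANGA) {
      title { romaji english native }
      synonyms
    }
  }
`;

async function fetchTitles(search) {
  const res = await fetch('https://graphql.anilist.co', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables: { search } }),
  });
  const { data } = await res.json();
  return { ...data.Media.title, synonyms: data.Media.synonyms };
}

// fetchTitles('名探偵コナン').then(console.log);
// logs the romaji/english/native titles plus whatever synonyms exist
```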
Something occurred to me. Aside from the split probably being high frequency vs low frequency readings, I wonder if the difficulty in acquiring the ones that are leeching is partially caused by learning these readings for the same kanji all together. There is research showing it takes twice as much or more effort to learn related lexical sets together. In other words, it’s going to be a lot easier to learn 10 words spanning 10 kanji than 10 words all including the same kanji.
That led to other thoughts on how to rationally combine a systematic kanji learning strategy with your wealth of frequency data.
I’m curious what your thoughts are on that. What I mean is (as an example, though not a very elegant one): rather than learning all readings/words for a kanji, set a frequency threshold for the words/readings to learn before moving on, and cycle back for another round learning lower-frequency words as you go, and so on.
Something like that to still take advantage of learning the kanji in a systematised way, but incorporate the reality that there are some obscure words using grade 1 kanji that are way less important and relevant than a lot of words in the middle school grades. So finding a frequency based way to rationalise moving forward to new kanji.
I agree that a lot of it is people not knowing this possibility exists, and then, even if they do, not knowing what to do with it.
My understanding is that if you’re looking for people to discover you, you need a stream of fresh content that attracts them to learn how to get into what you’re offering. I can imagine Google’s crawlers see a lot more commonality in your site as a way to access manga than as an educational resource. And the reality of SEO is that it prioritises new content. Unless you are going for an ads strategy, I think the days are long gone that anyone can do a Google search and find an assortment of good free static sites. In my experience they show up less and less in searches, and when we got into this at work, the marketing folks said a lot of complicated things that boiled down to: you have to keep pumping out fresh content or your views shut off. And anyway, if someone is googling how to learn Japanese kanji or vocabulary, those keywords aren’t prominent enough across your site.
It might sound tedious to write it all, but I don’t think it’s necessarily obvious how to use manga kotoba and how to incorporate into various Japanese learning routes, and that could be a great series of articles that people could easily find and share.
another option for you could be to partner with someone. Wouldn’t it be amazing if there was a link on Natively to Manga Kotoba for supported manga?!
Finally, I put links to Manga Kotoba on the Frieren club pages, and perhaps if there was a standard and more helpful way to do that, more book clubs would add it to the resources list: a link to that manga on Manga Kotoba, plus an article on getting started.
I probably should give this a try for grade three.
Looking at where I am currently, I have two cards with 雨 (grade 1) read as ウ, and both are what I’d consider leeches. (Migaku doesn’t have any way that I know of to view whether a card is a leech.)
When viewing Manga Kotoba’s list of the top 200 frequency kanji readings based on volumes I’m reading, 雨 read as ウ doesn’t make it onto the list. (But it is only the top 200.)
If I look at the words with 雨 read as ウ across everything in Manga Kotoba, and compare that with what I’m reading, I see:
| Word | Site Frequency | Notes |
| --- | --- | --- |
| 雨後 | 378 | Zero appearances in manga I’m reading or have read. |
| 多雨 | 107 | Zero appearances. |
| 春雨 | 106 | Zero appearances. |
| 風雨 | 88 | Zero appearances. |
| 雷雨 | 36 | One appearance. |
So why am I trying to learn this reading?
Even the Chibi Maruko strip that uses this kanji only uses this reading once:
I shall be deleting my “雨 read as ウ” cards upon sight.
I don’t regret having tried to learn them, but from now on, I’ll take the sample words from the Chibi Maruko kanji books and only create cards if they show up in the kind of manga I read, as per my stats from Manga Kotoba. (And otherwise create cards from my high frequency words.)
It’s been on my to-do list for many months now to do something like this. But there are a lot of things I don’t want to get to until I have e-mail verification/password reset implemented, and that’s had various stumbling blocks, so all these other things that could make the site more attractive to use have been neglected.
This is where my introverted nature works against me. The book club section of the WaniKani forums is the majority of my experience in the Japanese language learning community!
(Fun stat: 42.25% of the manga series on Manga Kotoba link to their associated Natively pages.)
Any thoughts and suggestions on how to make such uses easy and user-friendly are most welcome. Currently, the height of WaniKani book club integration is:
That makes a lot of sense, you have quite a queue of tasks!
That’s a cool feature I keep forgetting about - maybe we should ask the various book club runner people about linking to those lists so they are discoverable from the book clubs! (As in the main ABBC BBC IMC page)
I’d also be so bold as to suggest making a little guide that could be linked to from the stickied post helping people lead book clubs. And that post could explain:
I wouldn’t mind helping with that and making requests on the book clubs as and when you feel like it makes sense to expand visibility. It’s an incredible resource you’ve made and you’ve been so modest about promoting it.