Anki Word Frequency Inserter: Learn most common words first

Saimin · September 7, 2021, 7:56pm

This website/script inserts word frequencies from the InnocentCorpus (5000+ novels) or BCCWJ into your Anki cards. That way, you can choose more common words to learn first. (Enables sorting!)

(NEW: version using the BCCWJ corpus (Contemporary Written Japanese; relative frequency: 100 = 100th most common word), ~5.8MB download on first visit)

Yomichan shows you these frequencies for most words (that aren’t too rare):

This number just tells you how often the word occurs in a corpus of ~5000 books.
Anything over 10k is very common, below 100 is rather rare (私: ~900k, 新聞: ~30k, とろ火: 72).
(<100 doesn’t mean obscure though, the word can still be useful)

Yomichan can even export these frequencies with the Anki export feature (which is great, see plus sign). However, it puts HTML around it, which makes it hard to sort. Also, it can’t add frequencies to existing Anki cards.

Requirements

The Anki addon AnkiConnect needs to be installed (which should already be the case if you use Yomichan and its Anki export).
Anki needs to be running.
Your notes need to have a field FrequencyInnocent or similar (you can change that name in the “frequency field” option near the top).
If your notes don’t have that field yet, you can add it in Anki via Tools → Manage Note Types → Fields.
You should close the Anki Browse window while doing the changes. I think the worst that can happen is that the currently opened card will not be updated. I tested this with ~900 changes and the rest was fine.
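
For the curious, the requirements above boil down to talking to AnkiConnect over local HTTP. Here is a minimal Python sketch of that exchange, not the site's actual JavaScript. It assumes AnkiConnect's default address and API version 6; the helper names build_request, invoke and build_frequency_update are made up for illustration.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_request(action, **params):
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}


def invoke(action, **params):
    """Send a request to AnkiConnect. Only works while Anki is running."""
    body = json.dumps(build_request(action, **params)).encode("utf-8")
    request = urllib.request.Request(ANKI_CONNECT_URL, data=body)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


def build_frequency_update(note_id, frequency, field_name="FrequencyInnocent"):
    """Request that sets only the frequency field of a single note."""
    return build_request(
        "updateNoteFields",
        note={"id": note_id, "fields": {field_name: str(frequency)}},
    )
```

With Anki open, something like invoke("findNotes", query="deck:Yomichan") should return the note IDs of that deck, and the body from build_frequency_update can be sent the same way. Only the named field is touched, which matches the "only updates the frequency field" behaviour described in this post.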

Disclaimer

This script should be very safe, since it only updates the FrequencyInnocent field of notes, if it already exists. But please back up your Anki collection via File → Export beforehand. It’s a good idea anyways. Use the script at your own risk, i won’t be responsible for changes to your Anki decks. The code is public though, you can check it here.

Using this to sort/search your Anki cards by frequency in Browse

You can either use another addon like Advanced Browser to be able to sort by custom fields:

Or if you just want to search without sorting or addons, you can use a query like deck:Yomichan FrequencyInnocent:9___ (3 underscores), which will find all cards with frequency 9xxx. (Or for frequency 10k and above: FrequencyInnocent:_____* (5 underscores + * wildcard).)
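
To see why those patterns pick out the right numbers: in Anki's search, _ stands for exactly one character and * for any run of characters. The Python sketch below imitates that matching with a regular expression. It is only an illustration of the wildcard logic, not how Anki implements its search, and both function names are made up.

```python
import re


def anki_wildcard_to_regex(pattern):
    """Translate '_' (one character) and '*' (any run) into a regex."""
    parts = []
    for char in pattern:
        if char == "_":
            parts.append(".")
        elif char == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def field_matches(pattern, value):
    """True if the whole field value matches the wildcard pattern."""
    return anki_wildcard_to_regex(pattern).fullmatch(value) is not None
```

So field_matches("9___", "9412") is true but "941" and "19412" are rejected, and "_____*" accepts any value with five or more characters, i.e. 10000 and up when the field holds a plain number.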

To learn the most frequent words first, what I do is select some cards → right click → Reschedule / Set Due Date → place in review queue (0/0).
There’s probably a smarter way, since this makes the first interval 3 days for me, so I have to mark it ‘Again’ on the first review.

This is also just nice information, even if you don’t want to always learn the most frequent words first. The frequency column is nice to have in the Browse window.

Further usage and technical information

See GitHub - sschmidTU/anki-frequency-inserter: Inserts Japanese word frequencies from the InnocentCorpus into your Anki notes/cards.

My other website: wtk-search

It has its own thread:

Enjoy of course, feel free to leave any feedback here.

jhol · September 7, 2021, 8:05pm

Yet another immensely useful tool. Can’t wait to try it out! I still use Multi-Radical Kanji Search daily (by the way, are you still actively adding kanji to it?)

Saimin · September 7, 2021, 8:06pm

Thank you, that’s great to hear.
Yes, I’m still adding kanji to the kanji search. It’s at 3071 now, and in fact I was about to start adding a whole bunch more very soon (~60 from the Aozora frequency list, and more).
I actually haven’t encountered any kanji in the wild lately that weren’t already in my search, but I’ll always keep adding any I find!

ekg · September 8, 2021, 6:31am

Oh, this looks very interesting and useful. I’m still an Anki novice, but I’ll give this one a go! ^>^ Thanks!

Saimin · September 9, 2021, 2:32am

I added an improvement where if no frequency was found, it checks the Furigana field of the card (if it exists) and extracts the dictionary entry and frequency from that.
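
A rough Python sketch of that fallback, for illustration only (the site itself is JavaScript and may do this differently). It assumes the Furigana field uses Anki's bracket notation such as 食[た]べる, and that the frequency data is a plain word-to-count mapping; both function names are hypothetical.

```python
import re


def plain_word_from_furigana(furigana):
    """Drop bracketed readings and separating spaces: '食[た]べる' -> '食べる'."""
    without_readings = re.sub(r"\[[^\]]*\]", "", furigana)
    return without_readings.replace(" ", "").strip()


def lookup_frequency(front, furigana, frequencies):
    """Try the Front field first, then fall back to the Furigana field."""
    if front in frequencies:
        return frequencies[front]
    if not furigana:
        return None
    return frequencies.get(plain_word_from_furigana(furigana))
```

The point of the fallback is that a Front field that has been edited (extra text, a translation, etc.) no longer matches the dictionary entry, while the Furigana field usually still contains the original word.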

This gave me 50 more frequency updates after my first batch of ~900 with just looking up the Front of the card. (often it’s just due to me messing with the Front field, but, you know)

This is version 1.1.2 now. Note that Chrome tends to keep using an old cached version of the JavaScript that comes with the website, so try a guest/incognito window or another browser (restarting Chrome also seems to do it). Firefox seems more willing to do a hard refresh (Ctrl+F5).

By the way, you can change the field names in the console via ankiInserter.ankiFuriganaFieldName etc. Just be careful that when you change ankiInserter.ankiFrequencyFieldName, you also need to change ankiInserter.ankiSearchQuery. But the page will warn you anyways.

The next big step will be inserting frequencies for (individual) Kanji, there’s separate frequency data for kanji as opposed to vocab, also from InnocentCorpus. Since Kanji and Vocab cards are difficult to separate, i’ll probably have to make the user say what their kanji deck is named, so that it doesn’t accidentally use the vocab frequency.

Another idea: Insert WK kanji and vocab level into your Anki cards. I’ve put a lot of WK items into Anki because i burned them on WK and forgot them, for example, and i’d like to know how many. I already have both fields for my notes, it’s just a hassle to fill them out. It’s also interesting to know whether a kanji or vocab is on WK or not. If it’s not, I put 0 in the level field.
(btw jisho.org’s WK information is outdated, but still an indication i guess)

next comment in preparation:

v1.1.8: Implemented removing the HTML from the Front field to look up the frequency.
This didn’t find any new frequencies for my Anki decks, but it might for yours.
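
As an illustration of the idea (again in Python, and not the script's actual code): a simple way to get the plain word out of a field that contains markup is to drop the tags and decode HTML entities. The regex below is a naive assumption that works for simple markup like the wrappers Yomichan adds, not for arbitrary HTML.

```python
import html
import re


def strip_html(field_value):
    """Remove tags, decode entities like &amp;, and trim whitespace."""
    text = re.sub(r"<[^>]+>", "", field_value)
    return html.unescape(text).strip()
```

The same cleanup would also make an exported frequency sortable again, since only the bare number is left.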

Kumirei · February 22, 2022, 7:57pm

Might be worth mentioning in the OP that you need to set the webCorsOriginList in Anki Connect so that it doesn’t get blocked by CORS.

Request: Would it be possible to add a way to specify the source field? I don’t typically call any of my fields “Front”, which makes it a bit tricky to use this

Saimin · February 23, 2022, 2:01am

As far as I know, on the first connect, AnkiConnect asks you if you want to accept the connection, then adds the URL to the webCorsOriginList automatically. The manual way of adding it is covered on GitHub, which the top of the page links to.
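
For reference, a small Python sketch of what the manual route amounts to: the AnkiConnect add-on config has a webCorsOriginList entry, and the website's origin has to be in that list. The helper is hypothetical, the starting value is an assumed default, and the origin is a placeholder rather than this site's real address. The real edit is done by hand in Anki's add-on config dialog.

```python
def allow_origin(config, origin):
    """Return a copy of an AnkiConnect config with `origin` allowed."""
    origins = list(config.get("webCorsOriginList", []))
    if origin not in origins:
        origins.append(origin)
    return {**config, "webCorsOriginList": origins}


# Assumed default value of the setting; check your own config in Anki.
example_config = {"webCorsOriginList": ["http://localhost"]}
# Placeholder origin, NOT the real address of the frequency inserter page.
updated = allow_origin(example_config, "https://example.com")
```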

About the source field: how does this work for you exactly? I didn’t think you could ever not have a “Front” main field, haha, you’re killing me here. Did you just rename your “Front” field to something else? Or what’s the main field of your note that has just the plain word? You probably know it’s “Front” and “Back” by default because that’s what flashcards usually have.
I’ll try to put this in, it will require a bit of refactoring and uglification to accommodate special use cases.

Kumirei · February 23, 2022, 2:12am

Ah, I didn’t see that on the github page, and Anki Connect never asked me. That’s all good, then

Yeah, I rename my fields to something more semantic for its content. The front of the card is often composed of more than one field, and if you have several card types the info on the front will differ, so it doesn’t make much sense for a field to be called “Front” for me. If it’s too much work I can just rename my field to front temporarily as I add the frequency

Saimin · February 23, 2022, 2:14am

It shouldn’t be too much, and maybe others are using it like this. I assume this field whose name you want to specify has the plain word. Then it should work if I just take the field name from a class variable that’s “Front” by default (though there are several other places in the code that need adaptation), and you can change it via console at the very least.
Btw it’s 3am here, so I’ll only get to it tomorrow ^^

Kumirei · February 23, 2022, 2:20am

Now that I think about it, your tool would work on several note types at once, so it’s possible that different notes would have the Japanese word in different fields… Ideally you would get to choose the field for each note type (with a default), but that might be a bit much.
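
A sketch of what that could look like, in Python for illustration (not a proposal for the tool's actual code). It assumes notes shaped like AnkiConnect's notesInfo results, with a modelName and a fields mapping; the note type name "Vocab" and both function names are made up.

```python
DEFAULT_SOURCE_FIELD = "Front"


def source_field_for(model_name, overrides, default=DEFAULT_SOURCE_FIELD):
    """Pick the field holding the plain word for a given note type."""
    return overrides.get(model_name, default)


def word_from_note(note, overrides):
    """Read the word from the configured source field, or None if missing."""
    field_name = source_field_for(note["modelName"], overrides)
    field = note["fields"].get(field_name)
    return field["value"] if field else None
```

With overrides like {"Vocab": "Expression"}, notes of type "Vocab" would be read from their Expression field while every other note type keeps using "Front".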

I’m in no hurry, btw, changing the field to front temporarily is no big deal since I’m only dealing with a single note type right now

Saimin · February 23, 2022, 2:22am

Yeah, that would be nice. I guess until then you’ll be able to do it separately for each source field name.

I’m a simple man, I make an Anki card, I put the japanese word in the first field, called “Front”

NicoleIsEnough · February 23, 2022, 7:15am

Here’s what my Anki vocab cards look like

Pic

BTW there is another plugin (which inserts furigana automatically, so it’s extremely useful) that requires its field to be labeled “Expression”, so there might indeed be others who don’t name that field “Front”

polv · February 23, 2022, 7:46am

I wonder if anyone uses Yomichan or some Anki addons on Android?

Kumirei · February 23, 2022, 2:02pm

Expression is a decent semantic field name. If that was the required field name I wouldn’t mind switching to that permanently

Kumirei · February 23, 2022, 2:02pm

I use the Akebi dictionary app which lets me add cards to Anki through Anki Connect on Android

Saimin · February 23, 2022, 5:14pm

Haha, nice. My cards also have lots of fields. I just never thought about changing the default first field name from “Front”.

But I see that this is a pretty significant use case now, so I’ll adapt for it.

polv · February 23, 2022, 7:57pm

I don’t think it is AnkiConnect, but rather AnkiDroid API.

I am using Takoboto right now, at least, originally. Akebi doesn’t seem to come with vocabulary lists (but right now, I don’t use one, anyway). There is also Aedict3, which appears to be quite decent.

On trying just now,

Akebi can export pitch accent and link to the app itself, but doesn’t appear to export alternate dictionary forms (using different Kanji, like 毀れる).
Takoboto doesn’t export pitch accent, nor does it allow customization at all.
Aedict3 can choose which form to export, but doesn’t allow choosing Note Type, but it does allow editing before exporting. I can’t choose to export pitch either, although it does show in app.

Kumirei · April 8, 2022, 10:18pm

Would you consider adding a ranked frequency source? I think ranked frequency (i.e. “Nth most frequent word” instead of “appears N times”) is more helpful than absolute frequency. Also the Innocent Corpus is (supposedly) a bit dated. I have myself switched to the ranked frequency Yomichan dictionary based on the Balanced Corpus of Contemporary Written Japanese found in this collection:

https://drive.google.com/drive/folders/1tTdLppnqMfVC5otPlX_cs4ixlIgjv_lH

Saimin · April 9, 2022, 2:40pm

Ranked frequency would be nice as an option. Though honestly I like absolute frequency, because first of all you can derive ranked from it (but not the other way round), and secondly with rare words it’s more helpful to me to know that a word occurred 5 times, instead of being rank 45000 (where 4 times might be rank 48000). But yeah, ranked definitely has advantages, e.g. not being dependent on corpus size, or making it easy to focus on the most common 10k or 20k words.
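
To illustrate the "derive ranked from absolute" direction, here is a minimal Python sketch. It assumes the frequency data is a word-to-count mapping and that words with equal counts share a rank, which is one convention among several; the function name is made up.

```python
def ranks_from_counts(counts):
    """Map each word to its rank (1 = most frequent); ties share a rank."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_count = None
    current_rank = 0
    for position, (word, count) in enumerate(ordered, start=1):
        if count != previous_count:
            current_rank = position
            previous_count = count
        ranks[word] = current_rank
    return ranks
```

The reverse direction is not possible because the rank throws away the gaps: from the ranks alone there is no way to tell whether two neighbouring words differ by one occurrence or by thousands.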

What’s the filename of that frequency dictionary? There are a lot of files in that folder.

By the way, I’m also still planning on implementing the custom source field option, I just got sick right when I wanted to do it and then got busy with other things. But hopefully soon.

Kumirei · April 9, 2022, 3:15pm

Sorry, that’s the BCCWJ dictionary
