Ordering vocab by frequency


#1

Hi everyone,
I have a question about the vocab I come across when reading, watching shows, etc…

So far I’ve gone with the premise of adding every unknown word (within some limits) to my Anki deck, provided I have at least one sentence using that word from my shows (I keep a sentence bank from them).
A card looks something like this:

So, if the word has no example sentence … no new card is added.

The thing is, in order to keep having example sentences for new words, I have to add more subs2srs cards from different shows… which gets tedious :sweat_smile:

Lately I’ve been watching more stuff and reading regularly, so I’m coming across new words that are starting to pile up, and I would like to know if there’s a way to order them more automatically.
Something like an Excel spreadsheet that could quickly give me the order of those words according to frequency (using some corpus, for example)? That way I could always add the more frequent words first, which I presume would do more to ease my next readings.

I guess I would like to find a balance between still doing SRS (with maximum returns) and having time to read and watch stuff … :sweat_smile:


#2

I am aware of two large frequency lists: the Balanced Corpus of Contemporary Written Japanese (BCCWJ) and the Vocabulary Database for Reading Japanese (VDJR).

Both lists are spreadsheets which you can open in Excel, look up a word and see the frequency. You can remove unneeded columns to make the spreadsheet open faster.

BCCWJ has two ratings: one based on short word units and one based on long word units. Here is an example of why that is useful. When you check the frequency rating of 弁護 (defense, advocacy), it has frequency 1,451 in the short unit list, but frequency 24,915 in the long unit list. This means that the word is very common, but it is almost never used on its own (because it is almost always part of the compound 弁護士, lawyer), so it is more useful to learn that compound than the word 弁護.
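The same comparison can be done mechanically for a whole word list. This is only a rough sketch: the `mostly_in_compounds` helper and the factor of 10 are assumptions picked for illustration, not something defined by BCCWJ. The two numbers in the example are the ranks quoted above for 弁護.

```python
def mostly_in_compounds(short_rank, long_rank, ratio=10):
    """Return True when a word ranks far better as a short unit than as a long unit.

    A large gap suggests the word mostly appears inside compounds.
    The default ratio of 10 is an arbitrary assumption; tune it to taste.
    """
    return long_rank >= short_rank * ratio


# Ranks quoted in the post for 弁護: 1,451 (short unit) vs 24,915 (long unit)
print(mostly_in_compounds(1451, 24915))  # True
```

Words flagged this way are candidates for learning the compound instead of the bare word.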

VDJR doesn’t have the long unit list but has different frequency ratings for different purposes (for Reading Japanese, for General Learners, for International Students). I’m not sure which is more useful.

I myself use a modified version of HouHou SRS which displays frequency rating from various databases when I look up a word, so I can use this information to decide if I should add this word to reviews or not. But I’m not sure I can legally redistribute the frequency lists (even though they are free downloads), so I keep this modification to myself for now. It looks like this:

Here BS is the frequency in the short unit word list of BCCWJ; BL is the frequency in the long unit word list of BCCWJ; VU is the VDJR rating for Reading Japanese; VG is the VDJR rating for General Learners. “Unusual” is HouHou’s own rating “based on 7905 books” (not sure which ones). Looking at the screen, I see that the word 腹立つ is not even among the top 20,000 words in any of the four frequency lists, so I probably won’t add it to SRS.

通り is, on the contrary, very close to the top in all frequency lists, so it is already in SRS (already burned, in fact):


#3

Thanks. Actually, after the はら立つ example I’m somewhat questioning how useful the frequency lists can be for my purposes :sweat_smile: … Today I found exactly that word in my graded readers as well, which are the most basic material I’m using, so I can attest it’s quite common in the material I’m reading and watching (as seen from the number of example sentences from my shows).
In any case I would like to give it a shot, as it might be more accurate with other words in relation to my immersion.
I have the list of the 60,000 most common words published by Tokyo University. It’s an Excel spreadsheet; I’m wondering how I could use it to sort the word list I get, for instance, from a Kindle device (where I can easily export a CSV file with my underlined words).
For shows it’s mostly the same: I can use yomichan over subtitles to create a card… So I think I’ve gotten the whole card creation to a point where it’s rather simple… The problem is the volume… and prioritizing what to review…
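For the Kindle case, a minimal Python sketch of the sorting step might look like this. It assumes the frequency spreadsheet has been saved as CSV with hypothetical column names `word` and `rank`; the real file's headers will differ. The ranks in the sample are made-up placeholders, not values from any actual list.

```python
import csv
import io


def load_ranks(freq_file, word_col="word", rank_col="rank"):
    """Read a frequency list (CSV with a header row) into a {word: rank} dict."""
    ranks = {}
    for row in csv.DictReader(freq_file):
        word = row[word_col].strip()
        rank = int(row[rank_col])
        # If a word is listed more than once, keep its best (lowest) rank.
        if word not in ranks or rank < ranks[word]:
            ranks[word] = rank
    return ranks


def sort_by_frequency(words, ranks):
    """Most frequent first; words missing from the list go last, in original order."""
    return sorted(words, key=lambda w: ranks.get(w, float("inf")))


# In-memory stand-in for the real frequency file (placeholder ranks).
freq_csv = io.StringIO("word,rank\n通り,150\n弁護士,2100\n腹立つ,23000\n")
ranks = load_ranks(freq_csv)

my_words = ["腹立つ", "未知語", "弁護士", "通り"]
print(sort_by_frequency(my_words, ranks))
# ['通り', '弁護士', '腹立つ', '未知語']
```

With real files, `load_ranks` would be given `open("freq.csv", encoding="utf-8", newline="")` instead of the `StringIO` object (the file name here is also hypothetical), and `my_words` would be read from the Kindle export in the same way.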

EDIT: can you think of a way to add that frequency number to something like Anki, perhaps as you did with HouHou? If I could add it as a field in Anki I think it would be exactly what I’m looking for :yum:
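One possible way to get the number into Anki without an add-on, as a rough sketch: write the words and their ranks to a tab-separated text file and import it, mapping the second column to a frequency field on the note type. The `write_anki_tsv` helper, the field layout and the sample rank are all assumptions for illustration.

```python
import csv
import io


def write_anki_tsv(words, ranks, out_file):
    """Write one 'word<TAB>rank' line per word; rank is left empty if unknown."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    for word in words:
        writer.writerow([word, ranks.get(word, "")])


# Placeholder data: the rank is made up, and 未知語 stands for an unlisted word.
buf = io.StringIO()
write_anki_tsv(["通り", "未知語"], {"通り": 150}, buf)
print(buf.getvalue())
```

For a real import, `out_file` would be a file opened with `encoding="utf-8"` and `newline=""`, and sorting the deck by that field would then give the frequency order.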

EDIT 2: OK, the answer was right there in the tool I already use … yomichan :hugs::hugs: … there’s “Innocent Corpus”, which you can add to yomichan to provide a word frequency reference… With a list of words, hovering over each one with Yomichan is enough to see which are the most common.