Yojijukugo (四字熟語) frequencies in top-15000 word media corpus

Hi there! I got interested in Yojijukugo (四字熟語), which are words or idioms consisting of 4 characters. For idiomatic ones like 海千山千 it is pretty difficult to guesstimate what they mean, since there is usually a story involved (see details below):


“Ocean-thousand, mountain-thousand” means “a sly old fox” or someone who has had all sorts of experience in life so that they can handle, or wiggle out of, any difficult situations through cunning alone. This meaning derives from an old saying that a snake lives in the ocean for a thousand years and in the mountains for another thousand years before it turns into a dragon. Hence a sly, worldly-wise person is referred to as one who has spent “a thousand years in the ocean and another thousand in the mountains”.

Non-idiomatic ones ‘make more sense’, if I can put it like that. An example would be 日米関係, which means Japan–US relations.

Reading a bit more, it turns out that there are thousands of such expressions, as well as 5- and 6-kanji expressions that are super obscure. There are lists for children showing what they are ‘expected’ to know.

It’s hard to get a grip on how far they are a linguistic flourish that only makes sense for advanced studies or very specific texts like politics or religion, and how far they actually occur in daily life.

To get a rough idea, I took this list, which is a random heap of 15k words from internet and media vocabulary (not exactly comparable to Aozora: less literary, more everyday words), and kept only the expressions with 4 or more characters. As you can see, many of them are proper nouns for companies and universities, or political and geographical expressions. But still, whereas “経済企画庁” might not be amazingly useful, knowing the characters for one of the most-read newspapers, 毎日新聞, or a war of historical importance, 太平洋戦争, might be quite useful (in the ballpark of WK words).
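
For anyone who wants to repeat the filtering step on another word list, here is a minimal sketch of how it could be done. It is not the exact script used for this post. It assumes the list is available as (rank, word) pairs, and it assumes "kanji" means the basic CJK Unified Ideographs block plus the iteration mark 々. Apart from the three expressions taken from the list in this post, the sample entries and their ranks are made up for illustration.

```python
import re

# Matches a string made only of kanji, at least four of them.
# Assumption: "kanji" = CJK Unified Ideographs (U+4E00-U+9FFF) plus the
# iteration mark 々; rarer extension blocks are ignored in this sketch.
FOUR_PLUS_KANJI = re.compile(r"[\u4e00-\u9fff々]{4,}")

def four_plus_kanji(words):
    """Keep (rank, word) pairs whose word consists of 4 or more kanji."""
    return [(rank, w) for rank, w in words if FOUR_PLUS_KANJI.fullmatch(w)]

# Hypothetical excerpt of a frequency list: (rank, word)
sample = [(120, "日本"), (1425, "株式会社"), (3104, "一生懸命"),
          (4000, "コンピューター"), (9139, "最高裁判所"), (5000, "お疲れ様")]
print(four_plus_kanji(sample))
```

Words containing kana are dropped on purpose here, so a list built this way only contains pure-kanji compounds.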

I actually heard 一生懸命 in the wild, so there’s even that.

Below I attached a CSV in case anybody wants to import them into Anki or anything. The first column, rank, shows where in the 15k words the expression appears. At WK Level 60 I would say it would be good to know all the expressions up to rank 10000, and the others contingent on circumstances (if you fly out to Narita, make sure to know 成田空港).

 rank, occ,  expression, translation
 1425, 52.46, 株式会社,    stock company
 1921, 37.63, 国務大臣,    Minister of State
 3104, 21.30, 一生懸命,    very hard
 3396, 19.04, 朝日新聞,    Asahi Shimbun (newspaper)
 3458, 18.67, 都道府県,    Prefecture
 5575,  9.89, 毎日新聞,    Mainichi (newspaper)
 5653,  9.72, 読売新聞,    Yomiuri Shimbun (newspaper)
 5713,  9.57, 東京大学,    University of Tokyo
 5879,  9.19, 中途半端,    halfway
 6549,  7.86, 地方自治体,  Local government
 6670,  7.65, 靖国神社,    Yasukuni Shrine
 6964,  7.16, 日本銀行,    Bank of Japan
 7612,  6.28, 朝鮮半島,    Korean Peninsula
 8085,  5.74, 義務教育,    Compulsory education
 8245,  5.59, 産経新聞,    Sankei Shimbun (newspaper)
 8277,  5.55, 試行錯誤,    Trial and error
 8405,  5.44, 情報処理,    Information processing
 8450,  5.39, 共同通信,    Kyodo News (news agency); joint communication
 9139,  4.79, 最高裁判所,  Supreme Court
 9248,  4.69, 創価学会,    Soka Gakkai (Buddhist movement)
10016,  4.16, 無理矢理,    forcibly
10137,  4.09, 日本共産党,  Japan Communist Party
10173,  4.07, 三位一体,    Trinity
10387,  3.95, 年末年始,    Year-end and New Year
10431,  3.93, 早稲田大学,  Waseda University
10486,  3.90, 日経新聞,    Nikkei Shimbun (newspaper)
10513,  3.88, 岩波書店,    Iwanami Shoten (publishing house)
10639,  3.80, 京都大学,    Kyoto University
10999,  3.62, 地方裁判所,  District Court
11008,  3.62, 政務次官,    Vice-Minister for Political Affairs
11171,  3.54, 大学院生,    Graduate student
11196,  3.53, 歌舞伎町,    Kabukicho (entertainment district in Shinjuku)
11212,  3.53, 道路公団,    Japan Highway Public Corporation
11449,  3.42, 修学旅行,    School trip
11923,  3.22, 二酸化炭素,  Carbon dioxide
12026,  3.18, 朝日新聞社,  Asahi Shimbun Company (newspaper publisher)
12574,  2.96, 太平洋戦争,  Pacific War
12613,  2.94, 第一人者,    Leading figures
12723,  2.90, 経済学部,    Department of Economics
12734,  2.90, 経済企画庁,  Economic Planning Agency (until 2001)
12821,  2.86, 法務大臣,    Minister of Justice
12852,  2.86, 会計検査院,  Board of Audit
13231,  2.73, 明治維新,    Meiji Restoration
13641,  2.60, 神戸大学,    Kobe University
13815,  2.55, 農林水産省,  Ministry of Agriculture, Forestry and Fisheries (MAFF)
13863,  2.53, 生年月日,    Date of Birth
13884,  2.53, 日本経済新聞, Nihon Keizai Shimbun (newspaper)
14385,  2.39, 自分勝手,    Selfish
14438,  2.38, 小中学校,    Elementary and junior high school
14484,  2.37, 慶應義塾,    Keio University
14500,  2.37, 中国共産党,  Chinese Communist Party
14631,  2.33, 成田空港,    Narita Airport
14641,  2.33, 一目瞭然,    obvious
14643,  2.33, 文藝春秋,    Bungeishunju (publisher)
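
If you want to turn the CSV above into Anki cards, something along these lines might work. This is only a sketch: the function names are my own, the rank cutoff of 10000 is just the rule of thumb from this post, and it splits on the first three commas only because one translation contains commas itself. Anki can import tab-separated text, so the output is one expression and one translation per line.

```python
def parse_rows(csv_text):
    """Parse 'rank, occ, expression, translation' lines into dicts."""
    rows = []
    for line in csv_text.strip().splitlines()[1:]:  # skip the header line
        # Split on the first three commas only, since a translation
        # may itself contain commas.
        rank, occ, expression, translation = (
            field.strip() for field in line.split(",", 3)
        )
        rows.append({"rank": int(rank), "occ": float(occ),
                     "expression": expression, "translation": translation})
    return rows

def to_anki_tsv(rows, max_rank=10000):
    """Tab-separated front/back lines for rows up to max_rank."""
    return "\n".join(f"{r['expression']}\t{r['translation']}"
                     for r in rows if r["rank"] <= max_rank)

# Three rows copied from the list above
excerpt = """\
rank, occ, expression, translation
 9248,  4.69, 創価学会,    Soka Gakkai (Buddhist movement)
10016,  4.16, 無理矢理,    forcibly
13815,  2.55, 農林水産省,  Ministry of Agriculture, Forestry and Fisheries (MAFF)
"""
rows = parse_rows(excerpt)
print(to_anki_tsv(rows))
```

Raising or dropping `max_rank` gives you the rarer, more situational expressions as well.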

If you have any corrections for the list, or your own experience of how and when these expressions are useful and how to assess the idiomatic compounds on the children’s lists, let me know!

P.S.: I meant to add the readings to the list, but then had to leave. Using the 10ten browser extension surely helps greatly with this list!


While I guess you could argue that in the strictest sense of the term these are 四字熟語 in that they’re 熟語 which are comprised of 四字, the term pretty much only applies to the idiomatic ones.


Yeah, people are going to be a bit thrown off if you say you’re studying 四字熟語 and then come out with 株式会社.

If you want to refer to any word composed of 4 kanji, then something like 漢字4字からできた熟語 would be broader.


It’s interesting that there are practically no true idiom-type 四字熟語 in the list. Perhaps this is partly a result of its origin: you can see the media bias clearly in the way all the newspaper names show up in it. But I think it does show that there’s no strong need to study 四字熟語 any differently from other words you might encounter, because they won’t turn up in large numbers. Moreover, if you’re feeling snowed under with vocab to learn, you can probably safely skip a 四字熟語, because it’s not likely to be a frequent word.

If you do want to find the real 四字熟語 you can cross-reference your list against JMdict (as used by wwwjdic, jisho.org, etc) – they are marked with a ‘yoji’ tag. For example 一所懸命 has that tag and 株式会社 does not.
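
A rough sketch of that cross-reference in Python, in case it helps. It runs on a tiny stand-in for the dictionary file rather than the real thing. The assumptions: the real JMdict XML uses the same entry / k_ele / keb and sense / misc layout, and the yoji tag is an entity declared in the file's DTD that expands to text containing "yoji". The gloss text in the sample is illustrative only.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for JMdict. Assumption: the real file has the same
# entry/k_ele/keb and sense/misc layout and declares the yoji entity
# in its DTD. The glosses here are illustrative only.
SAMPLE = """<!DOCTYPE JMdict [
<!ENTITY yoji "yojijukugo">
]>
<JMdict>
<entry><k_ele><keb>一所懸命</keb></k_ele>
<sense><misc>&yoji;</misc><gloss>very hard</gloss></sense></entry>
<entry><k_ele><keb>株式会社</keb></k_ele>
<sense><gloss>stock company</gloss></sense></entry>
</JMdict>"""

def yoji_tagged(xml_text):
    """Return the kanji forms of entries that carry a yoji misc tag."""
    tagged = set()
    for entry in ET.fromstring(xml_text).iter("entry"):
        miscs = [m.text or "" for m in entry.iter("misc")]
        if any("yoji" in m for m in miscs):
            tagged.update(k.text for k in entry.iter("keb"))
    return tagged

my_list = ["株式会社", "一所懸命"]
tagged = yoji_tagged(SAMPLE)
print([w for w in my_list if w in tagged])
```

The real dictionary file is large, so for actual use a streaming parse (for example `ET.iterparse`) would probably be kinder to memory than loading everything at once.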

Speaking of 一所懸命, yeah, that one really is very common, and not literary or written-only.


Thanks for your feedback. I consulted a few more sources and cursorily speaking, I feel most make the distinction between idiomatic and non-idiomatic Yojijukugo. But of course I see all y’all’s caveat raising, since arguably the interesting ones are the idiomatic ones.

For my own understanding, though, what I set out to do is to see how many of them (both idiomatic and non-idiomatic) actually appear in everyday Japanese. It’s easy to pull up a list on the internet with 3000 Yojijukugo, but it doesn’t really help me as a student. If they literally don’t appear save for religious or poetic texts, that’s an important thing to know (it doesn’t make them irrelevant, obviously, just more situational than I originally thought). I guess I just find frequency lists very helpful in guiding my studies, and there are none (few?) for Yojijukugo.

I apologize if it came across as just criticizing the topic. There’s value in seeing what’s out there.

The thing is, if you do that, I doubt such a list would have things like proper nouns that just happen to be 4 characters long. They’re all going to be comparatively rare idiomatic expressions, save for the handful that you don’t need to go out of your way to study because they show up a bunch, like 中途半端.

You really can just assume that any yojijukugo you’ll truly need to know will be common enough that you’ll just learn them from exposure.


The other important 四字熟語 is 焼肉定食 :slight_smile:

(Edit: I have just discovered that Japanese Wikipedia has a long article explaining the 弱肉強食 / 焼肉定食 joke in great detail…)


Makes sense, maybe it was a bit of a misinformed idea.


Yeah, the long lists of Yojijukugo are annoying when I’m not looking for that kind of depth. I posted this Kitsun deck of over 700, narrowed down from the (idiom-based) sources listed below. You may find the deck or the sources helpful.

Description

This deck contains 700+ of the most common yojijukugo words from (recommended deck filter order):

  1. 四字熟語データバンク人気四字熟語[TOP50]
    -50 of the most common
    -Tagged by both writing and speaking frequency ranking

  2. 小学生のうちに覚えたい四字熟語の一覧表:ドリル

  3. Included in HKES : 4 sources only

  4. Community Recommendation

Not intended to be a comprehensive list, just enough for the casual learner. There are thousands of possible words, feel free to recommend additions in the Den thread if common enough.

Den thread


not sure if this has been posted but i use this to check my stats and it also has this cool little section of 四字熟語

it was posted by @saraqael


I found this too in the meantime. It’s a great functionality. @saraqael is the best!


Thanks for linking WK History :smile:

Thank you! :smiling_face:

Well, I’ll try to elaborate how the Yojijukugo are chosen there for anyone who’s interested:
I took the data from this website, Yojijukugo (edrdg.org), which is itself a collection from multiple sources. On WK History you can look through all the Yojijukugo for which you have already seen every Kanji (and try to guess the reading, because it’s hidden until you click it). But there isn’t any kind of frequency info, because I just wanted it to be a small little gimmick.
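
The selection rule described here (show an expression only once every Kanji in it has been seen) can be sketched in a few lines of Python. This is just an illustration of the idea, not the actual WK History code, and the set of known kanji below is made up.

```python
def fully_known(expressions, known_kanji):
    """Keep expressions whose every character is in known_kanji."""
    known = set(known_kanji)
    return [e for e in expressions if set(e) <= known]

known = "一生懸命中途半"  # hypothetical set of kanji seen so far
print(fully_known(["一生懸命", "中途半端", "一目瞭然"], known))
```

Here 中途半端 is held back because 端 is not in the known set yet.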

