After my first attempt at getting a grasp on yojijukugo, I went a bit further this time and looked at some actual yojijukugo, ha!
Instead of just looking at words made up of 4 kanji, which yields a lot of 4-kanji phrases that fall outside the narrow, common definition of yojijukugo, I compared this popular list of yojijukugo against the Aozora corpus (an “online collection [that] encompasses several thousands of works of Japanese-language fiction and non-fiction.”).
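To make the comparison concrete, here is a minimal sketch of how such a count could be done. It is not the script behind this post: the function name `count_idioms`, the toy idiom list, and the two sample sentences are made up for illustration, and plain substring matching is an assumption (no tokenizer, no handling of ruby annotations or variant spellings).

```python
from collections import Counter


def count_idioms(idioms, texts):
    """Count non-overlapping substring occurrences of each idiom in all texts."""
    counts = Counter()
    for text in texts:
        for idiom in idioms:
            n = text.count(idiom)  # str.count finds non-overlapping matches
            if n:
                counts[idiom] += n
    return counts


# Toy stand-ins for the idiom list and the corpus (not real Aozora data).
idioms = ["一生懸命", "自分自身", "大同小異"]
texts = [
    "彼は一生懸命に働いた。自分自身のために。",
    "一生懸命やっても結果は同じだ。",
]

counts = count_idioms(idioms, texts)

# Idioms from the list that occur at least once in the corpus.
found = [w for w in idioms if counts[w] > 0]

for idiom, n in counts.most_common():
    print(idiom, n)
```

For a full corpus this nested loop would be slow; it is only meant to show the idea of matching a fixed list against running text.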
The results:
- The most common is 自分自身.
- 一生懸命 is the second most common and is one of the few that also appeared in my previous post.
- Together these two account for about 8.5% of all yojijukugo occurrences, far more than any others. After them, the distribution is extremely flat.
- In the whole corpus, only 3072 of the listed 5802 appear.
- Koichi claims to have found counts of up to 20k instances, but if even Aozora (as opposed to, e.g., a news corpus) does not contain them, that tells you something about their rarity: ultra rare!
- A related work can be found in this post, which mentions this list of the top 50 instances. I find the idioms mentioned on that page only in the lower (non-zero) percentages of my full list. I think those are actually more idiom-y, while many of the top ones found here are a bit less idiom-y. Lastly, here is a list for school children.
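The two headline numbers above can be checked with a few lines of arithmetic. This is only a sketch using figures quoted in this post (5802 listed, 3072 found, and the 4.55% and 4.00% shares from the table); the variable names are mine.

```python
listed = 5802   # yojijukugo in the list
found = 3072    # of those, how many appear in the corpus at least once

# Share of the list that shows up in the corpus at all.
coverage_pct = 100 * found / listed

# Combined share of the two most frequent entries, taken from the table.
top_two_pct = 4.55 + 4.00

print(f"coverage: {coverage_pct:.1f}%")
print(f"top two:  {top_two_pct:.2f}%")
```

So a bit more than half of the listed idioms occur at all, and the top two alone make up roughly one in twelve of all occurrences.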
Finally, I made a nice plot:
And here are the top 100:
四字熟語 | count | %* | meaning |
自分自身 | 4258.0 | 4.55 | oneself |
一生懸命 | 3747.0 | 4.00 | very hard |
一人一人 | 1312.0 | 1.40 | one by one |
不可思議 | 1203.0 | 1.28 | mystery |
馬鹿野郎 | 1030.0 | 1.10 | goddamn idiot |
神経衰弱 | 932.0 | 0.99 | nervous breakdown |
行方不明 | 923.0 | 0.98 | missing (of a person) |
彼方此方 | 771.0 | 0.82 | here and there |
御無沙汰 | 652.0 | 0.69 | not writing for a while |
一歩一歩 | 549.0 | 0.58 | step by step |
四方八方 | 528.0 | 0.56 | in all directions |
一所懸命 | 484.0 | 0.51 | very hard |
自由自在 | 469.0 | 0.50 | free |
無我夢中 | 462.0 | 0.49 | being absorbed in |
無茶苦茶 | 423.0 | 0.45 | nonsensical |
徹頭徹尾 | 411.0 | 0.43 | thoroughly |
右往左往 | 406.0 | 0.43 | moving about in confusion |
前後左右 | 402.0 | 0.42 | in all directions |
滅茶滅茶 | 394.0 | 0.42 | disorderly |
傍若無人 | 380.0 | 0.40 | acting w/o consideration for others |
武者修行 | 364.0 | 0.38 | |
中途半端 | 361.0 | 0.38 | |
相談相手 | 357.0 | 0.38 | |
半信半疑 | 357.0 | 0.38 | |
因果関係 | 350.0 | 0.37 | |
生存競争 | 340.0 | 0.36 | |
言語道断 | 322.0 | 0.34 | |
前代未聞 | 315.0 | 0.33 | |
二言三言 | 305.0 | 0.32 | |
実際問題 | 303.0 | 0.32 | |
無理矢理 | 293.0 | 0.31 | |
自業自得 | 292.0 | 0.31 | |
自暴自棄 | 292.0 | 0.31 | |
年中行事 | 286.0 | 0.30 | |
先祖代々 | 279.0 | 0.29 | |
生真面目 | 277.0 | 0.29 | |
面目次第 | 273.0 | 0.29 | |
大胆不敵 | 269.0 | 0.28 | |
一心不乱 | 267.0 | 0.28 | |
一部始終 | 258.0 | 0.27 | |
二度三度 | 257.0 | 0.27 | |
面白半分 | 256.0 | 0.27 | |
絶体絶命 | 254.0 | 0.27 | |
老若男女 | 243.0 | 0.25 | |
正真正銘 | 238.0 | 0.25 | |
荒唐無稽 | 235.0 | 0.25 | |
一日二日 | 232.0 | 0.24 | |
言文一致 | 230.0 | 0.24 | |
遮二無二 | 230.0 | 0.24 | |
一挙一動 | 218.0 | 0.23 | |
異口同音 | 213.0 | 0.22 | |
文明開化 | 213.0 | 0.22 | |
弥次喜多 | 209.0 | 0.22 | |
昨日今日 | 207.0 | 0.22 | |
愚図愚図 | 205.0 | 0.21 | |
東西南北 | 204.0 | 0.21 | |
一世一代 | 196.0 | 0.20 | |
公明正大 | 195.0 | 0.20 | |
一朝一夕 | 195.0 | 0.20 | |
一語一語 | 193.0 | 0.20 | |
一体全体 | 192.0 | 0.20 | |
半死半生 | 190.0 | 0.20 | |
挙国一致 | 187.0 | 0.19 | |
大真面目 | 184.0 | 0.19 | |
人事不省 | 183.0 | 0.19 | |
支離滅裂 | 180.0 | 0.19 | |
不得要領 | 180.0 | 0.19 | |
立身出世 | 180.0 | 0.19 | |
義理人情 | 179.0 | 0.19 | |
精神作用 | 179.0 | 0.19 | |
後生大事 | 178.0 | 0.19 | |
一目瞭然 | 175.0 | 0.18 | |
第一印象 | 172.0 | 0.18 | |
前後不覚 | 171.0 | 0.18 | |
種々様々 | 170.0 | 0.18 | |
一伍一什 | 170.0 | 0.18 | |
潜在意識 | 169.0 | 0.18 | |
神経過敏 | 168.0 | 0.17 | |
紳士淑女 | 167.0 | 0.17 | |
不真面目 | 167.0 | 0.17 | |
自然淘汰 | 166.0 | 0.17 | |
千差万別 | 163.0 | 0.17 | |
無二無三 | 163.0 | 0.17 | |
利害関係 | 163.0 | 0.17 | |
自問自答 | 162.0 | 0.17 | |
時々刻々 | 161.0 | 0.17 | |
尋常一様 | 161.0 | 0.17 | |
大同小異 | 161.0 | 0.17 | |
風俗習慣 | 158.0 | 0.16 | |
馬鹿正直 | 157.0 | 0.16 | |
四十九日 | 156.0 | 0.16 | |
黄金時代 | 156.0 | 0.16 | |
悪戦苦闘 | 155.0 | 0.16 | |
人身御供 | 154.0 | 0.16 | |
今日明日 | 154.0 | 0.16 | |
不承不承 | 154.0 | 0.16 | |
真一文字 | 151.0 | 0.16 | |
春夏秋冬 | 150.0 | 0.16 | |
勝手次第 | 149.0 | 0.15 | |
縦横無尽 | 148.0 | 0.15 |
* percentage of all 四字熟語 occurrences found in the corpus