[Userscript] Keisei 形声 Semantic-Phonetic Composition

Hmm, I should probably make two categories: derived tone marks and similar-looking tone marks.

Be careful of feature creep, though. At a certain point there can be so much information that it becomes overwhelming and hard to follow.

With that in mind, you might want to explain somewhere in the UI (e.g. a tooltip or help dialog) what all the settings and options mean.


OK, so the first part tells me 巻 is a poor match for its tone mark because its reading is かん instead of けん.
Then the second part tells me けん is one of the readings for that tone mark. Is it that 巻 itself doesn’t use けん, but other kanji that use it as a component do?

I should really add the explanation for the quality levels somewhere :slight_smile:

天 means that all readings of a compound match.
上 means that the main reading matches, but the kanji can also be read in other, rarely used ways not covered by the tone mark.
中 means that the main reading is different (かん here), but at least one of the phonetic readings is also a reading of the kanji (けん), just not the main one.
下 means that none of the phonetic readings are found among the kanji’s readings.
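
For anyone wondering how the four levels relate to each other, here is a minimal sketch of the classification. It is not the script’s actual code: `classifyMatch` is a hypothetical name, and it assumes each kanji’s readings come as an array with the main reading first, treating every non-main reading as one of the “rare” ones.

```typescript
// Quality levels from the list above. The logic is a guess, not the script's code.
type Quality = "天" | "上" | "中" | "下";

// Hypothetical classifier.
// kanjiReadings: readings of the compound kanji, main reading first
//   (assumption: every non-main reading is treated as a rarely used one).
// phoneticReadings: readings associated with the tone mark.
function classifyMatch(kanjiReadings: string[], phoneticReadings: string[]): Quality {
  const phonetic = new Set(phoneticReadings);
  const covered = kanjiReadings.filter((r) => phonetic.has(r));

  // 下: no phonetic reading appears among the kanji's readings.
  if (covered.length === 0) return "下";
  // 天: every reading of the kanji is covered by the tone mark.
  if (covered.length === kanjiReadings.length) return "天";
  // 上: the main reading matches, but some other reading is not covered.
  if (phonetic.has(kanjiReadings[0])) return "上";
  // 中: the main reading differs, but a secondary reading matches.
  return "中";
}

// 巻 from the question above: main reading かん, tone mark reading けん.
console.log(classifyMatch(["かん", "けん"], ["けん"])); // 中
```

This leaves out any check of how rare the uncovered readings actually are for 上; that would need frequency data the sketch doesn’t assume.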

Kanjipedia: けん is a non-jouyou reading:


An overview of what everything means (the quality levels, the colors, the bold text) would be great.

Also, 闘 says it has an unknown/contested tone mark, but you have とう as the reading for the phonetic component 豆, and this kanji is also read とう. Thoughts?

We are on a similar level, we always find similar things :slight_smile:

I thought it was a composite tone mark, like 豆+寸. But the only information I found is that it used to look like this: 鬭, where 斲 is either the tone mark or just adds meaning (Jisho says “cut, chop, hack”, something you would do in a struggle or battle). Maybe 斲 really is somehow related to 豆, or 豆 was even chosen as a simplification because of its reading.

I use the 豆 inside to remember the reading myself, but for the DB I try to keep the tone marks at the “top level” (only two parts per kanji), so the mark would have to be 豆+寸. I can’t find that on its own, and 厨 is not read とう (it can be ちょう, though).

I’m curious, what’s the reason for that?

No authoritative reason, but I believe this is how it is done.

The tone marks are not chosen solely for their reading; otherwise you would choose the simplest way to represent the sound and always use the same mark, and after some time you would have the idea to change your writing system to be purely “sound-based”. Instead, several different kanji with the same reading are often used as tone marks. I think the tone marks are still chosen for their meaning, where possible.

So the tone mark is the whole component, even if a part inside it carries the sound (as in 青 => 生 => せい). Also, in Japanese the reading of a compound tone mark sometimes changed along with the compounds themselves, so the original inner tone mark doesn’t fit anymore …
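
A sketch of what keeping only the “top level” in the DB could look like: each kanji points to its direct tone mark, and the inner part that actually carries the sound is reached by following the chain. The table `phoneticOf` and the function `phoneticChain` are hypothetical, and the 清 entry is only there for illustration.

```typescript
// Hypothetical top-level table: each kanji maps only to its direct tone mark.
const phoneticOf = new Map<string, string>([
  ["清", "青"],
  ["青", "生"],
]);

// Follow tone marks down to the innermost component.
// The indexOf check stops the loop if bad data ever contains a cycle.
function phoneticChain(kanji: string): string[] {
  const chain = [kanji];
  let next = phoneticOf.get(kanji);
  while (next !== undefined && chain.indexOf(next) === -1) {
    chain.push(next);
    next = phoneticOf.get(next);
  }
  return chain;
}

console.log(phoneticChain("清").join(" => ")); // 清 => 青 => 生
```

Storing only the direct link keeps every entry at two parts, while the inner mark stays recoverable when it helps as a memory aid.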

Similar to the question above: 浸 and 侵 both read しん, but they’re apparently not related?

They are related, and so is 寝. The problem is that the tone mark (𠬶) is not printable; you need a font with lots of kanji, like MingLiU, to see it. I don’t know if I should include it …
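
If it does get included, one option might be to detect such characters and show a fallback instead. A rough sketch, assuming that anything at or above U+20000 (where CJK Extension B starts, outside the Basic Multilingual Plane) is likely to be missing from default fonts; `needsExtendedFont` is a hypothetical helper, and actual font coverage varies by system.

```typescript
// Rough heuristic: code points from U+20000 upward (Supplementary Ideographic
// Plane and beyond, e.g. CJK Extension B) are often missing from default fonts.
function needsExtendedFont(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    // codePointAt combines a surrogate pair when i points at the high surrogate.
    const cp = text.codePointAt(i);
    if (cp !== undefined && cp >= 0x20000) return true;
  }
  return false;
}

console.log(needsExtendedFont("𠬶")); // true
console.log(needsExtendedFont("浸侵寝")); // false
```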

Question: what does ‘tone mark’ actually mean? I can’t figure it out.

It’s the phonetic component in certain kanji.

Should 統 be part of the 充 phonetic group as a non-match? Right now it’s not included at all.

It was already listed as a non-match. The problem is that when you look at a kanji that is only a non-match, that information is not displayed at the moment. If you arrive from outhouse (充), you can see it.

However, I looked it up, and several sources say 統 is a phonetic compound, so I changed it to matching.
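
The display gap with non-matches (only visible when arriving from 充) could be closed with a reverse index from each compound to every group it belongs to. A minimal sketch; the `PhoneticGroup` shape and `buildReverseIndex` are hypothetical, and the sample data shows 統 as a non-match, i.e. the state before the change.

```typescript
type Relation = "match" | "non-match";
type Entry = { component: string; relation: Relation };

// Hypothetical shape of one phonetic group as stored by the script.
interface PhoneticGroup {
  component: string;
  members: { kanji: string; relation: Relation }[];
}

// Map each kanji to every group it appears in, whatever the relation,
// so the information can be shown when starting from the compound too.
function buildReverseIndex(groups: PhoneticGroup[]): Map<string, Entry[]> {
  const index = new Map<string, Entry[]>();
  for (const group of groups) {
    for (const { kanji, relation } of group.members) {
      const entries = index.get(kanji) ?? [];
      entries.push({ component: group.component, relation });
      index.set(kanji, entries);
    }
  }
  return index;
}

const index = buildReverseIndex([
  {
    component: "充",
    members: [
      { kanji: "銃", relation: "match" },
      { kanji: "統", relation: "non-match" },
    ],
  },
]);
console.log(index.get("統")?.[0].relation); // non-match
```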

If you mean the term “tone mark” itself, I took it from a translation of 声符, but it is probably not the best word. I will change it to “phonetic component” or something similar.

Version 1.6.4 with a few minor modifications.

  • Reworded the info strings a bit.
  • Semantic components are now also shown, for example 魔=麻+鬼 or 透=秀+辵 (*)

(*) mainly a by-product of my attempt to make a “kanji matrix”:


The script says that 令 also has the phonetic reading りょう, but 領 is the only kanji you list with this reading. If only one of the six kanji listed has that reading (and not even the phonetic component itself does), how can it count as a phonetic reading?

The reading りょう is uncommon, but it is the go-on reading and is listed as “outside jouyou”; you can see it here:


Interestingly, 領 also has the out-of-list reading れい listed in Wiktionary. What gets listed as a reading varies a lot; even KANJIDIC (which generally lists lots of strange readings) misses many things.

Also, at http://dic.nicovideo.jp/a/令, under 「声符」, you can see more compounds not in WK that have りょう as an option.
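
One way to make the “one out of six” question concrete is to count, per reading, what share of a group’s members list it. Whether りょう counts then depends on which reading lists you feed in (Jouyou only, or also out-of-list readings like the ones on Kanjipedia or Wiktionary). A minimal sketch; `readingCoverage` and the `Member` shape are hypothetical, the data is placeholder, and the script may decide membership differently.

```typescript
// Hypothetical member record: a compound kanji and the readings you trust for it.
interface Member {
  kanji: string;
  readings: string[];
}

// For each reading, the fraction of group members that list it (0..1).
function readingCoverage(members: Member[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const m of members) {
    // Count each reading at most once per kanji.
    const unique = m.readings.filter((r, i, all) => all.indexOf(r) === i);
    for (const r of unique) counts.set(r, (counts.get(r) ?? 0) + 1);
  }
  const coverage = new Map<string, number>();
  counts.forEach((count, reading) => coverage.set(reading, count / members.length));
  return coverage;
}

// Placeholder data: six members, only one of which lists りょう.
const sample: Member[] = [
  { kanji: "A", readings: ["れい"] },
  { kanji: "B", readings: ["れい"] },
  { kanji: "C", readings: ["れい"] },
  { kanji: "D", readings: ["れい"] },
  { kanji: "E", readings: ["れい"] },
  { kanji: "F", readings: ["れい", "りょう"] },
];
console.log((readingCoverage(sample).get("りょう") ?? 0).toFixed(2)); // 0.17
```

Switching `sample` to a list with out-of-list readings added would raise the りょう share, which is exactly the disagreement between sources described above.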

Follow-up question, then: sometimes the script includes kanji that aren’t on WaniKani. How do you decide which ones to include?

I started with a list of all Jouyou kanji, including the 2010 revisions (some are not in WK), then added the WK kanji that were still missing. Recently I also added all kanji that are used as phonetic components; this includes very obscure kanji not really used today.

But only Jouyou kanji and the WK additions will show up as compounds.
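
Put as code, the inclusion rules above would look roughly like this. The `KanjiSets` shape and both function names are hypothetical, and the sets in the example are small samples rather than the real lists.

```typescript
// Hypothetical source lists the script draws on.
interface KanjiSets {
  jouyou: Set<string>; // all Jouyou kanji, including the 2010 revisions
  wanikani: Set<string>; // kanji taught on WK
  phoneticComponents: Set<string>; // anything used as a tone mark, however obscure
}

// A kanji is in the database if it comes from any of the three lists.
function isInDatabase(kanji: string, sets: KanjiSets): boolean {
  return sets.jouyou.has(kanji) || sets.wanikani.has(kanji) || sets.phoneticComponents.has(kanji);
}

// Only Jouyou kanji and WK additions are listed as compounds of a tone mark.
function showsAsCompound(kanji: string, sets: KanjiSets): boolean {
  return sets.jouyou.has(kanji) || sets.wanikani.has(kanji);
}

// Sample sets, not the real lists.
const sets: KanjiSets = {
  jouyou: new Set(["侵", "浸", "寝"]),
  wanikani: new Set<string>(),
  phoneticComponents: new Set(["𠬶"]),
};
console.log(isInDatabase("𠬶", sets), showsAsCompound("𠬶", sets)); // true false
```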
