Ghost Kanji

Sezme · July 30, 2018, 10:36pm

I just read and thought I’d share a fascinating article about kanji that we don’t have to learn because they don’t exist in the real world. They only exist in the Japanese computer encoding system (JIS) because of … mistakes. In any case, it’s a fascinating short article.

An interesting aside in the article is that there’s a reference work that lists all the administrative districts within Japan that’s seven volumes and roughly 9000 pages each, and buried within them are some pretty unusual kanji used only for place names. Combing through that book and no doubt going half blind in the process, researchers added everything they could find into JIS and inadvertently introduced a few mistakes as well.

Hey, if I didn’t know better, I’d think these were regular kanji waiting for me on future levels of WK.

plantron · July 31, 2018, 12:03pm

Nice find. I wonder why these people made so many mistakes when developing the standard (that would have long-term repercussions in many areas). I understand that there are thousands of kanji, and that can be overwhelming. But it sounds to me like they were working under time pressure. A newspaper/magazine article has to be proofread by an editor/proofreader before published. These kanji should’ve been proofread by at least several people before they were included in the standard.

Belthazar · July 31, 2018, 12:26pm

I sometimes think noone did any proofreading for any standard in Japanese. Case in point, 恒河沙 means both 10⁵² and 10⁵⁶, because when they wrote the first Japanese mathematics texbook back in the 1600s, the author got it wrong, and noone noticed for about four years. Same goes for the larger powers of 10,000.

(Though I guess English is in no place to quibble - depending on who you ask, a billion is either a thousand million or a million million, though fortunately that later form is falling out of use.)

Rowena · July 31, 2018, 12:40pm

When I went to school in the UK (up to year 6), it was a million, then I moved to Canada and suddenly it was a thousand - between that and the different spellings/pronunciations I was one confused kid for a while!

plantron · July 31, 2018, 12:45pm

I suggest we use the SI prefixes to avoid all confusions. So one million dollars is 1 megadollar and one thousand million dollars is 1 gigadollar.

Sezme · July 31, 2018, 1:56pm

My guess is that it was 1978 and very few people understood computers, or what they’d become. Certainly almost nobody envisioned anything like the World Wide Web, and the whole exercise probably seemed more theoretical than practical.

On the positive side, it’s kind of nice to know that these kanji are ready and waiting on computers around the world in case anyone is able to assign universally agreed-upon meanings to them.

plantron · August 1, 2018, 12:12pm

That’s a good point. It’s easy to forget that almost nobody had a personal computer in 1978.

Belthazar · August 1, 2018, 12:20pm

I’ll say. I think there is a world market for maybe five computers. There is no reason anyone would want a computer in their home.

plantron · August 1, 2018, 12:29pm

To be precise, he was referring to computers in general (those giant, room-filling monstrosities).

Belthazar · August 1, 2018, 12:31pm

Indeed, but I felt my post would be a little less punchy if I stopped to explain that.

Sezme · August 1, 2018, 1:39pm

Around 1978, or maybe slightly before, I was in grade school, and a computer programmer came to our class to give a presentation about how you program computers using punch cards, and the end result was that you could solve complicated math equations. Sure seemed like a complete waste of time to me at that point.

The Apple II was released in 1977, and the TRS-80 in 1978.

But even in the late 90s, you couldn’t get Japanese encoding to work reliably across browsers on the web. Unicode encoding made things like WaniKani possible.

plantron · August 1, 2018, 1:47pm

That’s true. Also in 1978, most computer monitors could probably not even display intricate kanji with more than 10 strokes. At least games from that period only used kana or very simple kanji.

carolk · August 1, 2018, 1:58pm

One of the links at the bottom of the article is probably something that would be my worse kanji nightmare. A book fully written in made up Chinese characters…
https://en.m.wikipedia.org/wiki/File:Tianshu_title.jpg

hurts my brain

Sezme · August 1, 2018, 2:11pm

Actually, it looks like the JIS group was on point. Here’s a timeline of Word Processors in Japan, and it looks like they first started being sold in 1978.
http://museum.ipsj.or.jp/en/computer/word/index.html

According to what I could read there, before that time, Japanese typewriters and computers could only display katakana. We take it for granted now, but when they were inventing these word processors, not only did they need to worry about encoding and screen resolution, they also needed to devise an input method to get kanji onto the screen by typing on a keyboard. The OKI WORD EDITOR-200 for example needed you to type (I believe roman letters) on a keyboard to produce onyomi reading, and from that you could select the kanji you wanted. So for example, to type 行きます, you’d have to type KOU [then select the correct kanji]-KI-MA-SU.

Word processors as a category were very popular in Japan into the 80s, much more so than in North America (I believe they also had some popularity in Europe). These were essentially computers that were limited to text document creation, storage, and printing.

ETA: In 1979, Sharp introduced one with a tablet input method because people were afraid of keyboards! I think it looks pretty cool.

anon38003452 · August 1, 2018, 2:18pm

…And, since computer manufacturers had to work with the limitations of contemporary technology (Imagine developing kanji input for the ZX Spectrum!), the type of katakana that was displayed was half-width ｶﾀｶﾅ, which is encoded and can sometimes be seen to this day.
In fact, one distinctive feature of 半カタカナ is that the dakuten do not combine with the character they modify, but are their own, separate marks: ﾀﾞｲｶﾞｸｾｲ. I think this has to do with the fact that the computers of the time did not really support combining marks

Did you know that Unicode’s Unihan database includes romanized Japanese readings of 漢字, much like the example you mentioned, and that you can probably look up readings in your character map application? Here’s an example for the kanji 王

By the way, Unihan lists an on'yomi reading for 妛, and other ghost kanji

Sky20 · August 1, 2018, 3:24pm

Interesting. I think it’s cool that our computers have these characters but we wouldn’t know it, and there isn’t really a meaning to them. I await the day for a 彁 mnemonic.

Sezme · August 1, 2018, 3:24pm

Oh, and speaking of input methods, check out this keyboard for Fujitsu’s thumb-shift method:

http://museum.ipsj.or.jp/computer/word/images/0006_01_l.jpg

Left and right thumbs pressed the keys at the bottom to affect which character was entered. I’m sure once you got used to it, it was efficient, but it looks confusing as hell.

Sky20 · August 1, 2018, 3:28pm

That’s a pretty cool little set up. Going to do some more research on it.

anon38003452 · August 1, 2018, 3:29pm

Thumb-shift keyboards still exist for Japanese. According to the English Wikipedia, they are used by people who need to quickly input lots of text, such as playwrights.

Fearchar · August 1, 2018, 6:37pm

Maybe they should have gone for a purpose-built typewriter, like this:
http://languagelog.ldc.upenn.edu/nll/?p=3092

The full-form character set now used in Chinese printing was originally reformed and simplified by the Qin dynasty in the 3rd century BCE to serve the needs of the empire, retaining mnemonics but removing duplications from different traditions. The more radically simplified version pushed in the late 20th century by China, which has partly been included in kanji, has removed many of the indications of meaning which help to jog the memory. The likes of 畫, which includes the brush used for writing, was replaced by 画 (which is included in Japanese kanji) and, since it is divorced from its context, is in its own way a “ghost” kanji. (The same could be said of all kana, I suppose.)

Topic		Replies	Views
Half as Interesting: Ghost Kanji Japanese Language	1	198	June 13, 2025
A missing kanji Kanji	14	1785	December 1, 2019
Ghost Kanji on WaniKani iPhone app - whut? Plzhalp API And Third-Party Apps	9	1035	January 28, 2018
My Experience with Remembering the Kanji Kanji	38	4756	August 2, 2020
HAHA, finally found 鬱 in the wild! Reading	5	620	August 26, 2020

Ghost Kanji

Related topics