What to Know
Stanford University has an online archive of overseas Japanese newspapers called Hoji Shinbun. The archive has 1,130,286 pages in total, but only 97 of those pages have corrected transcriptions. The more pages that are known to be correctly transcribed, the better the OCR quality gets for the remaining pages, so it would be great to get more of the transcriptions checked over.
There is material for every level of learner who might want to participate. Absolute beginners could start with the English sections or English papers (like H. C. & S. Breeze), with checking article title transcriptions, or with correcting kana mistakes (often に is mis-transcribed as !:, た as だ, etc.). Intermediate learners could start with newspapers that have furigana, like the Hawaii Times or Hawai Sutā. Those looking for a particular challenge could try their hand at transcribing the sometimes hand-written newspapers of the 1800s.
Personally, I have been finding it an enjoyable and meaningful way to expand my vocabulary and get some experience reading older texts these past couple of days.
Some of the newspapers like The Rafu Shimpo are still around, but there are also newspapers like Asahi which share names with currently-active newspapers yet are not the same paper.
Thread's Purpose
- Look at this cool image/kanji/info I found
- I can’t read this word, looking for help
- Discussing the archive (note: I’m not affiliated with it in any way so I can’t answer questions beyond what’s on the site)
- Sharing your stats (rank, lines corrected)
1800s-1900s Japanese vs Modern Japanese
The first thing you might notice looking over the archive’s issues is that the kanji look a bit… funny. 変 may look like 變 or 広い like 廣い. Those “funny” kanji are (often) 旧字体, Japan’s version of traditional characters. You get used to them with repeated exposure. 新字体, the kanji we use now, were made standard in 1946, but it took time for usage to catch up. This may have been doubly true of the diaspora, with the Hawaii Times still using 旧字体 in the 1980s. (This is a complete guess on my part, but it probably wasn’t the most convenient of things for a 1950s local newspaper to import large amounts of new type from across the world.) I would recommend using a reference such as this one. I’ve also been noting down some words that I couldn’t read for my own reference since this morning.
Some of my notes so far
應ぜず(おうぜず)
叮嚀(ていねい)
盡く(つく)
解釋(かいしゃく)
廣吿(こうこく)
言辭(げんじ)
欺僞(さぎ)
總て(すべて)
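If you want to look one of these words up in a modern dictionary, it can help to swap the old forms for the new ones first. Here is a rough Python sketch of that idea. It assumes a tiny hand-made table built only from the characters that appear in this post, so it is nowhere near complete, and the names `KYUJITAI_TO_SHINJITAI` and `modernize` are made up for illustration rather than taken from the archive or any library.

```python
# Old-form (旧字体) to modern-form (新字体) pairs.
# Assumption: this table only covers characters mentioned in this post.
KYUJITAI_TO_SHINJITAI = {
    "變": "変",
    "廣": "広",
    "應": "応",
    "盡": "尽",
    "釋": "釈",
    "吿": "告",
    "辭": "辞",
    "僞": "偽",
    "總": "総",
}


def modernize(text: str) -> str:
    """Replace every old-form character found in the table; leave the rest as-is."""
    table = str.maketrans(KYUJITAI_TO_SHINJITAI)
    return text.translate(table)


print(modernize("解釋"))  # 解釈
print(modernize("廣吿"))  # 広告
```

This is only for lookups on your own side. When correcting a transcription, you would presumably still want to type what is actually printed on the page.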
Other odd kanji you might come across are 略字 like 㐰(信、個) or somewhat obscure numerical kanji like 廿・卄 (20).
The kana usage might seem a little different, too. The furigana for 港 might be “かう” rather than “こう”, or 伝える might be written as 伝へる. This is called 歴史的仮名遣い. It’s not necessary to know for checking transcriptions, but it can be useful for reading comprehension. There is a list on this webpage.
You may also encounter some classical grammar. Where you’d expect 帰らない you might find 帰らず, or 痛ましむ where you’d expect 痛ませる. If you want to understand those a little bit better, I recommend the トライイット course on 古文. There are also a good many textbooks aimed at high schoolers taking university examinations that might come in handy. Just keep in mind that, especially with older articles, the grammar might not be something you want to reference in your own writing or need to learn in order to read modern Japanese media.
Stats as of April 7th (@ me to update)
Current text correctors: 152
Known WK users: 1
Total lines corrected: 563,149
WK lines corrected: 737
Ranking positions of WK correctors: 30 [GearAid]