I was able to grab another ~300 pitch accents from OJAD out of the ~600 missing, if you’d like me to PR them on GitHub. Not sure whether you’d want them flagged so their links can point to OJAD instead of Weblio?
That includes all the jukugo + する words, plus others that Weblio doesn’t have (アメリカ人 and 旅行者, for example).
Fortunately OJAD was simple enough to parse and match by both word & reading.
E.g. it can correctly match 枝 with えだ.
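For anyone curious, the word + reading matching can be sketched roughly like this. It's only an illustration, not the actual scraping code: the record shape is borrowed from the sample output in this post, and `keyOf`, `ojadIndex` and `findPitch` are made-up names.

```javascript
// Assumed record shape, copied from the sample entries in this post.
const ojadEntries = [
  { character: '豆', reading: 'まめ', pitchPattern: [0, 1, 0] },
  { character: '奈良', reading: 'なら', pitchPattern: [1, 0, 0] },
];

// Key on word AND reading, so the same kanji with a different reading never matches.
const keyOf = (character, reading) => `${character}|${reading}`;

// Build the lookup table once instead of scanning the list for every word.
const ojadIndex = new Map(
  ojadEntries.map((entry) => [keyOf(entry.character, entry.reading), entry])
);

// Hypothetical helper: returns the pitch pattern, or null when there is no exact match.
function findPitch(character, reading) {
  const hit = ojadIndex.get(keyOf(character, reading));
  return hit ? hit.pitchPattern : null;
}
```

With this, `findPitch('豆', 'まめ')` gives the stored pattern, while `findPitch('豆', 'とう')` gives `null` because the reading doesn't match.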
The remaining ones are too wonky to bother trying to automate accurately:
all the 〜suffix/prefix〜 WK entries
phrases: お誕生日おめでとう, 結構です
partials: 別の, お陰で
combos: 四十二階, 缶コーヒー
I think they’d have to be manually figured out by examining component parts + listening to audio.
And of course some of them can’t be figured out at all (a suffix’s pitch relies on what precedes it, so it’s word-specific).
So… yeah I didn’t bother
Here’s a random selection if anyone wants to spot-check, though I’m 98% confident they’re all good.
*The final value in each pitchPattern is the pitch of the particle that follows the word.
{ character: 'お酒', reading: 'おさけ', pitchPattern: [ 0, 1, 1, 1 ] },
{ character: '明治', reading: 'めいじ', pitchPattern: [ 1, 0, 0, 0 ] },
{ character: '富士山', reading: 'ふじさん', pitchPattern: [ 1, 0, 0, 0, 0 ] },
{ character: '冷蔵庫', reading: 'れいぞうこ', pitchPattern: [ 0, 1, 1, 0, 0, 0 ] },
{ character: '豆', reading: 'まめ', pitchPattern: [ 0, 1, 0 ] },
{ character: '皮肉', reading: 'ひにく', pitchPattern: [ 0, 1, 1, 1 ] },
{ character: '奈良', reading: 'なら', pitchPattern: [ 1, 0, 0 ] },
{ character: '依存', reading: 'いぞん', pitchPattern: [ 0, 1, 1, 1 ] },
{ character: '苦労', reading: 'くろう', pitchPattern: [ 1, 0, 0, 0 ] },
{ character: '香港', reading: 'ほんこん', pitchPattern: [ 1, 0, 0, 0, 0 ] },
{ character: '台湾', reading: 'たいわん', pitchPattern: [ 0, 1, 1, 0, 0 ] },
I also have them all in plain pitch-number form (e.g. [2]), which is what the script wants to use instead.
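In case it helps with spot-checking, here's a rough sketch of how a pattern maps to a pitch number. I'm assuming the number means the mora after which the pitch drops, with 0 for no drop (heiban), and that each array element is one mora plus the trailing particle. `toPitchNumber` is just a name I made up for this example.

```javascript
// Convert a high/low pattern (last element = following particle) to a pitch number.
// Assumption: the number is the 1-based mora after which the pitch drops, 0 if it never drops.
function toPitchNumber(pitchPattern) {
  for (let i = 0; i < pitchPattern.length - 1; i++) {
    // A high mora followed by a low one marks the downstep.
    if (pitchPattern[i] === 1 && pitchPattern[i + 1] === 0) {
      return i + 1;
    }
  }
  return 0; // no downstep, even onto the particle
}

// Samples taken from the list above.
const samples = [
  { character: 'お酒', pitchPattern: [0, 1, 1, 1] },
  { character: '明治', pitchPattern: [1, 0, 0, 0] },
  { character: '豆', pitchPattern: [0, 1, 0] },
  { character: '冷蔵庫', pitchPattern: [0, 1, 1, 0, 0, 0] },
];

for (const { character, pitchPattern } of samples) {
  console.log(character, [toPitchNumber(pitchPattern)]);
}
```

The trailing particle value is what separates 豆 (drop right after the last mora) from a flat word like お酒, which is why the patterns carry that extra element.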