Request - pitch

veryslowlearner · March 22, 2017, 3:18pm

Hi, it would be great to have an extension that shows pitch in a visual way.

rfindley · March 22, 2017, 5:38pm

You mean something like this?

seanblue · March 22, 2017, 11:24pm

Anything with pitch would be great.

veryslowlearner · March 23, 2017, 5:41am

Exactly like that. A bit bigger though.

Leebo · March 23, 2017, 5:45am

This would be great, because I already am trying to check every word I learn on the site you used, rfindley. This would save some time for sure.

seanblue · March 23, 2017, 11:04pm

What site is that?

Leebo · March 23, 2017, 11:04pm

http://www.gavo.t.u-tokyo.ac.jp/ojad/phrasing/index

jjatria · March 23, 2017, 11:27pm

I did my dissertation on this, so I’d be happy to help.

I have actually been thinking about this a bit. OJAD is great, but they have no API whatsoever, and zero interest in the kind of use that this sort of application would involve (source: a personal meeting with the lead researcher). And scraping that website looks like a pain in the ass.

What I thought would be an easy solution would be to generate data files from the OJAD data (or somewhere else), and basically query those files.

Here’s what I came up with. Sample:

[{
  "ふじ山": [["", "ふ", "じさん"]],
  "一":  [["い", "ち", ""]],
  "三人": [["さ", "んに", "ん"]],
  "上": [["う", "え"]]
}]

The top level element is an array with the vocabulary items for each level. Within each level, you have a map of items and the accented mora (in this sample, it’s a list of lists because there are sometimes differences between the accent pattern of different forms for verbs and adjectives).

In each “mora list”, the first item is the initial low mora (which may or may not exist). The second is the high mora (which will always* exist). The last is the final low mora (which will only exist in accented words).

So, from the example above:

["", "ふ", "じさん"]: HLLL [initial accented]
["い", "ち", ""]: LH(L) [final accented]
["さ", "んに", "ん"]: LHHL [accented in the penult]
["う", "え"]: LH(H) [unaccented, no fall]
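
A minimal sketch of how these mora lists could be turned into H/L patterns, assuming the entries are written in kana as in the sample. The names `countMoras` and `moraListToPattern` are hypothetical, and the small-kana handling is an assumption so that syllables like きゃ count as one mora:

```javascript
// Small kana that merge with the preceding kana into a single mora (e.g. きゃ, しゅ).
// Assumption: entries are kana only; っ, ん and ー each count as their own mora.
const SMALL_KANA = "ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ";

// Count the moras in a kana string.
function countMoras(kana) {
  return [...kana].filter((ch) => !SMALL_KANA.includes(ch)).length;
}

// Convert one mora list, e.g. ["さ", "んに", "ん"], into a pitch pattern.
// `word` is the H/L string for the word itself; `particle` is the pitch
// a following particle would take.
function moraListToPattern(moraList) {
  const [low, high, fall = ""] = moraList;
  // A two-element list has no fall at all, so the word is unaccented.
  const accented = moraList.length === 3;
  const word =
    "L".repeat(countMoras(low)) +
    "H".repeat(countMoras(high)) +
    "L".repeat(countMoras(fall));
  return { word, particle: accented ? "L" : "H" };
}

console.log(moraListToPattern(["さ", "んに", "ん"])); // { word: 'LHHL', particle: 'L' }
```

The particle pitch is returned separately because it is the only thing that tells a final-accented word like 一 apart from an unaccented one like 上.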

How this is plugged into the WaniKani site… I could use some (i.e. considerable) help.

* This is all based of course on standard Tokyo Japanese, in which a) words will never have more than one fall, and b) the first two mora will always be different (if one is H, the other will be L).

rfindley · March 23, 2017, 11:50pm

I extracted the relevant JavaScript from OJAD. Generating the curve for a word or phrase requires an array of 1s and 0s for H and L… e.g. さんにん: [0,1,1,0].

The above array can be easily generated from the accent number, if you can find a good source for that. The chart for accent number is:
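
A rough sketch of that conversion, assuming the usual Tokyo-dialect convention where accent number 0 means no fall and accent number n means the pitch falls after the n-th mora. The function name `accentToArray` is hypothetical and is not part of the OJAD code:

```javascript
// Build the H/L array (1 = high, 0 = low) for a word with `moraCount` moras
// and the given accent number.
// Assumption: accent 0 = no fall; accent n = pitch falls after the n-th mora.
function accentToArray(moraCount, accent) {
  if (!Number.isInteger(moraCount) || moraCount < 1) {
    throw new RangeError("moraCount must be a positive integer");
  }
  if (!Number.isInteger(accent) || accent < 0 || accent > moraCount) {
    throw new RangeError("accent must be between 0 and moraCount");
  }
  const pitches = [];
  for (let i = 1; i <= moraCount; i++) {
    let high;
    if (accent === 0) {
      high = i > 1; // low first mora, then high to the end
    } else if (accent === 1) {
      high = i === 1; // high first mora, then low to the end
    } else {
      high = i > 1 && i <= accent; // rise after the first mora, fall after the accent
    }
    pitches.push(high ? 1 : 0);
  }
  return pitches;
}

console.log(accentToArray(4, 3)); // さんにん: [ 0, 1, 1, 0 ]
```

Note that a word accented on its last mora and an unaccented word of the same length produce the same array; they only differ in the pitch of whatever follows the word.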

chrstahl89 · March 23, 2017, 11:59pm

Like generating this array?

set_accent_curve_phrase('#phrase_0_0', 5, [0, 1, 1, 1, 1], 1, 0, 0);

Obviously the [0, 1, 1, 1, 1] part, which is for こんにちは.

rfindley · March 24, 2017, 12:02am

Yeah, that’s the array. The code takes that as input and generates the curve on the specified <canvas> tag.

chrstahl89 · March 24, 2017, 12:04am

Could make a script to simulate the post to “http://www.gavo.t.u-tokyo.ac.jp/ojad/phrasing/index”, then grab the array off the resulting HTML. Would make up for their lack of an API at least.

Then do it for an entire dictionary before they block IP lol.

rfindley · March 24, 2017, 12:07am

What I was getting at is that there are sources for pitch accent number (I’m trying to remember where), which can then be used to generate the array, which is all that’s needed for the JS to draw the curve. (The JS code is just an implementation of the Fujisaki method, by the way).

jjatria · March 24, 2017, 12:14am

Personally, I don’t like the Fujisaki-style curves. They look flashy, but I don’t think they are particularly clear to read, and they carry loads of meaningless information. What matters is where the accent is placed, and what the relative height of each mora is.

In the (academic) literature, accents are not normally marked that way. I’ve often seen them marked in ways more like these:

As you can see, the one on the right is simpler, and has the only bit of information that is essential: the location of the fall, or its absence (I’ve seen this styled in web pages with spans and borders).

As for the source of that data, I think scraping OJAD, or getting the data manually, would be the easiest. I don’t know of any other reliable intonation dictionaries online. I have a paper one, but it doesn’t have compounds, and the API is horrid…

Leebo · March 24, 2017, 12:29am

My only question when it comes to any of these, is how is the distinction between heiban and odaka made visually?

jjatria · March 24, 2017, 12:33am

What is marked is the position of the pitch fall, and the accented mora is the one before the fall.

In unaccented (heiban) words, there is no fall, so there is no mark.

All other words will have a mark. If the fall is inside the word, the mark will be inside the word. If the fall is at the end of the word (odaka), then the mark will be at the end of the word.
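
A small sketch of that marking rule, assuming kana input and an accent number where 0 means no fall and n means the fall comes after the n-th mora. The names `splitMoras` and `markFall` are hypothetical, and the "↓" marker is an arbitrary choice for illustration:

```javascript
// Split a kana string into moras. Small kana (ゃ, ゅ, ょ, ...) attach to the
// preceding kana; っ, ん and ー stay as separate moras.
function splitMoras(kana) {
  return kana.match(/.[ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ]?/gu) || [];
}

// Insert a fall marker after the accented mora.
// accent 0: no fall, so no mark. accent === mora count: mark at the very end.
function markFall(kana, accent, mark = "↓") {
  const moras = splitMoras(kana);
  if (!Number.isInteger(accent) || accent < 0 || accent > moras.length) {
    throw new RangeError("accent must be between 0 and the number of moras");
  }
  if (accent === 0) return kana;
  return moras.slice(0, accent).join("") + mark + moras.slice(accent).join("");
}

console.log(markFall("うえ", 0)); // うえ (no fall, no mark)
console.log(markFall("いち", 2)); // いち↓ (fall at the end of the word)
console.log(markFall("さんにん", 3)); // さんに↓ん (fall inside the word)
```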

polv · May 4, 2020, 8:46am

I’ve just re-subscribed to WaniKani, listened to Lv3, and realized the value of pitch accent… (I used to do WaniKani with the sound off.)
