Request - pitch


#1

Hi, it would be great to have an extension that shows pitch in a visual way.


#2

You mean something like this?


#3

Anything with pitch would be great.


#4

Exactly like that. A bit bigger though.


#5

This would be great, because I already am trying to check every word I learn on the site you used, rfindley. This would save some time for sure.


#6

What site is that?


#7

http://www.gavo.t.u-tokyo.ac.jp/ojad/phrasing/index


#8

I did my dissertation on this, so I’d be happy to help.

I have actually been thinking about this a bit. OJAD is great, but they have no API whatsoever, and zero interest in the kind of use that this sort of application would involve (source: a personal meeting with the lead researcher). And scraping that website looks like a pain in the ass.

What I thought would be an easy solution would be to generate data files from the OJAD data (or somewhere else), and basically query those files.

Here’s what I came up with. Sample:

[{
  "ふじ山": [["", "ふ", "じさん"]],
  "一":  [["い", "ち", ""]],
  "三人": [["さ", "んに", "ん"]],
  "上": [["う", "え"]]
}]

The top-level element is an array with one entry per level, each holding that level's vocabulary items. Within each level, you have a map from each item to its accented mora (in this sample the value is a list of lists, because the accent pattern sometimes differs between the forms of a verb or adjective).

In each “mora list”, the first item is the initial low mora (which may or may not exist). The second is the high mora (which will always* exist). The last is the final low mora (which will only exist in accented words).

So, from the example above:

  • ["", "ふ", "じさん"]: HLLL [initial accented]
  • ["い", "ち", ""]: LH(L) [final accented]
  • ["さ", "んに", "ん"]: LHHL [accented in the penult]
  • ["う", "え"]: LH(H) [unaccented, no fall]
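To make the mapping concrete, here is a minimal sketch of how a mora list could be turned into an H/L string. The names `countMorae` and `patternFromMoraList` are made up for illustration, and it assumes the strings are kana only, that small kana (ゃ, ゅ, ょ and friends) merge with the kana before them, and that a two-element list means "unaccented".

```javascript
// Small kana that attach to the preceding kana and are not counted as a mora.
// (Assumption: っ, ん and ー each count as one mora of their own.)
const SMALL_KANA = new Set([..."ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ"]);

// Count the morae in a kana-only string.
function countMorae(kana) {
  let count = 0;
  for (const ch of kana) {
    if (!SMALL_KANA.has(ch)) count += 1;
  }
  return count;
}

// Convert one mora list ([initialLow, high, finalLow?]) into an H/L string
// for the word, plus the pitch a following particle would take.
function patternFromMoraList(moraList) {
  const [initialLow, high, finalLow] = moraList;
  const accented = moraList.length === 3; // two-element lists have no fall
  const pattern =
    "L".repeat(countMorae(initialLow)) +
    "H".repeat(countMorae(high)) +
    "L".repeat(countMorae(finalLow || ""));
  return { pattern, particle: accented ? "L" : "H" };
}

console.log(patternFromMoraList(["", "ふ", "じさん"])); // { pattern: 'HLLL', particle: 'L' }
console.log(patternFromMoraList(["い", "ち", ""]));     // { pattern: 'LH', particle: 'L' }
console.log(patternFromMoraList(["う", "え"]));         // { pattern: 'LH', particle: 'H' }
```

The `particle` field is what separates final-accented words from unaccented ones: the word itself reads LH in both cases, and only the following particle differs.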

As for how this gets plugged into the WaniKani site… I could use some (i.e. considerable) help. :slight_smile:


* This is all based, of course, on standard Tokyo Japanese, in which a) words will never have more than one fall, and b) the first two mora will always be different (if one is H, the other will be L).


#9

I extracted the relevant JavaScript from OJAD. Generating the curve for a word or phrase requires an array of 1s and 0s for H and L… e.g. さんにん: [0,1,1,0].

The above array can be easily generated from the accent number, if you can find a good source for that. The chart for accent number is:


#10

Like generating this array?

set_accent_curve_phrase('#phrase_0_0', 5, [0, 1, 1, 1, 1], 1, 0, 0);

I mean the [0, 1, 1, 1, 1] part, obviously, which is for こんにちは.


#11

Yeah, that’s the array. The code takes that as input and generates the curve on the specified <canvas> tag.


#12

You could make a script to simulate the POST to http://www.gavo.t.u-tokyo.ac.jp/ojad/phrasing/index, then grab the array off the resulting HTML. That would make up for their lack of an API at least.

Then do it for an entire dictionary before they block your IP lol.
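A rough sketch of just the "grab the array" half, assuming the returned HTML contains calls shaped like the `set_accent_curve_phrase(...)` line quoted earlier in the thread. The function name `extractAccentArrays` is hypothetical, and the POST itself is left out because the form fields OJAD expects are not known here.

```javascript
// Find every set_accent_curve_phrase(...) call in a chunk of HTML and
// return the 0/1 array passed as its third argument.
// Assumption: the calls look like
//   set_accent_curve_phrase('#phrase_0_0', 5, [0, 1, 1, 1, 1], 1, 0, 0);
function extractAccentArrays(html) {
  const callPattern =
    /set_accent_curve_phrase\(\s*[^,]+,\s*\d+\s*,\s*\[([^\]]*)\]/g;
  const arrays = [];
  for (const match of html.matchAll(callPattern)) {
    const values = match[1]
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== "")
      .map(Number);
    arrays.push(values);
  }
  return arrays;
}

const sample =
  "<script>set_accent_curve_phrase('#phrase_0_0', 5, [0, 1, 1, 1, 1], 1, 0, 0);</script>";
console.log(extractAccentArrays(sample)); // [ [ 0, 1, 1, 1, 1 ] ]
```

A regex like this breaks as soon as the page markup changes, so it is only a starting point.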


#13

What I was getting at is that there are sources for pitch accent number (I’m trying to remember where), which can then be used to generate the array, which is all that’s needed for the JS to draw the curve. (The JS code is just an implementation of the Fujisaki method, by the way).
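For illustration, the accent-number-to-array step could look something like the sketch below. It assumes the usual convention for standard Tokyo Japanese (0 = no fall, n = fall right after the n-th mora), and the function name `accentArray` is made up here, not something from OJAD.

```javascript
// Build the 0/1 (L/H) array for a word from its mora count and accent number.
// accentNumber 0 means no fall; n >= 1 means the pitch falls after mora n.
function accentArray(moraCount, accentNumber) {
  if (!Number.isInteger(moraCount) || moraCount < 1) {
    throw new RangeError("moraCount must be a positive integer");
  }
  if (
    !Number.isInteger(accentNumber) ||
    accentNumber < 0 ||
    accentNumber > moraCount
  ) {
    throw new RangeError("accentNumber must be between 0 and moraCount");
  }
  const pitches = [];
  for (let i = 1; i <= moraCount; i++) {
    let high;
    if (accentNumber === 1) {
      high = i === 1; // initial accent: only the first mora is high
    } else if (accentNumber === 0) {
      high = i > 1; // no fall: low start, then high to the end
    } else {
      high = i > 1 && i <= accentNumber; // high from mora 2 up to the accent
    }
    pitches.push(high ? 1 : 0);
  }
  return pitches;
}

console.log(accentArray(4, 3)); // さんにん   → [ 0, 1, 1, 0 ]
console.log(accentArray(5, 0)); // こんにちは → [ 0, 1, 1, 1, 1 ]
```

Note that a two-mora word with accent 2 and one with accent 0 both come out as [0, 1]; the difference only shows up on whatever follows the word.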


#14

Personally, I don’t like the Fujisaki-style curves. They look flashy, but they are not particularly clear to read, I think, and they have loads of meaningless information. What matters is where the accent is placed, and what the relative height of each mora is.

In the (academic) literature, accents are not normally marked that way. I’ve often seen them marked in ways more like these:

As you can see, the one on the right is simpler, and has the only bit of information that is essential: the location of the fall, or its absence (I’ve seen this styled in web pages with spans and borders).

As for the source of that data, I think scraping OJAD, or getting the data manually, would be the easiest. I don’t know of any other reliable intonation dictionaries online. I have a paper one, but it doesn’t have compounds, and the API is horrid… :stuck_out_tongue:


[Userscript] WaniKani Pitch Info
#15

My only question when it comes to any of these is: how is the distinction between heiban and odaka made visually?


#16

What is marked is the position of the pitch fall, and the accented mora is the one before the fall.

In unaccented (heiban) words, there is no fall, so there is no mark.

All other words will have a mark. If the fall is inside the word, the mark will be inside the word. If the fall is at the end of the word (odaka), then the mark will be at the end of the word.
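As a small sketch of that rule: given a word split into mora and its accent number (0 = no fall), put a mark right after the accented mora. The name `markFall` and the choice of "＼" as the mark are only assumptions for the example; any symbol or a styled span would do.

```javascript
// Insert a fall mark after the accented mora.
// morae: array of mora strings; accentNumber: 0 for no fall (heiban),
// otherwise the 1-based position of the accented mora.
function markFall(morae, accentNumber, mark = "＼") {
  if (accentNumber === 0) return morae.join(""); // heiban: nothing to mark
  return (
    morae.slice(0, accentNumber).join("") +
    mark +
    morae.slice(accentNumber).join("")
  );
}

console.log(markFall(["う", "え"], 0));             // うえ        (heiban, no mark)
console.log(markFall(["い", "ち"], 2));             // いち＼      (odaka, mark at the end)
console.log(markFall(["さ", "ん", "に", "ん"], 3)); // さんに＼ん  (fall inside the word)
```

So heiban and odaka look different on the page even though both are low-high within the word itself.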