Audio link unique identifier


#1

I need to somehow grab the url for a particular vocab word to play the audio.
If I look at the page source, it's something like this: <source src="https://s3.amazonaws.com/s3.wanikani.com/audio/cf1d465792f380a6a2841c0732545454be2b3041.mp3" type="audio/mpeg">

That's fine, except I don't think the API returns that unique identifier when you request the vocab data.
So the question is: how do I get this unique identifier? Or, given a vocab word, how do I get the correct URL to play the MP3?

I tried to see whether I could use Jisho's audio instead, but Jisho also uses unique identifiers, and I don't know how to map words to them.

As of right now, I sorta have japanesepod101.com's audio assets working, but I have to pass in both a kanji spelling and a kana reading to get the audio, and I don't understand why.

Like this:
http://assets.languagepod101.com/dictionary/japanese/audiomp3.php?kanji=大した&kana=たいした

You can also use an id, but those are the only parameters I know about. There may be better ways to query the audio, but I don't know how to find out what interface audiomp3.php actually accepts.
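For reference, here is a minimal sketch of building that languagepod101 URL from a word and its reading. The function name is hypothetical, and only the `kanji` and `kana` parameters come from the example above; nothing else about audiomp3.php is assumed.

```javascript
// Hypothetical helper: builds a languagepod101 dictionary audio URL.
// Only the `kanji` and `kana` query parameters are taken from the thread;
// the rest of audiomp3.php's interface is unknown.
function buildPod101AudioUrl(kanji, kana) {
  const url = new URL(
    "http://assets.languagepod101.com/dictionary/japanese/audiomp3.php"
  );
  // searchParams.set percent-encodes the Japanese text automatically.
  url.searchParams.set("kanji", kanji);
  url.searchParams.set("kana", kana);
  return url.toString();
}

console.log(buildPod101AudioUrl("大した", "たいした"));
```

Letting `URL`/`searchParams` do the encoding avoids hand-escaping Japanese characters in the query string.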

Anybody have any ideas or suggestions?


#2

You could scrape the WaniKani vocab pages for the URL. The vocab page URLs are deterministic: https://www.wanikani.com/vocabulary/<kanji>. Judging from the page source, you could then parse out every "source" tag and take its "src" attribute, which should give you one link to an .ogg file and one link to an .mp3.
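A rough sketch of that approach, assuming the page markup contains `<source src="...">` tags as described. Both function names are hypothetical, fetching the page is left out, and the sample HTML uses placeholder example.com URLs rather than real WaniKani links.

```javascript
// Hypothetical helper: the vocab page URL pattern described above.
// encodeURIComponent is needed because the path segment is Japanese text.
function vocabPageUrl(word) {
  return "https://www.wanikani.com/vocabulary/" + encodeURIComponent(word);
}

// Simplified extractor: collects the src attribute of every <source> tag.
// Regex parsing breaks easily if the markup changes; inside a browser
// extension, DOMParser would be the sturdier choice.
function extractAudioSources(html) {
  const pattern = /<source\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  const sources = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// Placeholder markup shaped like the <source> tags on a vocab page.
const sample =
  '<audio><source src="https://example.com/a.ogg" type="audio/ogg">' +
  '<source src="https://example.com/a.mp3" type="audio/mpeg"></audio>';
const mp3 = extractAudioSources(sample).find((src) => src.endsWith(".mp3"));
console.log(mp3); // https://example.com/a.mp3
```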

Since this is not offered through the API, you might want to check with the developers that they are okay with you doing it. Or at least rate-limit yourself.
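One simple way to rate-limit is to fetch pages strictly one at a time with a fixed pause in between. This is a sketch under that assumption: `fetchPage` is a hypothetical callback (for example, one wrapping `fetch`), and at ~40 requests per minute, the rate mentioned later in the thread, the pause works out to 1.5 seconds.

```javascript
// Converts a requests-per-minute budget into a pause between requests (ms).
function delayForRate(requestsPerMinute) {
  return 60000 / requestsPerMinute;
}

// Resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetches URLs sequentially, pausing between requests so the server
// never sees more than `requestsPerMinute` requests from us.
// `fetchPage` is a hypothetical callback that returns a Promise.
async function crawlPolitely(urls, fetchPage, requestsPerMinute) {
  const pause = delayForRate(requestsPerMinute);
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(pause);
    results.push(await fetchPage(urls[i]));
  }
  return results;
}

// Usage idea: crawlPolitely(urls, (u) => fetch(u).then((r) => r.text()), 40);
console.log(delayForRate(40)); // 1500
```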


#3

You could also use Google TTS:
format: http://translate.google.com/translate_tts?ie=UTF-8&q=YOUR_WORD&tl=ja
example: http://translate.google.com/translate_tts?ie=UTF-8&q=調子はどうですか&tl=ja
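A small sketch that produces URLs in the format above for any word. The function name is hypothetical; the endpoint and its `ie`, `q`, and `tl` parameters are exactly those in the format line, and since this is an unofficial endpoint it may change or stop working.

```javascript
// Builds a Google Translate TTS URL in the format given above.
// Unofficial endpoint: Google may change or restrict it at any time.
function buildGoogleTtsUrl(text, lang = "ja") {
  const url = new URL("http://translate.google.com/translate_tts");
  url.searchParams.set("ie", "UTF-8");
  url.searchParams.set("q", text); // percent-encoded automatically
  url.searchParams.set("tl", lang);
  return url.toString();
}

console.log(buildGoogleTtsUrl("調子はどうですか"));
```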


#4

Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1


#5
MarioRash said... Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1
They didn't say no ;^).

I could throw a crawler together in the morning after I wake up if you'd like.

If you want to do it yourself, I would suggest getting the links to the individual vocab pages from here: https://www.wanikani.com/lattice/vocabulary/status and then, as said above, parsing out the audio URL. Easy.
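A sketch of the first step, collecting vocab page links from the lattice page's HTML. It assumes, without having checked the real markup, that the lattice page links to each vocab page with an `href` of the form `/vocabulary/<word>` (relative or absolute); the function name is hypothetical.

```javascript
// Assumption: vocab links look like href="/vocabulary/<word>", either
// relative or absolute. Not verified against the real lattice markup.
function extractVocabPageLinks(html) {
  const pattern =
    /href\s*=\s*["']((?:https:\/\/www\.wanikani\.com)?\/vocabulary\/[^"']+)["']/gi;
  const links = new Set(); // de-duplicates repeated links
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Resolve relative hrefs so every entry is an absolute, encoded URL.
    links.add(new URL(match[1], "https://www.wanikani.com").toString());
  }
  return [...links];
}

const lattice =
  '<a href="/vocabulary/大した">a</a><a href="/vocabulary/大した">b</a>' +
  '<a href="/kanji/大">c</a>';
console.log(extractVocabPageLinks(lattice).length); // 1
```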

#6

I don’t actually want the audio, just the links to the audio.

But… I don't want to write a crawler, because every time new vocab words are added, I'd have to recrawl.
I want it to be generalizable. In other words, I want to use audio from WaniKani when it exists, and go elsewhere when it doesn't, since this is sort of a WaniKani Chrome extension.

Or, if I can’t use wanikani at all, just use a generic source.
Google Translate seems like an excellent option to start with. Thanks! If it turns out that I don't like it, maybe I'll figure something else out.
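The "WaniKani first, generic source otherwise" idea could look roughly like this. It is a sketch, not the extension's code: `knownLinks` is assumed to be a plain object mapping each vocab word to its audio URL, and the fallback reuses the Google TTS URL format from earlier in the thread.

```javascript
// Hypothetical lookup order: a known WaniKani audio link if there is one,
// otherwise a Google TTS URL in the format posted earlier in the thread.
// `knownLinks` is assumed to be a plain object: vocab word -> audio URL.
function getAudioUrl(word, knownLinks) {
  if (Object.prototype.hasOwnProperty.call(knownLinks, word)) {
    return knownLinks[word];
  }
  const tts = new URL("http://translate.google.com/translate_tts");
  tts.searchParams.set("ie", "UTF-8");
  tts.searchParams.set("q", word);
  tts.searchParams.set("tl", "ja");
  return tts.toString();
}

const links = { "大した": "https://example.com/taishita.mp3" }; // placeholder
console.log(getAudioUrl("大した", links));
console.log(getAudioUrl("一", links));
```

In the extension itself, playback could then be a single `new Audio(url).play()` call; keeping the lookup separate from playback makes it easy to add more sources to the chain later.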

It looks like, from that other thread, WaniKani left the audio off the API intentionally. And I don't want to abuse WaniKani's frontend (or backend) to get what I want; it would make me feel bad. ;-p


#7

Oh… I guess I'll stop my crawling. I was working at a pretty conservative rate, ~40 requests per minute.

Edit: Just curious, what are you trying to do with this?


#8

Thanks for the crawling effort!

Well, I was going to surprise everyone, but I guess I'll tell you here since you helped me out.
I'm adding quite a few features to the wanikanify extension. Here's the changelist I'm currently working on.
I'm done with most of it except some error checking and Chrome sync.

If you're not sure what wanikanify is, it replaces English words with Japanese vocab words from WaniKani on the fly as you browse the internets. Clicking a word toggles it between English and Japanese. Which words get replaced depends on your current WaniKani level.
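To make the replacement step concrete, here is a hypothetical sketch (not wanikanify's actual implementation) that swaps whole English words for their Japanese vocab using a lookup map. It covers only the text replacement, not the click-to-toggle behavior or the level filtering.

```javascript
// Hypothetical sketch, not wanikanify's code: replace whole English words
// with Japanese vocab from a map of English -> Japanese.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function replaceWords(text, vocab) {
  const lookup = {};
  for (const [english, japanese] of Object.entries(vocab)) {
    lookup[english.toLowerCase()] = japanese;
  }
  const keys = Object.keys(lookup);
  if (keys.length === 0) return text;
  // Try longer phrases first so "ice cream" wins over "ice".
  keys.sort((a, b) => b.length - a.length);
  // \b keeps "time" from matching inside "timestamps"; "i" ignores case.
  const pattern = new RegExp(
    "\\b(" + keys.map(escapeRegExp).join("|") + ")\\b",
    "gi"
  );
  return text.replace(pattern, (match) => lookup[match.toLowerCase()]);
}

console.log(replaceWords("Time after time, not timestamps", { time: "回" }));
// 回 after 回, not timestamps
```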

Changelist:
-Clicking on a translated vocab word plays its audio using Google TTS. (This works now, btw, but it's not perfect: it can play the wrong pronunciation.)
-Users can now import large amounts of vocab words from Google Spreadsheets to supplement the WaniKani vocab. (Also works, but still needs more testing.)
-Users can now override both WaniKani vocab entries AND "Google Spreadsheets import" entries using the "custom vocab box". (Also works.) For example, "time" gets translated as "〜回", which is silly, so you can override it to just "回" if you want.
-Settings for wanikanify are now persistent across computers if the user has Chrome's sync functionality enabled. (In progress)
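The override behavior in the changelist can be modeled as merging three maps where later sources win. This is a sketch: the three-object layout is an assumption, and while the custom box winning over both other sources comes from the changelist, applying the spreadsheet after the WaniKani entries is an assumption made here.

```javascript
// Later sources win: spreadsheet entries are applied over WaniKani entries
// (assumed order), and custom-box entries over both (per the changelist).
function mergeVocab(wanikani, spreadsheet, custom) {
  return { ...wanikani, ...spreadsheet, ...custom };
}

const merged = mergeVocab(
  { time: "〜回", water: "水" }, // WaniKani entries
  { dog: "犬" },                 // spreadsheet import
  { time: "回" }                 // custom vocab box override
);
console.log(merged.time); // 回
```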

I'm not the original author of wanikanify, but I forked his repo. When I'm done, he'll merge my changes back in (he said they sounded cool), and then we'll release them to the Google Chrome extension store thingy.

An interesting side effect of the spreadsheet feature is that wanikanify can be used with other languages, not just Japanese. I'll have to disable the API key requirement, though; there's no reason it couldn't translate English words to German, for example.

#9

Just saying, there is a way you could crawl only new vocab. It would be similar to how I store UIDs in my userscript: you find which UIDs you don't have and work from there. I could throw something together for you if you'd like.
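The "only crawl what's new" idea boils down to a set difference. A minimal sketch follows; keying by vocab word rather than by UID is an assumption made for illustration, and both function names are hypothetical.

```javascript
// Returns the words that have no stored audio link yet, so only their pages
// need crawling. Keys are vocab words here (an assumption; UIDs work the same way).
function findUncrawled(allWords, knownLinks) {
  return allWords.filter(
    (word) => !Object.prototype.hasOwnProperty.call(knownLinks, word)
  );
}

// Adds freshly crawled links without discarding the stored ones.
function mergeLinks(knownLinks, newLinks) {
  return Object.assign({}, knownLinks, newLinks);
}

const stored = { "大した": "https://example.com/a.mp3" }; // placeholder URL
console.log(findUncrawled(["大した", "一"], stored).join(", ")); // 一
```

Only the words returned by `findUncrawled` need a request, so a recrawl after new vocab is added stays small.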


#10

Yea sure, I think that would be very helpful. Thanks!
I might end up using a combination of these or something. I’m not quite sure yet.
I’m not really a web developer, so getting help with this would be good, if it’s not too much work.

Thanks!


#11
aragonsr said... Yea sure, I think that would be very helpful. thanks!
I might end up using a combination of these or something. I'm not quite sure yet.
I'm not really a web developer, so getting help with this would be good, if it's not too much work.

Thanks!
https://gist.github.com/xMunch/3dd4221cead8c9572faf -- JS audio file (it's only an object with the audio links, in vocab:link format)
https://gist.github.com/xMunch/26a79c44a66d0083f00d -- Crawler to update links.


#12

Great thanks!