Audio link unique identifier

SleepySpider · July 4, 2015, 10:06pm

I need to somehow grab the url for a particular vocab word to play the audio.
If I look at the page source, it’s something like this: <source src="https://s3.amazonaws.com/s3.wanikani.com/audio/cf1d465792f380a6a2841c0732545454be2b3041.mp3" type="audio/mpeg">

Which is fine, except I don’t think the API returns that unique identifier when you request the vocab data.
So, the question is: How do I get this unique identifier? Or, if I have a vocab word, how do I get the correct url to play the mp3?

I tried to see if maybe I could use Jisho’s audio, but they also use unique identifiers, which I don’t know how to map to words.

As of right now, I sorta have japanesepod101.com’s audio assets working, but I have to pass in both the kanji and the kana reading to get the audio, and I don’t understand why both are needed.

Like this:
http://assets.languagepod101.com/dictionary/japanese/audiomp3.php?kanji=大した&kana=たいした

You can also use an id, but those are the only parameters I know of. I don’t know how to find out what interface audiomp3.php exposes, so I can’t tell whether there are better ways to query the audio.
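
For reference, here is a minimal sketch of building that request URL in Python. It only uses the endpoint and the two parameter names (kanji, kana) shown in the example URL above; the helper name pod101_audio_url is made up for illustration, and nothing here is a documented interface of audiomp3.php.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this post. "kanji" and "kana" are
# the only query parameters shown there; anything else would be a guess.
BASE_URL = "http://assets.languagepod101.com/dictionary/japanese/audiomp3.php"


def pod101_audio_url(kanji: str, kana: str) -> str:
    """Hypothetical helper: build the audio URL for a word and its reading."""
    # urlencode percent-encodes the Japanese text as UTF-8.
    return BASE_URL + "?" + urlencode({"kanji": kanji, "kana": kana})


url = pod101_audio_url("大した", "たいした")
print(url)
```

Browsers usually do this percent-encoding automatically when the raw URL is pasted in, which is why the unencoded form above also works there.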

Anybody have any ideas or suggestions?

gizmo · July 4, 2015, 10:31pm

You could scrape the WaniKani vocab pages for the URL. The vocab pages themselves are deterministic at https://www.wanikani.com/vocabulary/<kanji>. From looking at the source, you could then simply parse for any “source” tags and take their “src” attribute, which should give you one link for a .ogg, and one link for a .mp3

As this is not offered through the API, you might want to check with the developers that this is something they are okay with you doing. Or at least rate-limit yourself.
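
A rough sketch of the parsing step, using only Python's standard library. The HTML fragment is made up for illustration (the real page markup may differ), the names SourceTagParser and extract_audio_urls are hypothetical, and fetching the page itself (login, rate limiting) is left out on purpose.

```python
from html.parser import HTMLParser
from typing import List


class SourceTagParser(HTMLParser):
    """Collect the src attribute of every <source> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "source":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_audio_urls(html: str) -> List[str]:
    """Hypothetical helper: return all <source src="..."> values in order."""
    parser = SourceTagParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Made-up fragment standing in for a vocab page; not real WaniKani markup.
SAMPLE = """
<audio>
  <source src="https://example.com/audio/abc.mp3" type="audio/mpeg">
  <source src="https://example.com/audio/abc.ogg" type="audio/ogg">
</audio>
"""

urls = extract_audio_urls(SAMPLE)
print(urls)
```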

zosiu · July 5, 2015, 5:36am

You could also use google TTS:
format: http://translate.google.com/translate_tts?ie=UTF-8&q=YOUR_WORD&tl=ja
example: http://translate.google.com/translate_tts?ie=UTF-8&q=調子はどうですか&tl=ja
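
A small sketch of filling in that format programmatically. The parameter names (ie, q, tl) come straight from the URL above; the helper name google_tts_url is made up, and whether the endpoint accepts requests from a script is not something this example checks.

```python
from urllib.parse import urlencode

# Base URL and parameter names copied from the format shown in this post.
TTS_BASE = "http://translate.google.com/translate_tts"


def google_tts_url(text: str, lang: str = "ja") -> str:
    """Hypothetical helper: build a TTS URL for the given text and language."""
    # urlencode percent-encodes the query text as UTF-8, matching ie=UTF-8.
    return TTS_BASE + "?" + urlencode({"ie": "UTF-8", "q": text, "tl": lang})


tts_url = google_tts_url("調子はどうですか")
print(tts_url)
```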

anon91083167 · July 5, 2015, 6:39am

Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1

xMunch · July 5, 2015, 7:11am

MarioRash said... Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1

They didn't say no ;^).

I could throw a crawler together in the morning after I wake up if you'd like.

If you want to do it yourself, I would suggest getting links to the individual vocab pages through here: https://www.wanikani.com/lattice/vocabulary/status and then, as said above, parsing out the audio url. Easy.

SleepySpider · July 5, 2015, 3:16pm

I don’t actually want the audio, just the links to the audio.

But… I don’t want to do a crawler because every time new vocab words are added, I’d have to recrawl.
I sorta want it generalizable. In other words, I want to be able to use audio from wanikani, and if wanikani doesn’t have it, then I go elsewhere. Since it’s sorta a wanikani chrome extension.

Or, if I can’t use wanikani at all, just use a generic source.
The google translate seems like an excellent option to start with. Thanks! If it turns out that I don’t like it, maybe I’ll figure something else out.
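
The "try wanikani first, then go elsewhere" idea could look roughly like the sketch below. Everything in it is hypothetical: the lookup table, the example.com URLs, and the function names are placeholders for whatever sources end up being used.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import quote

# An audio source takes a word and returns a URL, or None if it has no audio.
AudioSource = Callable[[str], Optional[str]]

# Placeholder table of already-known audio links (word -> url).
KNOWN_LINKS: Dict[str, str] = {
    "大した": "https://example.com/audio/taishita.mp3",
}


def from_known_links(word: str) -> Optional[str]:
    """Preferred source: only answers for words it already has a link for."""
    return KNOWN_LINKS.get(word)


def from_generic_tts(word: str) -> Optional[str]:
    """Generic fallback: always produces a URL (placeholder endpoint)."""
    return "https://example.com/tts?q=" + quote(word)


def first_available(word: str, sources: List[AudioSource]) -> Optional[str]:
    """Return the URL from the first source that has one, in list order."""
    for source in sources:
        url = source(word)
        if url is not None:
            return url
    return None


SOURCES = [from_known_links, from_generic_tts]
print(first_available("大した", SOURCES))  # served by the preferred source
print(first_available("猫", SOURCES))      # falls through to the generic one
```

Keeping the sources in an ordered list means a new one can be added or removed without touching the lookup logic.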

It looks like, from that other mentioned thread, that wanikani left the audio stuff off the api intentionally. And I don’t want to abuse wanikani’s frontend (or backend) to get what I want, it would make me feel bad. ;-p

xMunch · July 5, 2015, 3:35pm

Oh… I guess I’ll stop my crawling. I was working at a pretty conservative rate, ~40 requests per minute.

Edit: Just curious, what are you trying to do with this?

SleepySpider · July 5, 2015, 4:14pm

Thanks for the crawling effort!

Well, I was going to surprise everyone, but I guess I’ll tell you guys here since you guys helped me out.
I’m adding quite a few features to the wanikanify extension. Here’s the current changelist I’m working on.
I’m done with most of them except some error checking and chrome sync.

If you’re not sure what wanikanify is, it basically replaces english words with japanese vocab words from wanikani on the fly as you browse the internets. Clicking the word changes it back and forth from english/japanese. Which words get replaced depends on what level you’re currently at.

Changelist:

-Clicking on a translated vocab word will cause audio to play using Google TTS. (Which now works btw, but it’s not perfect as it can play the wrong pronunciation).

-User can now import large amounts of vocab words using google spreadsheets to supplement wanikani vocab. (Also works, but still needs some more testing)

-User can now override wanikani vocab entries AND “Google spreadsheets import” using the “custom vocab box”. (Also works) For example, “time” gets translated as “〜回”, which is silly. So you can use this to override it to just “回” if you want.

-Settings for wanikanify now persist across computers if the user has Chrome’s sync functionality enabled. (In progress)
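
As an illustration of the override order in the list above, here is a sketch using a layered lookup. It is not the extension's actual code: the dictionaries are made-up sample data (only the "time" entry comes from the example above), and putting the spreadsheet layer ahead of the wanikani layer is an assumption, since the list only says the custom box wins over both.

```python
from collections import ChainMap

# Made-up sample data; only the "time" override mirrors the example above.
wanikani_vocab = {"time": "〜回", "cat": "猫"}
spreadsheet_vocab = {"dog": "犬"}
custom_vocab = {"time": "回"}

# ChainMap searches its maps left to right, so earlier layers win.
# Assumed order: custom box > spreadsheet import > wanikani.
vocab = ChainMap(custom_vocab, spreadsheet_vocab, wanikani_vocab)

print(vocab["time"])  # the custom entry shadows the wanikani one
print(vocab["cat"])   # no override, so the wanikani entry is used
```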

I’m not the original author of wanikanify, but I forked his repo and when I’m done he’ll merge my changes back in, (he said my changes sounded cool) then we’ll release the changes to the google chrome extension store thingy.

An interesting side effect of the spreadsheet feature: wanikanify can also be used with other languages, not just japanese. I’ll have to disable the requirement of adding an api key though, but there’s no reason it can’t translate english words to german, for example.

xMunch · July 5, 2015, 4:44pm

Just saying, there is a way you could crawl only new vocab. It would be similar to how I store UIDs in my userscript. You find which uids you don’t have and work from there. I could throw something together for you if you’d like.
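
The "find which uids you don't have" step amounts to a set difference. A minimal sketch, with made-up identifiers and a hypothetical function name (this is not the userscript's code):

```python
from typing import Iterable, List, Set


def uids_to_crawl(all_uids: Iterable[str], known_uids: Set[str]) -> List[str]:
    """Return the uids that have not been stored yet, keeping input order."""
    return [uid for uid in all_uids if uid not in known_uids]


# Made-up example: the current vocab list vs. what an earlier crawl stored.
current = ["v1", "v2", "v3", "v4"]
stored = {"v1", "v3"}

missing = uids_to_crawl(current, stored)
print(missing)
```

Only the entries in missing need to be requested, so a re-run after new vocab is added touches just the new pages.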

SleepySpider · July 5, 2015, 6:50pm

Yea sure, I think that would be very helpful. thanks!
I might end up using a combination of these or something. I’m not quite sure yet.
I’m not really a web developer, so getting help with this would be good, if it’s not too much work.

Thanks!

xMunch · July 5, 2015, 7:11pm

aragonsr said... Yea sure, I think that would be very helpful. thanks!
I might end up using a combination of these or something. I'm not quite sure yet.
I'm not really a web developer, so getting help with this would be good, if it's not too much work.

Thanks!

https://gist.github.com/xMunch/3dd4221cead8c9572faf -- JS audio file (it's only an object with the audio links, in vocab:link format)
https://gist.github.com/xMunch/26a79c44a66d0083f00d -- Crawler to update links.

SleepySpider · July 6, 2015, 2:10am

Great thanks!
