I need to somehow grab the url for a particular vocab word to play the audio.
If I look at the page source it’s something like this: <source src="https://s3.amazonaws.com/s3.wanikani.com/audio/cf1d465792f380a6a2841c0732545454be2b3041.mp3" type="audio/mpeg">
Which is fine, except I don’t think the API returns that unique identifier when you request the vocab data.
So, the question is: How do I get this unique identifier? Or, if I have a vocab word, how do I get the correct URL to play the mp3?
I tried to see if maybe I could use Jisho’s audio, but they also use unique identifiers, which I don’t know how to map to words.
As of right now, I sorta have japanesepod101.com’s audio assets working, but I have to pass in both a kanji and a kana reading to get the audio, and I don’t understand why.
Like this:
http://assets.languagepod101.com/dictionary/japanese/audiomp3.php?kanji=大した&kana=たいした
You can also use an id, but those are the only options I know of. I don’t know whether there are other ways, or how to find out what interface audiomp3.php exposes so I can see if there are better ways to query the audio.
Anybody have any ideas or suggestions?
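For reference, here is a minimal sketch of building the japanesepod101 request shown above. The endpoint and the kanji/kana parameter names are copied from that example URL; the function name is made up for illustration, and nothing here is documented behaviour of audiomp3.php. One possible reason both values are required is that a single kanji spelling can have more than one reading, but that is only a guess.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in this thread.
AUDIO_ENDPOINT = "http://assets.languagepod101.com/dictionary/japanese/audiomp3.php"


def pod101_audio_url(kanji: str, kana: str) -> str:
    """Build the audio URL for a word from its kanji and kana forms.

    The function name is hypothetical; only the two query parameters
    seen in the thread are used.
    """
    # urlencode percent-encodes the Japanese text as UTF-8.
    return AUDIO_ENDPOINT + "?" + urlencode({"kanji": kanji, "kana": kana})


print(pod101_audio_url("大した", "たいした"))
```

Percent-encoding the query keeps the URL pure ASCII, which avoids depending on the client to encode the Japanese characters itself.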
You could scrape the WaniKani vocab pages for the URL. The vocab pages themselves are deterministic at https://www.wanikani.com/vocabulary/<kanji>. From looking at the source, you could then simply parse for any “source” tags and take their “src” attribute, which should give you one link for a .ogg and one link for a .mp3.
As this is not offered through the API, you might want to check with the developers that this is something they are okay with you doing. Or at least rate-limit yourself.
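The parsing step described above could look roughly like this. It is a sketch using only Python's standard library; the class and function names are made up, the sample HTML is invented for the demo, and the real page markup may differ. It does no network requests, so fetching (and rate-limiting) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class SourceTagParser(HTMLParser):
    """Collect the src attribute of every <source> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "source":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def vocab_page_url(word: str) -> str:
    # URL pattern taken from the post above; the word is percent-encoded.
    return "https://www.wanikani.com/vocabulary/" + quote(word)


def audio_sources(html: str) -> list:
    """Return all <source src="..."> values found in the given HTML."""
    parser = SourceTagParser()
    parser.feed(html)
    return parser.sources


# Invented sample markup, not copied from a real page.
sample = """<audio>
<source src="https://example.com/audio/word.mp3" type="audio/mpeg">
<source src="https://example.com/audio/word.ogg" type="audio/ogg">
</audio>"""
print(audio_sources(sample))
```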
You could also use Google TTS. Format: http://translate.google.com/translate_tts?ie=UTF-8&q=YOUR_WORD&tl=ja
example: http://translate.google.com/translate_tts?ie=UTF-8&q=調子はどうですか&tl=ja
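A small sketch of filling in that format for any word. The endpoint and the ie, q and tl parameters come from the URLs above; the function name is made up, and whether the endpoint accepts requests like this from an extension is an assumption that has not been checked here.

```python
from urllib.parse import urlencode

# Endpoint copied from the format line in this thread.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def google_tts_url(text: str, lang: str = "ja") -> str:
    """Build a TTS URL for the given text (hypothetical helper)."""
    # Parameter order follows the example: ie, q, tl.
    query = urlencode({"ie": "UTF-8", "q": text, "tl": lang})
    return TTS_ENDPOINT + "?" + query


print(google_tts_url("調子はどうですか"))
```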
Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1
MarioRash said... Last time we talked about this, the thread got blocked ^^
/t/Humble-Request-Wanikani-to-Anki-with-Audio-Files/7314/1
They didn't say no ;^).
I could throw a crawler together in the morning after I wake up if you'd like.
If you want to do it yourself, I would suggest getting links to the individual vocab pages through here: https://www.wanikani.com/lattice/vocabulary/status and then, as said above, parsing out the audio URL. Easy.
I don’t actually want the audio, just the links to the audio.
But… I don’t want to do a crawler because every time new vocab words are added, I’d have to recrawl.
I sorta want it generalizable. In other words, I want to be able to use audio from wanikani, and if wanikani doesn’t have it, then I go elsewhere. Since it’s sorta a wanikani chrome extension.
Or, if I can’t use wanikani at all, just use a generic source.
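The "WaniKani first, otherwise somewhere else" idea can be sketched as a simple lookup with a fallback. Everything named here is hypothetical: the word-to-link table, the example.com URLs and the fallback function are placeholders, not real WaniKani data or a real service.

```python
from typing import Callable, Dict
from urllib.parse import quote


def pick_audio_url(
    word: str,
    wanikani_links: Dict[str, str],
    fallback: Callable[[str], str],
) -> str:
    """Prefer a known WaniKani link; otherwise ask the generic source."""
    url = wanikani_links.get(word)
    if url:
        return url
    return fallback(word)


def generic_source(word: str) -> str:
    # Placeholder for whatever generic audio source ends up being used.
    return "https://example.com/tts?q=" + quote(word)


# Placeholder table of word -> audio link.
links = {"大した": "https://example.com/audio/taishita.mp3"}

print(pick_audio_url("大した", links, generic_source))  # found in the table
print(pick_audio_url("猫", links, generic_source))      # falls back
```

Passing the fallback in as a function keeps the lookup independent of which generic source is chosen, so it can be swapped later without touching the rest.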
The google translate seems like an excellent option to start with. Thanks! If it turns out that I don’t like it, maybe I’ll figure something else out.
It looks like, from that other mentioned thread, that WaniKani left the audio stuff off the API intentionally. And I don’t want to abuse WaniKani’s frontend (or backend) to get what I want, it would make me feel bad. ;-p
Oh… I guess I’ll stop my crawling. I was working at a pretty conservative rate, ~40 requests per minute.
Edit: Just curious, what are you trying to do with this?
Thanks for the crawling effort!
Well, I was going to surprise everyone, but I guess I’ll tell you guys here since you guys helped me out.
I’m adding quite a few features to the wanikanify extension. Here’s the current changelist I’m working on.
I’m done with most of them except some error checking and chrome sync.
If you’re not sure what wanikanify is, it basically replaces English words with Japanese vocab words from WaniKani on the fly as you browse the internets. Clicking a word switches it back and forth between English and Japanese. Which words get replaced depends on the level you’re currently at.
I’m not the original author of wanikanify, but I forked his repo, and when I’m done he’ll merge my changes back in (he said my changes sounded cool). Then we’ll release the changes to the Google Chrome extension store thingy.
An interesting side effect of the spreadsheet feature: wanikanify can also be used with other languages, not just Japanese. I’ll have to disable the requirement of adding an API key though, but there’s no reason it can’t translate English words to German, for example.
Just saying, there is a way you could crawl only new vocab. It would be similar to how I store UIDs in my userscript. You find which UIDs you don’t have and work from there. I could throw something together for you if you’d like.
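A minimal sketch of that "only crawl what you don't have" step, assuming the stored data is a mapping from vocab word to audio link. The function name and the sample data are made up; how the current vocab list is obtained is left out.

```python
from typing import Dict, List


def words_missing_audio(all_words: List[str], known_links: Dict[str, str]) -> List[str]:
    """Return the words that have no stored audio link yet, in order."""
    return [word for word in all_words if word not in known_links]


# Placeholder data: links already stored, and the current vocab list.
known = {"一": "https://example.com/audio/1.mp3"}
current = ["一", "二", "三"]

print(words_missing_audio(current, known))  # ['二', '三']
```

Only the returned words would need a page request, so a re-run after new vocab is added touches a handful of pages instead of the whole list.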
Yea sure, I think that would be very helpful. thanks!
I might end up using a combination of these or something. I’m not quite sure yet.
I’m not really a web developer, so getting help with this would be good, if it’s not too much work.
Thanks!
aragonsr said... Yea sure, I think that would be very helpful. thanks!
I might end up using a combination of these or something. I'm not quite sure yet.
I'm not really a web developer, so getting help with this would be good, if it's not too much work.
Thanks!
https://gist.github.com/xMunch/3dd4221cead8c9572faf -- JS audio file (it's only an object with the audio links, in vocab:link format)
https://gist.github.com/xMunch/26a79c44a66d0083f00d -- Crawler to update links.
Great, thanks!