Add sound URLs to API?


#1

It would be awesome if the API returned the S3 URL of the sound files.  It looks like the S3 keys are named using a SHA hash, so alternatively if we knew what was hashed maybe we could just compute it.  Maybe I’m missing something?

I’ve made a sound->word Anki deck, and would love to work on a web app like Wanikani-Audio-Trainer (which is brilliant, btw), but I’ve had to resort to page scraping the sounds.


#2

I suspect that the sound files are intentionally difficult to download so that you can’t easily distribute them to non-subscribers.

Of course, you could scrape all 6,000 or so pages and get the URLs, so there isn’t any real security in the current system. But, hey, nobody’s bothered to do it so far, so I guess it’s good enough for now. :slight_smile:


#3

Sort of doubt that security is a priority, as they are unauthenticated URLs that anyone can download:
https://s3.amazonaws.com/s3.wanikani.com/audio/0aed4b2a9345378cbbd1b592e38a23fe8281af94.mp3

S3 makes it very easy to generate signed URLs, which would prevent this, if they cared.


#4

Of course, anyone can download them. But manually downloading 6,000 files is such a pain in the ass that nobody’s bothered to do it. Good enough, I’d say.


#5

There’s a Chrome app that quizzes on the pronunciation; it scrapes the HTML page to get the audio file. I don’t think the devs have anything against those kinds of applications, and putting the audio file link in the API would save them some resources (and therefore money) in this instance.
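
For anyone curious what that kind of scraping might look like, here is a rough Python sketch. It assumes the audio URL appears verbatim in the page HTML and follows the same pattern as the one URL posted earlier in this thread (40 hex characters plus .mp3); the function name and the sample markup are made up for illustration, not taken from the actual page or from the Chrome app.

```python
import re

# Pattern inferred from the single audio URL posted in this thread;
# real pages may embed the URL differently (this is an assumption).
AUDIO_URL = re.compile(
    r"https://s3\.amazonaws\.com/s3\.wanikani\.com/audio/[0-9a-f]{40}\.mp3"
)


def extract_audio_urls(html):
    """Return the unique audio URLs found in html, in order of first appearance."""
    found = []
    for url in AUDIO_URL.findall(html):
        if url not in found:
            found.append(url)
    return found


# Hypothetical page fragment, only to show the call.
sample = (
    '<audio><source src="https://s3.amazonaws.com/s3.wanikani.com/audio/'
    '0aed4b2a9345378cbbd1b592e38a23fe8281af94.mp3"></audio>'
)
print(extract_audio_urls(sample))
```

If the URL were in the API response, none of this would be needed, which is really the point of the request.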


#6

The audio can be accessed by anyone using the beta jisho website.


#7
blastomere said... It would be awesome if the API returned the S3 URL of the sound files.  It looks like the S3 keys are named using a SHA hash, so alternatively if we knew what was hashed maybe we could just compute it.  Maybe I'm missing something?
For the record, I've already tried hashing a bunch of stuff (with MD5 and SHA-1) to try to get the URL, with no luck. If you're trying to do this, I would jump straight to scraping it from the HTML.
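
If someone still wants to try, the brute-force check could be sketched roughly like this in Python. The target key is the one from the URL posted above; it is 40 hex characters, which matches the length of a SHA-1 digest (MD5 would be 32). The helper name and the guesses are hypothetical placeholders, since nobody here knows what input, if any, was hashed.

```python
import hashlib

# Key taken from the audio URL posted earlier in this thread.
TARGET_KEY = "0aed4b2a9345378cbbd1b592e38a23fe8281af94"


def find_hash_input(candidates, target=TARGET_KEY):
    """Return (algorithm, candidate) for the first candidate whose
    MD5 or SHA-1 hex digest equals target, or None if nothing matches."""
    for text in candidates:
        data = text.encode("utf-8")
        for algo in ("md5", "sha1"):
            if hashlib.new(algo, data).hexdigest() == target:
                return algo, text
    return None


# Hypothetical guesses; the word this key belongs to is not known here.
guesses = ["example", "example.mp3", "audio/example"]
print(find_hash_input(guesses))
```

Keep in mind the key may be a hash of the file contents, a salted value, or just a random ID, in which case no amount of guessing names will work.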