There is [Userscript]: Anime Context Sentences, but the matching of vocab to sentences is automated rather than human-checked, so the results aren't verified; for example, nothing checks whether the match falls outside the spoken line or is actually part of a name.
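To illustrate the kind of check I mean, here is a minimal sketch (not what the userscript actually does) that uses a morphological analyzer to flag matches that are really part of a proper noun. It assumes the fugashi package with a UniDic dictionary (e.g. unidic-lite) is installed; the sentence and vocab used are made up.

```python
from fugashi import Tagger  # requires unidic or unidic-lite installed

tagger = Tagger()

def match_is_proper_noun(sentence: str, vocab: str) -> bool:
    """Return True if every occurrence of `vocab` in `sentence` is tagged as a proper noun."""
    hits = [w for w in tagger(sentence) if w.surface == vocab]
    # UniDic marks proper nouns with pos2 == 固有名詞
    return bool(hits) and all(w.feature.pos2 == "固有名詞" for w in hits)

# A false positive of the kind I mean: 光 matched inside the name 月光
print(match_is_proper_noun("月光が部屋に差し込んだ。", "光"))
```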
Also, I am not sure whether it is possible to extract the audio, and there is the question of licensing for use elsewhere, for example in Anki or other Japanese learning websites / apps.
Not to mention that the video clips are sometimes badly cut or out of sync.
As for the number of context sentences, I'd guess there are a little fewer than three times the number of vocabulary items here.
Thinking about making the resource myself: it might be better to find a database of native audio sentences (with or without video) and caption them myself. It's doable, much like gradually adding vocabulary to Anki (see the sketch below).
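A minimal sketch of the "gradually adding to Anki" part, assuming the AnkiConnect add-on is running on its default port; the deck name, note type, and field names are placeholders for whatever your own setup uses, and the audio file is assumed to already be in Anki's media folder.

```python
import json
import urllib.request

def add_sentence_note(vocab: str, sentence: str, audio_filename: str) -> None:
    payload = {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "Japanese::Sentences",   # placeholder deck
                "modelName": "Japanese Sentence",     # placeholder note type
                "fields": {
                    "Vocab": vocab,
                    "Sentence": sentence,
                    "Audio": f"[sound:{audio_filename}]",
                },
                "options": {"allowDuplicate": False},
                "tags": ["sentence-mining"],
            }
        },
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8765",
        data=json.dumps(payload).encode("utf-8"),
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))  # AnkiConnect returns {"result": ..., "error": ...}

add_sentence_note("食べる", "朝ご飯を食べるのを忘れた。", "sentence_0001.mp3")
```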
As for the process, it could be done with speech recognition (or a real transcript) up front, then proofreading while learning the individual vocabulary items.
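For the speech-recognition step, a sketch using the open-source openai-whisper package; "episode.mp3" and the model size are placeholders, and the timestamped segments are what you would proofread later.

```python
import whisper

model = whisper.load_model("small")
result = model.transcribe("episode.mp3", language="ja")

# Each segment has start/end timestamps plus the recognized text,
# which can be checked against the audio during vocab review.
for seg in result["segments"]:
    print(f'{seg["start"]:8.2f} - {seg["end"]:8.2f}  {seg["text"]}')
```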
Anime Context Sentences is built on https://immersionkit.com, an external resource with a database of voiced sentences you can check for yourself.
Tofugu has done a few cool things in the past, like releasing their single-vocab audio under Creative Commons (GitHub - tofugu/japanese-vocabulary-pronunciation-audio). However, we've yet to hear from them about context sentences.
Actually, Bunpro has a lot of voiced sentences that are also split by words and somewhat synchronized with WaniKani vocab. But it's uncertain how soon that will be released or ported to WaniKani in some way, if ever.
I respect the ambition, but I don't see how it will solve the problems you pointed out in the opening post. It will require the same immense amount of proofreading as the other databases.
Actually, as of now, my Anki cards are “EN → JP + autoplay sentence audio” and “JP → EN + autoplay vocab audio”, so yes, I do need a sentence with audio, if possible, for every vocabulary item.
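For reference, a rough sketch of that two-direction note type using the genanki library; the model ID, field names, and which side the audio plays on are my own guesses at the setup described above, not the exact card.

```python
import genanki

model = genanki.Model(
    1607392319,  # arbitrary fixed ID
    "JP Vocab + Sentence Audio (sketch)",
    fields=[
        {"name": "VocabJP"},
        {"name": "VocabEN"},
        {"name": "VocabAudio"},     # e.g. [sound:vocab.mp3]
        {"name": "SentenceJP"},
        {"name": "SentenceAudio"},  # e.g. [sound:sentence.mp3]
    ],
    templates=[
        {
            # EN -> JP; Anki autoplays the sentence audio on the answer side
            "name": "EN to JP",
            "qfmt": "{{VocabEN}}",
            "afmt": "{{FrontSide}}<hr id=answer>{{VocabJP}}<br>{{SentenceJP}}{{SentenceAudio}}",
        },
        {
            # JP -> EN; vocab audio autoplays on the answer side
            "name": "JP to EN",
            "qfmt": "{{VocabJP}}",
            "afmt": "{{FrontSide}}<hr id=answer>{{VocabEN}}{{VocabAudio}}",
        },
    ],
)
```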
There's plenty of tools out there that take subtitle tracks and split an episode's audio along the subtitle timings. If you want to make a database of native sentences with audio, that right there would give you context, audio, text, and even visuals if you wanted.
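A minimal sketch of that idea, assuming pysubs2 and an ffmpeg binary on PATH; "episode.mkv" and "episode.ja.srt" are placeholder file names.

```python
import subprocess
import pysubs2

subs = pysubs2.load("episode.ja.srt", encoding="utf-8")

for i, line in enumerate(subs):
    start = line.start / 1000.0  # pysubs2 times are in milliseconds
    end = line.end / 1000.0
    clip = f"clip_{i:04d}.mp3"
    # Cut the audio for this subtitle line (accurate but slow seek, since -ss/-to follow -i)
    subprocess.run(
        [
            "ffmpeg", "-loglevel", "error", "-y",
            "-i", "episode.mkv",
            "-ss", str(start), "-to", str(end),
            "-vn", "-q:a", "2", clip,
        ],
        check=True,
    )
    print(clip, line.plaintext)  # audio clip path plus its sentence text
```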
The problem really lies in having a big enough database, one that also covers vocabulary I gathered through non-listening methods, i.e. words that didn't originally come from mining movie sentences.
Furthermore, even if a word did come from listening, a single sentence doesn't cover its additional meanings / collocations. More sentences need to be found anyway.
Nonetheless, I agree that I need to do sentence mining (with audio) at some point.
I talked with the author on Discord and got the website fixed to do what I wanted, so I don't really need to use the API.
Actually, I found your post (🍃 Shenmue Tree - a Study "Lounge!" 🍂 - #307 by Vanilla), but I still have to evaluate whether I can make an actual searchable database. Apparently, audio is also a must at this point. The Japanese sentence-search mechanism I'm looking at now is probably PGroonga.
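A small sketch of what PGroonga-backed sentence search could look like, assuming a PostgreSQL database where the pgroonga extension can be installed; the connection string, table, and columns here are made up.

```python
import psycopg2

SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS pgroonga;
CREATE TABLE IF NOT EXISTS sentences (
    id serial PRIMARY KEY,
    jp text NOT NULL,
    audio_path text
);
CREATE INDEX IF NOT EXISTS sentences_jp_idx ON sentences USING pgroonga (jp);
"""

def search(query: str):
    with psycopg2.connect("dbname=sentencedb") as conn, conn.cursor() as cur:
        cur.execute(SETUP_SQL)
        # &@~ is PGroonga's full-text search operator with query syntax
        cur.execute("SELECT jp, audio_path FROM sentences WHERE jp &@~ %s", (query,))
        return cur.fetchall()

print(search("食べる"))
```

Note that this is plain full-text search, so conjugated forms would still need separate handling (or a query expansion step) to match a dictionary-form vocab entry.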
About ImmersionKit, I have grown to dislike cutesy anime voices and go for dramas instead. (Something like Death Note is still OK, though.) I would also consider news, documentaries, and live-action. (Japanese songs, non-anime ones, are OK as well, but I wonder whether I should pursue them.) Part of it is that I don't like Japanese pop culture that much any more.