No Japanese Subtitles? Whisper might be the solution

During my lurking, I discovered Whisper, a speech recognition model that transcribes audio to text using deep learning. People naturally want to learn Japanese from anime, but that is only really possible with Japanese subtitles, which can be hit and miss in availability, since the meow website releases don’t provide Japanese closed captions with the raw video releases. The good thing about anime is that the voice actresses and actors enunciate clearly, which should improve the accuracy of the model’s output.

I actually played around with Whisper, but it performs best on Windows. I tried to get it working on a Mac Studio, but it runs slowly there and GPU acceleration doesn’t quite work yet. Thankfully, I have a VM with an NVIDIA GeForce GTX 980 Ti passed through. I chose an episode to test this with: Prima Doll, Episode 8. There is also a Japanese blog that posts most of the dialog for most anime, so I can use that to correct the output. I needed mkvextract to get the audio, but once I did that, I was off to the races.
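For anyone curious about the extraction step, it can look something like this. The filename and track ID below are just examples — list the tracks first, since the audio track ID varies per release:

```shell
# List the tracks in the MKV to find the audio track's ID (it varies per release)
mkvmerge -i "episode08.mkv"

# Extract track 1 (assumed here to be the Japanese audio) into its own file
mkvextract "episode08.mkv" tracks 1:audio.aac
```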

I’m still checking the transcript and haven’t finished it since it’s so late, so the jury is out, but so far the mistakes are relatively minor, although it messed up a few lines by inserting extraneous output. I also don’t expect it to handle the characters’ names well, as it doesn’t know which kanji to use.

Not looking too bad: aside from some corrections, it gets most of the dialog right. I think this can get you 90% of the way there in creating Japanese subtitles, though manual corrections are still needed.

Update: I finished the corrections, and the final number of changes was 74. I also discovered the timings can be a bit off, and it doesn’t handle more than one person talking at the same time.

You can try it for yourself if you have a powerful GPU.


Very slow to run for me. Also, ffmpeg -i [input] -vn -acodec copy out.aac can extract the audio by itself, although I can’t just pipe it in.
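If you want to stay in ffmpeg for the whole step, it helps to probe the file first so you know what codec the audio track uses (filenames here are placeholders):

```shell
# Show the streams so you can see which one is audio and what codec it uses
ffprobe -hide_banner episode08.mkv

# Copy the audio stream out without re-encoding
# (-vn drops the video, -acodec copy avoids a lossy transcode)
ffmpeg -i episode08.mkv -vn -acodec copy out.aac
```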

It has been running for about half an hour now.

It defaults to CPU; you need to use --device cuda (if you have an NVIDIA GPU) for GPU acceleration, but you also need to have the CUDA toolkit installed.

This is what I used:
whisper audio.m4a --language Japanese --model medium --device cuda
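On recent versions of the openai-whisper CLI you can also have it write a subtitle file directly, which saves a conversion step. A variation of the command above (the audio filename is whatever you extracted):

```shell
# Transcribe and emit an .srt file with timings alongside the transcript
whisper audio.m4a --language Japanese --model medium --device cuda --output_format srt
```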


The CUDA version must be chosen to match your PyTorch version as well.

So, stable PyTorch wants CUDA 11.6.
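A quick way to check whether your PyTorch build actually sees the GPU, and which CUDA version it was built against (this assumes PyTorch is already installed):

```shell
# Prints True/11.6 (for example) if CUDA works; False/None means a CPU-only build
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"

# If you got a CPU-only build, install a CUDA 11.6 build from PyTorch's wheel index:
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116
```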
