No Japanese Subtitles? Whisper might be the solution

During my lurking, I discovered Whisper, a speech recognition model that transcribes audio to text using deep learning. People naturally want to learn Japanese from anime, but that is only really possible with Japanese subtitles, which can be hit and miss in availability, since the meow website releases don’t provide Japanese closed captions with the raw video releases. The good thing about anime is that the voice actresses and actors enunciate clearly, which should improve the accuracy of the model’s output.

I actually played around with Whisper, but it performs best on Windows. I tried to get it working on a Mac Studio, but it runs slowly there and GPU acceleration doesn’t quite work yet. Thankfully, I have a VM with an NVIDIA GeForce GTX 980 Ti passed through. I chose an episode to test this with: Prima Doll, Episode 8. There is also a Japanese blog that posts most of the dialog for most anime, so I can use that to correct the output. I needed mkvextract to get the audio, but once I did that, I was off to the races.
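For anyone curious about the extraction step, it can look something like this. The filename and track ID below are just examples — list the tracks first, since the audio track ID varies per release:

```shell
# List the tracks in the MKV to find the audio track's ID (it varies per release)
mkvmerge -i "episode08.mkv"

# Extract track 1 (assumed here to be the Japanese audio) into its own file
mkvextract "episode08.mkv" tracks 1:audio.aac
```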

I’m still checking the transcript and haven’t finished it since it’s so late, so the jury is out, but so far the mistakes are relatively minor, although it messed up a few lines by inserting extraneous output. I also don’t expect it to handle the characters’ names well, as it doesn’t know which kanji to use.

Not looking too bad: aside from some corrections, it gets most of the dialog right. I think this can get you 90% of the way there in creating Japanese subtitles, though manual corrections are still needed.

Update: I finished the corrections, and the final number of changes was 74. I also discovered the timings can be a bit off, and it doesn’t handle more than one person talking at the same time.

You can try it for yourself if you have a powerful GPU.


Very slow to run for me. Also, ffmpeg -i [input] -vn -acodec copy out.aac can extract the audio by itself, although I can’t just pipe it in.
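If you want to stay in ffmpeg for the whole step, it helps to probe the file first so you know what codec the audio track uses (filenames here are placeholders):

```shell
# Show the streams so you can see which one is audio and what codec it uses
ffprobe -hide_banner episode08.mkv

# Copy the audio stream out without re-encoding
# (-vn drops the video, -acodec copy avoids a lossy transcode)
ffmpeg -i episode08.mkv -vn -acodec copy out.aac
```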

It has been running for about half an hour now.

It defaults to CPU; you need to use --device cuda (if you have an NVIDIA GPU) for GPU acceleration, but you also need to have the CUDA toolkit installed.

This is what I used:
whisper audio.m4a --language Japanese --model medium --device cuda
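On recent versions of the openai-whisper CLI you can also have it write a subtitle file directly, which saves a conversion step. A variation of the command above (the audio filename is whatever you extracted):

```shell
# Transcribe and emit an .srt file with timings alongside the transcript
whisper audio.m4a --language Japanese --model medium --device cuda --output_format srt
```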


The CUDA version must be chosen to match your PyTorch version as well.

So, stable PyTorch wants CUDA 11.6.
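A quick way to check whether your PyTorch build actually sees the GPU, and which CUDA version it was built against (this assumes PyTorch is already installed):

```shell
# Prints True/11.6 (for example) if CUDA works; False/None means a CPU-only build
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"

# If you got a CPU-only build, install a CUDA 11.6 build from PyTorch's wheel index:
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116
```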
