Is Voicevox TTS a good resource for pitch accent?

I’ve recently come across Voicevox, a free Japanese voice synthesis desktop app: https://voicevox.hiroshiba.jp/

Though the very moe-anime aesthetic might be off-putting to some, what makes it especially useful is how it displays pitch accent very clearly and even allows editing of pronunciation.

I’ve started adding audio to my vocab anki cards using this tool. I appreciate that I can double-check that the pitch accent generated is correct. It’s nice that it can synthesize entire sentences, so that I can add audio for context sentences on the flashcards too.

The pitch accent seems generally correct, but it sometimes produces strange results, so I usually double-check against other resources and correct it when necessary.
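
For anyone who would rather script this check than click through the app: as far as I know, the VOICEVOX engine that runs behind the app also exposes a local HTTP API. Below is a rough sketch, assuming the engine is listening on its default local address (http://127.0.0.1:50021) and that speaker ID 1 exists in your install. The helper names (`audio_query`, `accent_summary`, `synthesize`) are my own and not part of VOICEVOX; only the `/audio_query` and `/synthesis` endpoints and the `accent_phrases` / `moras` / `accent` fields are meant to come from the engine, so check them against the engine's own API docs.

```python
import json
import urllib.parse
import urllib.request

ENGINE_URL = "http://127.0.0.1:50021"  # assumption: default local engine address


def _post(url, body=None):
    """POST to the engine and return the raw response bytes."""
    data = b"" if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def audio_query(text, speaker=1, post=_post):
    """Ask the engine how it would read `text` (the result includes accent phrases)."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return json.loads(post(f"{ENGINE_URL}/audio_query?{params}"))


def accent_summary(query):
    """Return (reading, accent position) for each accent phrase in a query."""
    summary = []
    for phrase in query["accent_phrases"]:
        reading = "".join(mora["text"] for mora in phrase["moras"])
        summary.append((reading, phrase["accent"]))
    return summary


def synthesize(query, speaker=1, post=_post):
    """Send a query back to the engine and return the resulting WAV bytes."""
    return post(f"{ENGINE_URL}/synthesis?speaker={speaker}", query)
```

With the engine running, `accent_summary(audio_query("some sentence"))` should list each phrase's reading and accent position, which you can compare against a pitch accent dictionary before writing the bytes from `synthesize` to a .wav file for the card. If an accent is wrong, I believe changing the `accent` number alone is not enough, because the per-mora pitch values would also need to be regenerated, so fixing it in the app's editor may be the simpler route.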

To sum up:

  • Does this seem like the best solution for adding audio to Anki flashcards?
  • What do you use for adding audio to your own flashcards?
  • Any opinions on the quality of this tool?

I’m not familiar with this tool, so I can’t comment on it, but I use audio from Forvo. Forvo is nice because it’s actual people speaking the words.

I don’t know what the workflow would be for adding it to cards manually, but I use Migaku’s tool to add the recordings to my cards for me. (Currently this is done via an Anki add-on, but it will move to their Chrome extension once the next release is public.)


This seems like just the thing I need for my sentence cards, but Windows is throwing up all manner of warnings when I try to install it. Can anyone more computer-savvy than me confirm whether this is safe or unsafe?

Either way, just based on the samples on the website it seems pretty awesome. I don’t know if it’s THE best, but it’s the best one I’ve encountered so far. I currently use a combination of Forvo, AwesomeTTS (IBM Watson ja-JP_EmiVoice) and, when possible, sentence mining from sources with actual audio (audiobooks, anime, YouTube videos, …). I would only really use it to replace the AwesomeTTS portion, but that would still mean quite a lot of sentences :smile_cat:

Since I haven’t used it myself, though, I can’t comment on the quality beyond the example sentences, so do take my advice with a grain of salt :woman_shrugging:

The Q&A notes that this happens, under “「Windows によって PC が保護されました」と表示されます。” (“‘Windows protected your PC’ is displayed”): https://voicevox.hiroshiba.jp/qa

Also, the software is open source at https://github.com/VOICEVOX/voicevox if that provides more confidence.


I’m mostly concerned with audio for whole sentences, so Forvo isn’t cut out for it. gTTS is bad, and it can’t really batch-create audio.

Also, I am looking for ways to create sounds on the fly on a server, rather than creating them in advance and messing with Anki sync size.
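
One way this could look, as a minimal sketch rather than a tested setup: a tiny HTTP endpoint that synthesizes on request and keeps each result on disk so repeated reviews don't trigger synthesis again. Everything here is an assumption for illustration: `synth` is a placeholder for whatever TTS call you end up using (any function that takes text and a speaker ID and returns WAV bytes), and the names `cache_path`, `get_audio` and `make_handler` are made up.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def cache_path(cache_dir, text, speaker):
    """Map (speaker, text) to a stable file name inside the cache directory."""
    digest = hashlib.sha256(f"{speaker}:{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.wav"


def get_audio(text, speaker, cache_dir, synth):
    """Return cached audio if present; otherwise synthesize once and store it."""
    path = cache_path(cache_dir, text, speaker)
    if path.exists():
        return path.read_bytes()
    wav = synth(text, speaker)  # placeholder TTS backend supplied by the caller
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(wav)
    return wav


def make_handler(cache_dir, synth):
    """Build a request handler serving GET /?text=...&speaker=... as audio/wav."""

    class TTSHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            text = params.get("text", [""])[0]
            if not text:
                self.send_error(400, "missing text parameter")
                return
            try:
                speaker = int(params.get("speaker", ["1"])[0])
            except ValueError:
                self.send_error(400, "speaker must be an integer")
                return
            wav = get_audio(text, speaker, cache_dir, synth)
            self.send_response(200)
            self.send_header("Content-Type", "audio/wav")
            self.send_header("Content-Length", str(len(wav)))
            self.end_headers()
            self.wfile.write(wav)

    return TTSHandler
```

To try it, something like `HTTPServer(("127.0.0.1", 8000), make_handler("tts_cache", my_synth)).serve_forever()` would serve audio locally, where `my_synth` is your own function. A card would then only need to carry a URL instead of an audio file, though whether Anki on every platform will play audio from a URL like that is something I haven't verified.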


Actually, two things are wrong here. Another is やり直す and 角.

Right, it often requires some tuning. In some cases there’s real ambiguity that a computer can’t resolve—like maybe you should turn left at the third つの :rofl: