What is the best (or decent) TTS (text-to-speech) engine?

polv · March 3, 2021, 8:36pm

I am targeting not only Japanese, but also Chinese. I might try to use the Forvo API, but it does not always have audio for arbitrary words and sentences.

I know there are the built-in Windows and macOS engines, which you can access easily with Anki’s AwesomeTTS. But is there a decent option for Linux / Docker? espeak sounds bad for Chinese…

I plan to make a web service, so I shouldn’t rely on Google TTS hacks. It should be free and not depend on a third-party service, forever.

I see that there is TensorflowTTS, but what about the training data?

BTW, for those who just want to make Anki TTS and want to save time:

WaniKani vocabulary audio scraped from the API (i.e. not TTS)
WaniKani sentence audio made from macOS’s Kyoko voice (i.e. is TTS)

Kumirei · March 4, 2021, 12:29pm

Well, that’s the only one I’ve used

polv · March 4, 2021, 2:14pm

I made a web service (with gTTS), and it can be added to Anki without downloading anything, via:

    <audio controls>
        <source src="https://u74842.deta.dev/tts.mp3?q=日本へ" type="audio/mp3">
    </audio>
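
The `src` above carries raw Japanese in the query string. A minimal sketch of building the same URL with the query percent-encoded, which is safer when the text contains spaces, `&` or `#`. The helper name `ttsUrl` is made up for illustration; the endpoint is the one from the snippet above.

```javascript
// Endpoint taken from the <source> tag above.
const BASE = 'https://u74842.deta.dev/tts.mp3'

// Hypothetical helper: percent-encode the text as UTF-8 before
// putting it in the query string.
function ttsUrl(q, base = BASE) {
  return `${base}?q=${encodeURIComponent(q)}`
}

console.log(ttsUrl('日本へ'))
```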

I fear that Google may kill gTTS at any time without notice (in favor of the paid service, of course).

And, for some reason, <audio src=""> doesn’t work in Anki.

Jirachi · March 4, 2021, 2:19pm

Prosody Tutor Suzuki-Kun gives really accurate results, with 4 voices and graphs of the pitch accent.

polv · March 4, 2021, 2:21pm

It’s good, but it seems you have to manually paste the text, play it back, and save the file.

Does it expose an API?

Jirachi · March 4, 2021, 3:25pm

Unfortunately, the fine print says Suzuki-kun can’t be used on external websites, it’s just for individual practice ig

felix330 · March 4, 2021, 4:16pm

I’ve only used IBM Watson and Microsoft Azure TTS so far. They both have some great neural voices that sound quite realistic, with Microsoft having the much larger selection of available languages and voices. You can try it out here; the Japanese sounds pretty realistic to me. It’s only free up to a monthly limit though.

polv · February 17, 2022, 8:55pm

Google TTS is confused about what readings to use here (for 角)

I also recall that gTTS is bad with 行う.

(Well, it seems that Microsoft Azure got it right.)

BTW, what about Windows or macOS’s built-in TTS engines?

It makes me wonder whether at least some TTS engines have furigana support?
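
One possible route, as a sketch only: SSML (the W3C markup that many cloud TTS engines accept) defines a `sub` element whose `alias` attribute is spoken in place of the enclosed text, so a reading can be attached to a kanji much like furigana. Whether a particular engine honors it for Japanese, and which wrapper elements it needs (some require a voice element inside `speak`), has to be checked against that engine's documentation. The names `escapeXml` and `toSsml` and the segment format are assumptions for illustration.

```javascript
// Escape characters that are special in XML text and attribute values.
const escapeXml = (s) =>
  s.replace(/[&<>"']/g, (c) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;'
  }[c]))

// segments: plain strings, or { text, reading } pairs (hypothetical format).
// Pairs become <sub alias="reading">text</sub>.
function toSsml(segments, lang = 'ja-JP') {
  const body = segments
    .map((seg) =>
      typeof seg === 'string'
        ? escapeXml(seg)
        : `<sub alias="${escapeXml(seg.reading)}">${escapeXml(seg.text)}</sub>`
    )
    .join('')
  return `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">${body}</speak>`
}

console.log(toSsml([{ text: '角', reading: 'かど' }, 'を曲がる']))
```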

chikorita157 · February 17, 2022, 11:18pm

The macOS TTS for Japanese is pretty decent, but it doesn’t work with the Siri voices, as Apple restricts their use. You can easily write the audio to a file with the “say” command in the terminal, passing the -o flag with the filename. Obviously, the Microsoft Azure voice is still better; I have added it to the SRS study app I am currently developing.
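
For scripting this, a rough Node.js sketch that calls “say” with -v (voice) and -o (output file). It only works on macOS, and it assumes the Kyoko voice mentioned earlier in the thread is installed; the function names `buildSayArgs` and `sayToFile` are made up for illustration.

```javascript
// Build the argument list for macOS's `say`:
//   -v selects the voice, -o writes the audio to a file instead of playing it.
function buildSayArgs(text, outFile, voice = 'Kyoko') {
  return ['-v', voice, '-o', outFile, text]
}

// Run `say` (macOS only). execFile passes the arguments directly,
// without a shell, so the text needs no shell quoting.
async function sayToFile(text, outFile, voice) {
  const { execFile } = await import('node:child_process')
  const { promisify } = await import('node:util')
  await promisify(execFile)('say', buildSayArgs(text, outFile, voice))
  return outFile
}

// Usage (on a Mac):
//   sayToFile('日本へ行きます', 'out.aiff').then(console.log)
```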

polv · February 18, 2022, 5:21pm

Actually, it does work, but no autoplay. With a variation, it does autoplay in AnkiDroid, but not in Anki Desktop.

[sound:https://u74842.deta.dev/tts?q={{Japanese}}]

I renamed the URL a little. It first tries to find existing audio in the WaniKani API / Forvo, and otherwise synthesizes it with Azure or gTTS. In addition, I store the generated files in object storage, so it won’t use up too many Forvo or TTS resources.
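
The lookup order described above could be sketched roughly like this. It is not the actual service code: the provider objects are placeholders, and a `Map` stands in for the object storage.

```javascript
// Each provider is { name, fetchAudio }, where fetchAudio(q) resolves to
// audio data, or to null when it has nothing for the query (assumed shape).
async function resolveAudio(q, providers, cache = new Map()) {
  // Serve from the cache first so repeated lookups cost nothing.
  if (cache.has(q)) return cache.get(q)

  for (const { name, fetchAudio } of providers) {
    try {
      const audio = await fetchAudio(q)
      if (audio) {
        const hit = { source: name, audio }
        cache.set(q, hit)
        return hit
      }
    } catch (e) {
      // A failing provider should not block the ones after it.
      console.error(`${name} failed:`, e.message)
    }
  }
  return null // nothing found and nothing could be synthesized
}
```

Providers would be listed in order of preference, e.g. recorded audio first and synthesized audio last, so TTS is only used when no recording exists.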

Also, I parsed {{ExampleSentences}} in a more complex way than that.

class="has-tts"

<p class="has-tts">{{LookupSentence}}</p>

<script type="module">
  function speak(q = '', lang = 'ja-JP') {
    if (window.AnkiDroidJS) {
      AnkiDroidJS.ankiTtsSetLanguage?.(lang)
      AnkiDroidJS.ankiTtsSpeak?.(q)
    }
  }

  class TTSElement extends HTMLElement {
    constructor() {
      super()

      this.attachShadow({ mode: 'open' })

      this.wrapper = document.createElement('span')
      this.wrapper.innerText = '▶️'
      this.wrapper.style.cursor = 'pointer'
      this.shadowRoot.append(this.wrapper)
    }

    async connectedCallback() {
      const q = this.getAttribute('word')
      try {
        if (q) {
          const elAudio = document.createElement('audio')

          const setSrc = (...sources) => {
            if (!sources.length) {
              sources = [`https://u74842.deta.dev/tts?q=${encodeURIComponent(q)}`]
            }

            sources.forEach((s) => {
              const elSource = document.createElement('source')
              elSource.src = s
              elAudio.append(elSource)
            })
          }

          elAudio.style.display = 'none'
          this.wrapper.append(elAudio)

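          this.onclick = async () => {
            if (!elAudio.querySelector('source')) {
              setSrc()
            }
            // play() rejects asynchronously, so the outer try/catch cannot see it;
            // fall back to AnkiDroid's TTS here instead
            elAudio.play()?.catch(() => speak(q))
          }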
        }
      } catch (e) {
        console.error(e)
        this.onclick = () => {
          speak(q)
        }
      }
    }
  }

  try {
    customElements.define('el-tts', TTSElement)
  } catch (e) { }
</script>

<script type="module">
  const removeHTML = (h) => {
    const div = document.createElement('div')
    div.innerHTML = h
    return div.innerText
  }

  const reJa = /[\p{sc=Han}\p{sc=Katakana}\p{sc=Hiragana}]/u
  document.querySelectorAll('.has-tts').forEach((el) => {
    const newEls = el.innerHTML.split(/<br *\/?>/g).map((t) => {
      t = removeHTML(t)

      const div = document.createElement('div')
      div.append(t)

      if (reJa.test(t)) {
        const elTTS = document.createElement('el-tts')
        elTTS.setAttribute('word', t)
        div.append(elTTS)
      }

      return div
    })

    el.textContent = ''
    el.append(...newEls)
  })
</script>
