What is the best (or decent) TTS (text-to-speech) engine?

I am targeting not only Japanese, but also Chinese. I might try the Forvo API, but it doesn’t always have recordings for arbitrary words and sentences.

I know Windows and macOS have built-in engines, which you can access easily through Anki’s AwesomeTTS. But is there a decent option for Linux / Docker? espeak sounds bad for Chinese…

I plan to make a web service, so I shouldn’t rely on Google TTS hacks. It should be free and should never depend on a third-party service.

I see that there is TensorFlowTTS, but what about the training data?

BTW, for those who just want to make Anki TTS, and want to save time,

Well, that’s the only one I’ve used

I made a web service (with gTTS), and it can be added to Anki without downloading anything, via:

    <audio controls>
        <source src="https://u74842.deta.dev/tts.mp3?q=日本へ" type="audio/mp3">
    </audio>

I fear that Google may kill gTTS without notice at any time (in favor of the paid service, of course).

And, for some reason, <audio src=""> doesn’t work in Anki.

Prosody Tutor Suzuki-Kun gives really accurate results, with 4 voices and graphs of the pitch accent.

It’s good, but it seems you have to paste the text in manually, play it back, and save the file yourself.

Does it expose an API?

Unfortunately, the fine print says Suzuki-kun can’t be used on external websites, it’s just for individual practice ig

I’ve only used IBM Watson and Microsoft Azure TTS so far. They both have some great neural voices that sound quite realistic, with Microsoft having the much larger selection of languages and voices. You can try it out here; the Japanese sounds pretty realistic to me. It’s only free up to a monthly limit, though.

Google TTS is confused about what readings to use here (for 角)

I also recall that gTTS is bad with 行う.

(Well, it seems that Microsoft Azure got it right.)

BTW, what about Windows or macOS’s built-in TTS engines?

It makes me wonder whether at least some TTS engines support furigana.

The macOS TTS for Japanese is pretty decent, but it doesn’t work with the Siri voices, as Apple restricts their use. You can easily write the audio to a file with the “say” command in the terminal, using the -o flag followed by the filename. Obviously, the Microsoft Azure voice is still better; I have added it to the SRS study app I am currently developing.
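
For anyone who wants to script the “say” approach, here is a rough Node.js sketch. It only wraps the command line described above; the helper names (`buildSayArgs`, `sayToFile`) are made up for illustration, and it assumes a Japanese voice such as Kyoko is installed on the Mac.

```javascript
const { execFile } = require('node:child_process')

// Build the argument list for macOS `say`:
// -v selects a voice, -o writes the audio to a file instead of playing it.
// 'Kyoko' is an assumption; use whichever Japanese voice is installed.
function buildSayArgs(text, outFile, voice = 'Kyoko') {
  return ['-v', voice, '-o', outFile, text]
}

// Run `say` and resolve with the output path. Rejects on non-macOS systems.
function sayToFile(text, outFile, voice) {
  return new Promise((resolve, reject) => {
    if (process.platform !== 'darwin') {
      reject(new Error('`say` is only available on macOS'))
      return
    }
    execFile('say', buildSayArgs(text, outFile, voice), (err) => {
      if (err) reject(err)
      else resolve(outFile)
    })
  })
}

// Usage (on macOS): sayToFile('日本へ', 'nihon-he.aiff').then(console.log)
```

Passing the text as a separate argument through `execFile` avoids shell quoting problems with Japanese or Chinese input.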

Actually, it does work, but no autoplay. With a variation, it does autoplay in AnkiDroid, but not in Anki Desktop.

[sound:https://u74842.deta.dev/tts?q={{Japanese}}]

I renamed the URL a little. It first looks for a recording via the WaniKani API or Forvo, then falls back to synthesizing with Azure, then gTTS. I also store the generated files in object storage, so repeated requests don’t use up too much Forvo or TTS quota.
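
Roughly, the lookup order can be sketched like this. This is not the actual server code, just a minimal illustration of "try each source in order, cache the first hit". `resolveAudio` and the two stub providers are hypothetical, and a plain `Map` stands in for the object storage.

```javascript
// Try each provider in order; return the first result and cache it by query.
async function resolveAudio(q, providers, cache = new Map()) {
  if (cache.has(q)) return cache.get(q)

  for (const { name, fetchAudio } of providers) {
    try {
      const audio = await fetchAudio(q)
      if (audio) {
        const hit = { source: name, audio }
        cache.set(q, hit)
        return hit
      }
    } catch (e) {
      // A failing provider should not stop the chain; move on to the next one.
    }
  }
  return null
}

// Hypothetical stand-ins: a real service would call WaniKani / Forvo first,
// then Azure, then gTTS, and return audio bytes instead of strings.
const providers = [
  { name: 'recording', fetchAudio: async (q) => (q === '日本' ? 'recorded-bytes' : null) },
  { name: 'tts', fetchAudio: async (q) => `synthesized:${q}` },
]

// Usage: resolveAudio('日本へ', providers).then((hit) => console.log(hit.source))
```

Recorded audio is tried before synthesis, and the cache is checked before anything else, so a repeated query never reaches a provider again.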

Also, I parse {{ExampleSentences}} in a more complex way than that:

Fields to be read aloud get class="has-tts":
<p class="has-tts">{{LookupSentence}}</p>

<script type="module">
  // Fallback: speak through AnkiDroid's JS API when it is available (no-op elsewhere).
  function speak(q = '', lang = 'ja-JP') {
    if (window.AnkiDroidJS) {
      AnkiDroidJS.ankiTtsSetLanguage?.(lang)
      AnkiDroidJS.ankiTtsSpeak?.(q)
    }
  }

  class TTSElement extends HTMLElement {
    constructor() {
      super()

      this.attachShadow({ mode: 'open' })

      this.wrapper = document.createElement('span')
      this.wrapper.innerText = '▶️'
      this.wrapper.style.cursor = 'pointer'
      this.shadowRoot.append(this.wrapper)
    }

    connectedCallback() {
      const q = this.getAttribute('word')
      if (!q) return

      // Hidden <audio> element; its <source> is attached lazily on first click.
      const elAudio = document.createElement('audio')
      elAudio.style.display = 'none'
      this.wrapper.append(elAudio)

      const setSrc = (...sources) => {
        if (!sources.length) {
          sources = [`https://u74842.deta.dev/tts?q=${encodeURIComponent(q)}`]
        }

        sources.forEach((s) => {
          const elSource = document.createElement('source')
          elSource.src = s
          elAudio.append(elSource)
        })
      }

      this.onclick = () => {
        if (!elAudio.querySelector('source')) {
          setSrc()
        }
        // play() returns a promise, so a surrounding try/catch would never see
        // a playback failure; handle the rejection and fall back to speak().
        elAudio.play().catch((e) => {
          console.error(e)
          speak(q)
        })
      }
    }
  }

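  // Anki can run this script again when another card is shown, so only
  // register the element if it is not already defined.
  if (!customElements.get('el-tts')) {
    customElements.define('el-tts', TTSElement)
  }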
</script>

<script type="module">
  // Strip markup by letting the browser parse it into a detached element.
  const removeHTML = (h) => {
    const div = document.createElement('div')
    div.innerHTML = h
    return div.innerText
  }

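  // Matches any kanji or kana, i.e. lines worth sending to the Japanese TTS.
  const reJa = /[\p{sc=Han}\p{sc=Katakana}\p{sc=Hiragana}]/u
  document.querySelectorAll('.has-tts').forEach((el) => {
    // Split the field on <br> and rebuild it as one <div> per line.
    const newEls = el.innerHTML.split(/<br *\/?>/g).map((t) => {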
      t = removeHTML(t)

      const div = document.createElement('div')
      div.append(t)

      if (reJa.test(t)) {
        const elTTS = document.createElement('el-tts')
        elTTS.setAttribute('word', t)
        div.append(elTTS)
      }

      return div
    })

    el.textContent = ''
    el.append(...newEls)
  })
</script>