Convert written sentences to audio: is mass automation possible? Otherwise, is there a database of natively spoken audio sentences by JLPT level?

I believe it is possible with some kind of PHP call, but I don't know how. Also, I have forgotten most of what I knew about PHP, and I don't have a server.

I plan to create audio for the sentences in the Jtest4you deck, which I will throw into this project.

Better yet would be a collection of natively spoken Japanese sentences, preferably organized by JLPT level.


I've got a script for this. It's a bash shell script, plus a tiny C program to URL-encode the sentence.
It accepts parameters for voice and speed selection. The text-to-speech service is:

Would that be useful?

That converts one sentence to one audio file, but I'll be converting 1,000 sentences at a time.

It’s just a matter of wrapping a ‘for’ loop around the script, and feeding your sentences to it.

The question is whether the service can handle 1,000 sentences thrown at it, or whether it will have performance issues.

For a one-time fetch, just add a delay between requests if necessary. Even 1 s between each adds only about 17 minutes (1,000 s) in total for 1,000 sentences, and it only took about 0.5 s to do a fetch, so the whole run is roughly 25 minutes.
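A minimal Python sketch of that loop, with a pause between requests. The fetch callable is a hypothetical stand-in for whatever actually produces one audio file (the shell script, an HTTP request, or say), and the sentence_0000 naming is likewise only an example. With delay_s=1.0 and roughly 0.5 s per fetch, 1,000 sentences take about 1,000 × 1.5 s, or 25 minutes.

```python
import time
from typing import Callable, Iterable, List


def fetch_all(sentences: Iterable[str],
              fetch: Callable[[str, str], None],
              delay_s: float = 1.0) -> List[str]:
    """Call fetch(sentence, name) once per non-blank sentence, pausing between calls.

    `fetch` is a placeholder for the real audio generator. Returns the names used.
    """
    names: List[str] = []
    for i, sentence in enumerate(sentences):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines
        if names:
            time.sleep(delay_s)  # be polite to the service between requests
        name = f"sentence_{i:04d}"  # numbered by input line
        fetch(sentence, name)
        names.append(name)
    return names
```

Keeping the delay outside fetch makes it easy to set delay_s=0 when the backend is local, as with say.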

Even though I have only just posted this thread, the script would still be appreciated. I actually want audio sentences by JLPT level, after all.

By the way, I use an 11" MacBook Air with OS X, and I run Python constantly. If I had to run C, what compiler would I need?

I’m not familiar with Mac programming at all, but all this really comes down to is building a URL and fetching the resulting mp3.

From a quick Google search, it looks like the Mac has the curl command, so you should be okay.
Here are the basics of the shell script. You pass the sentence in the ${text} variable.

urltext=$(text2url "${text}")   # quote ${text} so a sentence with spaces stays one argument
# ${url} (the service request URL containing ${urltext}) and ${file} (the output name) must be set before this line
curl "${url}" -H 'Cookie: affData=a%3A1%3A%7Bs%3A5%3A%22affId%22%3Bi%3A1070854%3B%7D' -H 'DNT: 1' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Accept: */*' -H 'Referer:' -H 'X-Requested-With: ShockwaveFlash/' -H 'Connection: keep-alive' --compressed -o "audio/${file}.mp3"
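The same request could also be sketched in Python with only the standard library. The service address is not given above, so SERVICE_URL_TEMPLATE is a hypothetical placeholder that must be replaced with the real request URL. The cookie and most browser headers from the curl command are left out on the assumption that the service doesn't require them, and Accept-Encoding is omitted because urllib, unlike curl --compressed, does not decompress gzip responses.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: the real TTS service URL is not given in the post.
# Replace it, keeping {text} where the encoded sentence belongs.
SERVICE_URL_TEMPLATE = "https://tts.example.com/speak?text={text}"

# The browser User-Agent from the curl command above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")


def build_request(text: str, url_template: str = SERVICE_URL_TEMPLATE) -> urllib.request.Request:
    """Percent-encode `text` as UTF-8 and put it into the request URL."""
    encoded = urllib.parse.quote(text, safe="")
    return urllib.request.Request(
        url_template.format(text=encoded),
        headers={"User-Agent": USER_AGENT, "Accept": "*/*"},
    )


def fetch_mp3(text: str, out_path: str, timeout: float = 30.0) -> None:
    """Download the audio for `text` and write the response body to `out_path`."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
```

Once the real URL is filled in, fetch_mp3("世界へようこそ", "audio/0001.mp3") would save one file.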

And the C program, text2url, is:

#include <stdio.h>
#include <stdlib.h>
#include <libgen.h>

/* Percent-encode every byte of the argument, e.g. "A" -> "%41" */
int main(int argc, char *argv[])
{
	if (argc != 2) {
		fprintf(stderr, "Usage: %s <text>\n", basename(argv[0]));
		return EXIT_FAILURE;
	}

	/* %02X zero-pads, so byte 0x0A prints as "%0A", not "% A" */
	for (char *p = argv[1]; *p != '\0'; p++)
		printf("%%%02X", (unsigned char)*p);

	return EXIT_SUCCESS;
}


That compiles with gcc.
But really, it's just a loop around printf that converts every byte of the string to URL-encoded (percent-encoded) form.
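Since Python is already installed, the same encoding can be sketched there without any compiler. The function name text2url simply mirrors the C program. It encodes every UTF-8 byte, including letters and digits, whereas the standard urllib.parse.quote(text, safe="") leaves unreserved ASCII characters such as letters as-is; both forms decode to the same text.

```python
def text2url(text: str) -> str:
    """Percent-encode every UTF-8 byte of `text`, like the C text2url program."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))
```

For example, text2url("あ") gives "%E3%81%82", the three UTF-8 bytes of あ.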


Anki has a plugin that looks at whatever is on the front of the card, generates an audio file, and automatically attaches it to the card. I forget what it's called, but it exists.
And with AnkiDroid at least, there's an option to have it read the front of the card to you if there's no audio file.
Granted, both of these options read the cards in a robot voice.


So the easiest way to do this is actually to create a template with white (and therefore invisible) text and enable text-to-speech.

Then set the TTS language to Japanese and turn on audio auto-play. :grin:

The plugin for this one is AwesomeTTS.

<tts service="say" speed="175" voice="Kyoko" style="display: none">{{kanji:Japanese}}</tts>


On-the-fly TTS is OK; I don't really need to create audio files. On-the-fly mode only works on the desktop, though.

Adding sentence audio files to multiple records at once is also possible, with a beautiful GUI. However, I don't think it can generate sound files from {{kanji:Japanese}}; it can only use {{Japanese}}, in which case the furigana is also read aloud.
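If the files are generated outside Anki instead, one workaround is to strip the readings yourself before passing the text to TTS. This sketch assumes Anki's usual furigana syntax, where a reading follows its kanji in square brackets (勉強[べんきょう]) and a space separates it from preceding text; the regular expression only approximates what the {{kanji:...}} filter keeps, and strip_furigana is a hypothetical helper name.

```python
import re

# Optional leading space, the base text, then its [reading].
# Approximation of Anki's {{kanji:...}} filter: keep the base, drop the reading.
_FURIGANA = re.compile(r" ?([^ >]+?)\[(.+?)\]")


def strip_furigana(field: str) -> str:
    """Return the field text without bracketed readings, ready for TTS input."""
    return _FURIGANA.sub(r"\1", field)
```

For example, "日本語[にほんご]を 勉強[べんきょう]する" becomes "日本語を勉強する".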

Sorry, but I couldn't give you more than one Like. I really, really love it (とってもとっても大好き).

Too bad there is only one speaker in this app, a woman named Kyoko.

I am too weak on the basics to make text2url.c work.

gcc text2url.c doesn't work; it only gives me a.out.
I also tried gcc text2url.c -static -v, and it says:

Patarapols-MacBook-Air:~ patarapolw$ gcc text2url.c -static -v
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
 "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple x86_64-apple-macosx10.12.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name text2url.c -static-define -mrelocation-model static -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu penryn -target-linker-version 278.4 -v -dwarf-column-info -debugger-tuning=lldb -resource-dir /Library/Developer/CommandLineTools/usr/bin/../lib/clang/8.1.0 -fdebug-compilation-dir /Users/patarapolw -ferror-limit 19 -fmessage-length 80 -stack-protector 1 -fblocks -fobjc-runtime=macosx-10.12.0 -fencode-extended-block-signature -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/xp/6rmdtcl52m97czk288mw4s_00000gn/T/text2url-eff58c.o -x c text2url.c
clang -cc1 version 8.1.0 (clang-802.0.42) default target x86_64-apple-darwin16.3.0
#include "..." search starts here:
#include <...> search starts here:
 /System/Library/Frameworks (framework directory)
 /Library/Frameworks (framework directory)
End of search list.
 "/Library/Developer/CommandLineTools/usr/bin/ld" -demangle -lto_library /Library/Developer/CommandLineTools/usr/lib/libLTO.dylib -no_deduplicate -static -arch x86_64 -macosx_version_min 10.12.0 -o a.out -lcrt0.o /var/folders/xp/6rmdtcl52m97czk288mw4s_00000gn/T/text2url-eff58c.o
ld: library not found for -lcrt0.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Patarapols-MacBook-Air:~ patarapolw$ 

But I need text2url, which is an executable command, don’t I?

Use -o to name the executable; without it, gcc names the compiled program a.out:

gcc -o text2url -Wall -Werror text2url.c

But I’m looking at:
ld: library not found for -lcrt0.o

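That's the C runtime startup object that every executable needs in order to link. On macOS it is only requested because of -static: Apple doesn't support fully static executables and doesn't ship crt0.o, so leaving -static off should fix this.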
You might also try g++ instead of gcc

Since you're on a Mac, your computer can do this natively; no external libraries are needed.
If you want to write a bash script, you can create TTS audio with a simple command:

say -v kyoko "こんにちは"

There are two Japanese voices on the Mac, Kyoko (female) and Otoya (male), but you'll probably have to download Otoya, and the high-quality version of Kyoko, if you want to use them.

You can pass the -o flag to specify an output file, and say will create a file of the type that matches the extension you give it. The file types it can write (format ID, name, extensions, and supported data formats) are:

3gp2  3GPP-2 Audio         (.3g2) [Qclp,aac,aace,aacf,aach,aacl,aacp]
3gpp  3GP Audio            (.3gp) [Qclp,aac,aace,aacf,aach,aacl,aacp]
AIFC  AIFC                 (.aifc,.aiff,.aif) [lpcm,ulaw,alaw,ima4,Qclp]
AIFF  AIFF                 (.aiff,.aif) [lpcm]
NeXT  NeXT/Sun             (.snd,.au) [lpcm,ulaw]
Sd2f  Sound Designer II    (.sd2) [lpcm]
WAVE  WAVE                 (.wav) [lpcm,ulaw,alaw]
adts  AAC ADTS             (.aac,.adts) [aac,aach,aacp]
caff  CAF                  (.caf) [Qclp,aac,aace,aacf,aach,aacl,aacp,alac,alaw,ilbc,ima4,lpcm,ulaw]
m4af  Apple MPEG-4 Audio   (.m4a,.m4r) [aac,aace,aacf,aach,aacl,aacp,alac]
m4bf  Apple MPEG-4 AudioBooks (.m4b) [aac,aace,aacf,aach,aacl,aacp]
mp4f  MPEG-4 Audio         (.mp4) [aac,aace,aacf,aach,aacl,aacp]

You can find more info in the docs here:

You could write a bash script to loop through all your sentences and create files for you.

But since you also mentioned Python, you can wrap the whole thing in a Python script if you're not as comfortable with bash (I'm not the best with bash myself). Just use os.system to run the say command:

from os import system
system(u'say -v kyoko "鰐蟹は最高です"')  # options first, then the text to speak

You could fairly easily wrap that in a little Python script like this:

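# coding: utf-8

import codecs
from os import system

# let's say we have a sentences file sentences.txt that looks like this:
# 伏線の張り方がすごい
# 鰐蟹は最高です
# 利害が一致すれば、呉越同舟も厭わない
# ニッカは柴犬ではなく珍島犬です

with codecs.open('sentences.txt', 'r', 'utf-8') as s:
    for idx, sentence in enumerate(s):
        sentence = sentence.strip()  # drop the trailing newline
        if not sentence:
            continue  # skip blank lines
        # {0:03d} zero-pads the line number; the quotes keep the sentence one shell argument
        system(u'say -v kyoko -o <some_folder_path>/japanese_sentence_{0:03d}.m4a "{1}"'.format(idx, sentence).encode('utf-8'))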

This reads the file one line (one sentence) at a time and, for each, creates an .m4a (MPEG-4 audio) file in <some_folder_path>, numbered by line: japanese_sentence_000.m4a, japanese_sentence_001.m4a, and so on.

Note: this is Python 2. Python 3 doesn't have the annoying Unicode issues, because all strings are Unicode in Python 3, so you don't need the codecs library, the u'some string' prefix, or the .encode('utf-8') call.
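For Python 3, a sketch of the same loop using subprocess.run with an argument list instead of os.system, so the sentence never passes through the shell and needs no quoting. The helper names, the Kyoko default, and the dry_run switch (which only builds the commands, handy for checking file names before running say) are choices made for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def say_command(sentence: str, out_file: Path, voice: str = "Kyoko") -> List[str]:
    """Argument list for macOS `say`: speak `sentence` with `voice` into `out_file`.

    The output type follows the file extension (.m4a here).
    """
    return ["say", "-v", voice, "-o", str(out_file), sentence]


def synthesize_file(sentences_path: Path, out_dir: Path, dry_run: bool = False) -> List[List[str]]:
    """Make japanese_sentence_NNN.m4a for each non-blank line of a UTF-8 text file.

    NNN is the zero-padded line index. With dry_run=True the commands are only
    built and returned, not executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    lines = sentences_path.read_text(encoding="utf-8").splitlines()
    for idx, line in enumerate(lines):
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines
        cmd = say_command(sentence, out_dir / f"japanese_sentence_{idx:03d}.m4a")
        commands.append(cmd)
        if not dry_run:
            # No shell involved; check=True raises if say fails.
            subprocess.run(cmd, check=True)
    return commands
```

On a Mac, synthesize_file(Path("sentences.txt"), Path("audio")) would write the files into an audio folder.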

And finally, here is a folder with the audio files I created with this little script, so you can hear what they sound like for yourself. I used m4a encoding because file-sharing services let you preview it in the browser, so you don't have to download the files:


I am trying to do something like this, but with my own database of sentences, which I can read with openpyxl.

The code is roughly:

import os
from pydub import AudioSegment

# Render each sentence to its own WAV file with the macOS say command
os.system('say -v alex -o goo1.wav --data-format=LEF32@22050 "Hello World"')
os.system('say -v kyoko -o goo2.wav --data-format=LEF32@22050 "世界へようこそ〜"')

# English clip + 1 s (1000 ms) of silence + Japanese clip
voice = AudioSegment.from_wav('goo1.wav') + AudioSegment.silent(duration=1000) + AudioSegment.from_wav('goo2.wav')
voice.export('goog.wav', format='wav')
os.system('afplay goog.wav')  # play the result on macOS
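To drive this from a spreadsheet, here is a sketch that turns rows into (English, Japanese) pairs and builds the two say commands for each pair. It assumes English in the first column and Japanese in the second, and rows like those yielded by openpyxl's ws.iter_rows(min_row=2, values_only=True): tuples of cell values, with None for empty cells. The LEI16@22050 data format (16-bit integer PCM) is an assumption chosen so the WAV files can also be read by Python's standard wave module, which cannot open 32-bit float (LEF32) WAVs.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumption: 16-bit little-endian integer PCM at 22,050 Hz.
DATA_FORMAT = "--data-format=LEI16@22050"


def _cell(row: Sequence[object], col: int) -> str:
    """Cell text with surrounding whitespace removed; '' for missing or empty cells."""
    value = row[col] if col < len(row) else None
    return "" if value is None else str(value).strip()


def sentence_pairs(rows: Iterable[Sequence[object]],
                   en_col: int = 0, ja_col: int = 1) -> List[Tuple[str, str]]:
    """Collect (English, Japanese) pairs, skipping rows where either cell is empty."""
    pairs = []
    for row in rows:
        en, ja = _cell(row, en_col), _cell(row, ja_col)
        if en and ja:
            pairs.append((en, ja))
    return pairs


def say_commands(pair: Tuple[str, str], stem: str) -> List[List[str]]:
    """Two say argument lists: the English prompt (Alex) and the Japanese sentence (Kyoko)."""
    en, ja = pair
    return [
        ["say", "-v", "Alex", "-o", f"{stem}_en.wav", DATA_FORMAT, en],
        ["say", "-v", "Kyoko", "-o", f"{stem}_ja.wav", DATA_FORMAT, ja],
    ]
```

On a Mac, each command could be run with subprocess.run(cmd, check=True), with rows coming from something like load_workbook("sentences.xlsx", read_only=True).active.iter_rows(min_row=2, values_only=True); the file name is only an example.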

For non-Mac systems, you might try pyttsx3 or gTTS; see: Speech Recognition in Python (Text to speech) - Python

It isn't that hard to make a poor man's Glossika.
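If pydub (and the ffmpeg it may need) is not available, the joining step can be sketched with Python's standard wave module instead. This assumes every clip is integer PCM with the same channel count, sample width, and sample rate, for example all produced by say with the same integer --data-format; concat_wavs is a hypothetical helper name.

```python
import wave
from typing import Sequence


def concat_wavs(paths: Sequence[str], out_path: str, gap_ms: int = 1000) -> None:
    """Write the clips in `paths` end to end into `out_path`, with `gap_ms` of silence between them."""
    if not paths:
        raise ValueError("need at least one input file")
    with wave.open(paths[0], "rb") as first:
        nchannels, sampwidth, framerate = first.getnchannels(), first.getsampwidth(), first.getframerate()
    # Silence is the zero sample: 0x80 for unsigned 8-bit PCM, 0x00 bytes for wider signed PCM.
    zero = b"\x80" if sampwidth == 1 else b"\x00" * sampwidth
    gap = zero * nchannels * (framerate * gap_ms // 1000)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        for i, path in enumerate(paths):
            with wave.open(path, "rb") as clip:
                if (clip.getnchannels(), clip.getsampwidth(), clip.getframerate()) != (nchannels, sampwidth, framerate):
                    raise ValueError(f"{path}: audio format differs from {paths[0]}")
                if i > 0:
                    out.writeframes(gap)  # pause before every clip except the first
                out.writeframes(clip.readframes(clip.getnframes()))
```

For a Glossika-style drill, concat_wavs(["s001_en.wav", "s001_ja.wav"], "s001.wav") would give the English prompt, a one-second pause, then the Japanese sentence.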

Of course, an easier way in Anki is AwesomeTTS.
