Since you’re on a Mac, your computer can do this natively; no external libraries are needed.
If you want to write a bash script, you can generate TTS audio with a simple command:
say -v kyoko "こんにちは"
There are two Japanese voices for Mac, Kyoko (female) and Otoya (male), but you’ll probably have to download Otoya, and also the high-quality version of Kyoko if you want to use that one.
You can pass the -o flag to specify an output file, and say will create a file of the type that matches the extension you give the file name.
The available formats are:
3gp2 3GPP-2 Audio (.3g2) [Qclp,aac,aace,aacf,aach,aacl,aacp]
3gpp 3GP Audio (.3gp) [Qclp,aac,aace,aacf,aach,aacl,aacp]
AIFC AIFC (.aifc,.aiff,.aif) [lpcm,ulaw,alaw,ima4,Qclp]
AIFF AIFF (.aiff,.aif) [lpcm]
NeXT NeXT/Sun (.snd,.au) [lpcm,ulaw]
Sd2f Sound Designer II (.sd2) [lpcm]
WAVE WAVE (.wav) [lpcm,ulaw,alaw]
adts AAC ADTS (.aac,.adts) [aac,aach,aacp]
caff CAF (.caf) [Qclp,aac,aace,aacf,aach,aacl,aacp,alac,alaw,ilbc,ima4,lpcm,ulaw]
m4af Apple MPEG-4 Audio (.m4a,.m4r) [aac,aace,aacf,aach,aacl,aacp,alac]
m4bf Apple MPEG-4 AudioBooks (.m4b) [aac,aace,aacf,aach,aacl,aacp]
mp4f MPEG-4 Audio (.mp4) [aac,aace,aacf,aach,aacl,aacp]
You can find more info in the docs here:
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/say.1.html
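To make the -o flag concrete, here is a rough sketch that builds a say command writing to a file instead of the speakers. The helper name say_to_file_command and the file name hello.m4a are made up for illustration, and it only actually runs say if the command exists (i.e. you're on a Mac).

```python
import shutil
import subprocess


def say_to_file_command(text, out_path, voice="kyoko"):
    """Build the argument list for: say -v <voice> -o <out_path> <text>."""
    # Hypothetical helper, not part of any library.
    # The output format is taken from the extension of out_path.
    return ["say", "-v", voice, "-o", out_path, text]


if __name__ == "__main__":
    cmd = say_to_file_command("こんにちは", "hello.m4a")
    if shutil.which("say"):  # the say command only exists on macOS
        subprocess.run(cmd, check=True)
    else:
        print("say not found; would have run:", cmd)
```

Passing the arguments as a list means the sentence never goes through the shell, so you don't have to worry about quoting it.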
You could write a bash script to loop through all your sentences and create files for you.
But since you also mentioned Python, you can wrap the whole thing in a Python script instead if you’re not as comfortable with bash (I’m not the best with bash myself). Just use the os.system command to run the “say” command:
from os import system
system(u'say "鰐蟹は最高です" -v kyoko')
You could fairly easily wrap that in a little Python script like this:
# coding: utf-8
import codecs
from os import system

# let's say we have a sentences file sentences.txt that looks like this:
# 伏線の張り方がすごい
# 鰐蟹は最高です
# 利害が一致すれば、呉越同舟も厭わない
# ニッカは柴犬ではなく珍島犬です
with codecs.open('sentences.txt', 'r', 'utf-8') as s:
    for idx, sentence in enumerate(s):
        # strip the trailing newline and quote the sentence for the shell
        cmd = u'say -v kyoko -o <some_folder_path>/japanese_sentence_{0:03d}.m4a "{1}"'.format(idx, sentence.strip())
        system(cmd.encode('utf-8'))
This takes each line of the file (one sentence per line) and creates an m4a file for it in <some_folder_path>. The files are labeled with sequential numbers: japanese_sentence_000.m4a, japanese_sentence_001.m4a, and so on.
Note: this is Python 2. Python 3 doesn’t have the annoying unicode issues, since all strings are unicode in Python 3, so you don’t need the codecs library or the u'some string' prefix.
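For reference, a Python 3 version might look roughly like the sketch below. It is not the script I ran: the build_commands helper and the audio output folder are names I made up for the example, it uses subprocess instead of os.system, and unlike the script above it skips blank lines before numbering.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(lines, out_dir, voice="kyoko"):
    """Return one say argument list per non-empty sentence."""
    # Hypothetical helper: drop blank lines, then number what is left.
    sentences = [line.strip() for line in lines if line.strip()]
    commands = []
    for idx, sentence in enumerate(sentences):
        out_file = Path(out_dir) / "japanese_sentence_{:03d}.m4a".format(idx)
        commands.append(["say", "-v", voice, "-o", str(out_file), sentence])
    return commands


if __name__ == "__main__":
    source = Path("sentences.txt")   # one sentence per line, UTF-8
    out_dir = Path("audio")          # assumed output folder
    if source.exists() and shutil.which("say"):
        out_dir.mkdir(exist_ok=True)
        with source.open(encoding="utf-8") as f:
            for cmd in build_commands(f, out_dir):
                subprocess.run(cmd, check=True)
```

Since the arguments are passed as a list, the sentences never go through the shell, so there is nothing to encode or quote by hand.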
And finally, here is a folder with the audio files I created with this little script, so you can hear what they sound like for yourself. I used m4a encoding because file transfer services will let you preview it in the browser, so you don’t have to download the files:
https://app.box.com/s/0qh57i166mx0omeui0qrn77ww7yogqgm