Classical Latin text-to-speech (TTS)

To master a language, it is not enough to learn to read it well. You must also hear it spoken. With a dead language like Latin, that means listening to recordings. There are very few audiobooks in Latin, and even fewer are recorded in a decent accent. So I had the idea of making my own using TTS.

(Google Translate has a Latin voice, but it's actually just the Italian voice. It's useless for the restored pronunciation. It's not even useful for the ecclesiastical pronunciation of Latin.)

The only option for Latin TTS is Espeak. There are two Latin voices available: the standard "la" voice and the "mbrola-la1" voice (you must install the Mbrola voice separately). The standard voice is a drunken robot, very difficult to listen to. The Mbrola voice is much better, but it has a serious incompatibility with Espeak, for which I opened a ticket on Espeak's GitHub page. I tried using regular expressions to rework the source text (and I discovered that someone else had tried to do the same). Unfortunately, all my efforts have been in vain and serious problems persist. It seems that until the developers of Espeak fix this issue, we're stuck with the drunken robot voice. It's very primitive and barely intelligible, so the information that follows will only really be useful once we get a working Mbrola voice.

Before firing up Espeak, you need to prepare your text:
  • You might want to replace abbreviations:
    • Roman names: Q. (Quintus), P. (Publius), Sex. (Sextus), etc. You have to do this manually in each instance, because you need to put the name in the right case: for example, "M. Tullio Ciceroni" becomes "Marco Tullio Ciceroni."
    • Abbreviations like A.U.C. (ab urbe condita), S.p.d. (salutem plurimam dicit), a.d. (ante diem), Kalend. (Kalendas).
  • Espeak's support for numerals is inconsistent, so it's better to replace Roman and Arabic numerals like LXX and 70 with the written form, e.g., septuagintā.
  • You'll want to get rid of formatting signs, like section numbers, which can be very distracting to listen to.
  • I haven't found a satisfactory way of treating Greek words. Maybe SSML could help. 
  • The most important step (and the only essential one) is to add macrons to your text. Espeak depends on macrons to pronounce words correctly. This is especially important for the stress accent. The best automated tool is Alatius's Macronizer (it has a GitHub page). When you use this tool, there's no need to review the output: it will be good enough for text-to-speech even if a very small number of vowels are wrong (though you might want to quickly check the final a of first-declension nouns and adjectives).
  • When I prepare my texts, I only go through the steps of replacing formatting signs and macronizing the text. I can live with abbreviations and bad numbers.
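
The mechanical parts of this cleanup can be scripted. Here is a minimal sketch with sed, not a complete solution: the function name prepare_latin is made up for this example, and it assumes section numbers appear in square brackets like "[12]", which will not be true of every edition. It only expands abbreviations that have a single fixed reading. Names like "M." still need manual work because of the case endings.

```shell
#!/bin/sh
# Hypothetical helper: clean Latin text read from stdin.
# Assumption: section numbers look like "[12]"; adjust the first rule
# to match whatever your edition uses.
prepare_latin() {
    sed -E \
        -e 's/\[[0-9]+\] ?//g' \
        -e 's/A\.U\.C\./ab urbe condita/g' \
        -e 's/S\.p\.d\./salutem plurimam dicit/g' \
        -e 's/Kalend\./Kalendas/g' \
        -e 's/(^|[^[:alnum:]])a\.d\./\1ante diem/g'
}
```

You would run it as a filter before macronizing, for example prepare_latin < raw.txt > cleaned.txt (both file names are placeholders). The last rule only matches "a.d." at the start of a line or after a non-letter, so it does not fire in the middle of a word.
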
Once your source text is ready, do this to generate your audiobook:
$ espeak -v la -s 100 -g 1 -f input2.txt --stdout | ffmpeg -i - output.ogg

You can play around with the speed (-s option) and the gap between words (-g option), or choose another audio format, like output.mp3.

When (if ever) Espeak supports the mbrola-la1 voice, you just need to replace -v la with -v mb-la1.
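
To compare settings by ear, it can help to render the same text at several speeds in one go. The following is a rough sketch that wraps the command above in a loop. The function name render_speeds, the DRY_RUN and VOICE variables, and the sample-… file names are all invented for this example; only the espeak and ffmpeg options come from the command shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: render one input file at several speeds.
# Usage: render_speeds input2.txt 90 100 110
# VOICE   (assumed variable) picks the espeak voice, default "la".
# DRY_RUN (assumed variable) set to 1 prints the commands without running them.
render_speeds() {
    input=$1
    shift
    voice=${VOICE:-la}
    for speed in "$@"; do
        out="sample-${voice}-s${speed}.ogg"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "espeak -v $voice -s $speed -g 1 -f $input --stdout | ffmpeg -i - $out"
        else
            espeak -v "$voice" -s "$speed" -g 1 -f "$input" --stdout | ffmpeg -i - "$out"
        fi
    done
}
```

Running it with DRY_RUN=1 first lets you check the commands before generating any audio. Setting VOICE=mb-la1 would switch to the Mbrola voice once it works.
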

For regular English text, these options make espeak more intelligible:
espeak -ven+f3 -k5 -s150

