Compare engines
Precomposed characters
Check whether the following two words produce different output in a speech synthesis engine:
- "finish" (normal text)
- "ﬁnish" (with the precomposed fi ligature, U+FB01)
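They should sound the same if the engine applies Unicode compatibility normalisation (NFKC or NFKD), which maps the fi ligature (U+FB01) to the separate letters f and i. Canonical normalisation (NFC or NFD) alone leaves the ligature untouched. I ran a quick check on the TTS engines and voices I have access to (☒: different, ☑: same):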
- ☒ Festival voice_nitech_us_awb_arctic_hts (on GNU/Linux)
- ☒ Microsoft Zira Mobile (on Win 10)
- ☒ Microsoft Mark Mobile (on Win 10)
- ☑ Ivona Amy UK English (on Android)
- ☒ Bing Translator's TTS (on the Web)
- ☑ Google Translate's TTS (on the Web)
- ☑ VoiceOver Alex (on Mac OS X)
Here is a list of pre-composed Latin characters.
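The ligature test above can be reproduced without a speech engine. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `normalise_for_tts` is made up for illustration and is not part of any engine listed here. It shows that the two spellings differ as strings, that NFC does not help, and that NFKC makes them identical.

```python
import unicodedata

plain = "finish"
ligature = "\ufb01nish"  # starts with U+FB01 LATIN SMALL LIGATURE FI


def normalise_for_tts(text: str) -> str:
    """Hypothetical pre-processing step: apply compatibility
    normalisation so that ligatures become plain letters."""
    return unicodedata.normalize("NFKC", text)


# The raw strings are different code point sequences.
print(plain == ligature)

# Canonical composition (NFC) keeps the ligature as it is.
print(unicodedata.normalize("NFC", ligature) == plain)

# Compatibility composition (NFKC) replaces it with "f" + "i".
print(normalise_for_tts(ligature) == plain)
```

An engine that fails the test could, in principle, be wrapped so that its input passes through a step like this first. Whether that is safe for every script is a separate question, since NFKC also folds other compatibility characters such as superscripts.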
API
- SAPI 5.4
- Speech Synthesis Manager
- MBROLA
We need a way to identify abbreviations and the context they appear in. For example, "Ala" can mean either Alabama or alanine, depending on what is being talked about.
See english for more information about abbreviations.
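One simple way to approach the context problem is keyword overlap: each expansion carries a set of words that tend to appear near it, and the expansion with the most matches in the sentence wins. The sketch below is only an illustration of that idea. The `EXPANSIONS` table, its keyword sets, and the function `expand_abbreviation` are all assumptions made up for this example, not part of any engine mentioned above.

```python
import re

# Hypothetical lookup table: abbreviation -> list of
# (expansion, context keywords that suggest this expansion).
EXPANSIONS = {
    "Ala": [
        ("Alabama", {"state", "county", "governor", "usa"}),
        ("alanine", {"amino", "acid", "protein", "residue", "peptide"}),
    ],
}


def expand_abbreviation(abbr: str, sentence: str) -> str:
    """Pick the expansion whose keywords overlap most with the sentence.

    Returns the abbreviation unchanged if it is unknown or if no
    keyword matches, so the caller can fall back to spelling it out.
    """
    candidates = EXPANSIONS.get(abbr)
    if not candidates:
        return abbr
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best, best_score = abbr, 0
    for expansion, keywords in candidates:
        score = len(words & keywords)
        if score > best_score:
            best, best_score = expansion, score
    return best


print(expand_abbreviation("Ala", "The protein has an Ala residue at position 12."))
print(expand_abbreviation("Ala", "She was born in Mobile, Ala., a state on the Gulf."))
```

A real system would need a much larger table and probably a statistical model, but even this shape makes the requirement concrete: the expansion step needs access to the surrounding text, not just the abbreviation itself.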