Phonetic minimization of the text corpus in Belarusian for the speech synthesis system training
Abstract
Most modern speech synthesis systems are based on the corpus-based method. Unlike the previously popular compilation method, the corpus-based method uses a natural speech database that does not consist of separate, specially selected compilation elements but is a corpus of natural speech recordings (phonograms). Achieving high-quality synthesized speech with this approach requires large amounts of text and corresponding audio, which is a significant challenge for so-called under-resourced languages, including Belarusian. A common solution is phonetic minimization: a special selection of texts in which the size of the text corpus is reduced as far as possible while its phonetic completeness is preserved. The article describes the nature and functioning of the corpus-based method of sound signal generation in speech synthesis systems and gives a detailed overview of approaches to building the text and speech corpora that this method requires. The second half of the work describes the developed algorithm for phonetic minimization of a Belarusian text corpus, together with the technical and linguistic resources used to implement it. The developed software prototype and a series of phonetic minimization experiments are described to demonstrate the efficiency of the algorithm.
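As an illustration of the idea of phonetic minimization only: the abstract does not state which algorithm the article uses, so the sketch below assumes one common framing, a greedy set-cover selection. The function name `greedy_minimize`, the sentence identifiers, and the unit labels in `toy_corpus` are hypothetical and chosen purely for the example.

```python
from typing import Dict, List, Set


def greedy_minimize(units_by_sentence: Dict[str, Set[str]]) -> List[str]:
    """Select sentences until every phonetic unit found in the corpus is covered.

    This is a generic greedy set-cover heuristic, used here as an assumption;
    it is not claimed to be the algorithm described in the article.
    """
    # Every unit that occurs anywhere in the corpus must remain covered.
    uncovered: Set[str] = set().union(*units_by_sentence.values())
    selected: List[str] = []
    while uncovered:
        # Take the sentence that contributes the most still-uncovered units.
        best = max(
            units_by_sentence,
            key=lambda s: len(units_by_sentence[s] & uncovered),
        )
        selected.append(best)
        uncovered -= units_by_sentence[best]
    return selected


# Hypothetical toy corpus: sentence id -> set of phonetic units it contains.
toy_corpus = {
    "s1": {"a-b", "b-a"},
    "s2": {"a-b", "b-a", "a-k"},
    "s3": {"a-k"},
    "s4": {"k-a", "a-b"},
}
selected = greedy_minimize(toy_corpus)
covered = set().union(*(toy_corpus[s] for s in selected))
```

In this toy case two of the four sentences are enough to keep all four units, which is the kind of reduction with preserved phonetic completeness that the abstract refers to. A greedy choice does not guarantee the smallest possible selection; it is shown here because it is short and easy to follow.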
About the Author
S. I. Lysy
Belarus
Junior Researcher
For citations:
Lysy S.I. Phonetic minimization of the text corpus in Belarusian for the speech synthesis system training. Informatics. 2019;16(1):75-85. (In Russ.)