<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">inform</journal-id><journal-title-group><journal-title xml:lang="ru">Информатика</journal-title><trans-title-group xml:lang="en"><trans-title>Informatics</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1816-0301</issn><issn pub-type="epub">2617-6963</issn><publisher><publisher-name>UIIP NASB</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">inform-748</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ОБРАБОТКА СИГНАЛОВ, ИЗОБРАЖЕНИЙ, РЕЧИ, ТЕКСТА И РАСПОЗНАВАНИЕ ОБРАЗОВ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION</subject></subj-group></article-categories><title-group><article-title>Фанетычная мінімізацыя корпуса тэкстаў на беларускай мове для навучання сістэмы сінтэзу маўлення</article-title><trans-title-group xml:lang="en"><trans-title>Phonetic minimization of the text corpus in Belarusian for the speech synthesis system training</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Лысы</surname><given-names>С. I.</given-names></name><name name-style="western" xml:lang="en"><surname>Lysy</surname><given-names>S. 
I.</given-names></name></name-alternatives><bio xml:lang="ru"><p>малодшы навуковы супрацоўнік</p></bio><bio xml:lang="en"><p>Junior Researcher</p></bio><email xlink:type="simple">stanislau.lysy@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Аб’яднаны інстытут праблем інфарматыкі Нацыянальнай акадэміі навук Беларусі, Мінск</institution></aff><aff xml:lang="en"><institution>The United Institute of Informatics Problems of the National Academy of Sciences of Belarus,
Minsk</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2019</year></pub-date><pub-date pub-type="epub"><day>21</day><month>01</month><year>2019</year></pub-date><volume>16</volume><issue>1</issue><fpage>75</fpage><lpage>85</lpage><permissions><copyright-statement>Copyright &#x00A9; Лысы С.I., 2019</copyright-statement><copyright-year>2019</copyright-year><copyright-holder xml:lang="ru">Лысы С.I.</copyright-holder><copyright-holder xml:lang="en">Lysy S.I.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://inf.grid.by/jour/article/view/748">https://inf.grid.by/jour/article/view/748</self-uri><abstract><p>Большасць сучасных сістэм сінтэзу маўлення базіруюць сваю працу на корпусным метадзе. Корпусны метад, у адрозненні ад папулярнага раней кампіляцыйнага, выкарыстоўвае базу дадзеных натуральнага маўлення, якая складаецца не з асобных спецыяльна выбраных элементаў кампіляцыі, а ўяўляе сабой корпус фанаграм натуральнага маўлення. Для дасягнення высокай якасці сінтэзаванага маўлення пры такім падыходзе патрабуюцца вялікія аб’ёмы тэкставай і адпаведнай гукавой інфармацыі, што з’яўляецца істотнай праблемай для так званых нерэсурсных моў, да якіх адносіцца і беларуская. У такім выпадку, як правіла, прымяняецца фанетычная мінімізацыя – адмысловы адбор тэкстаў, у выніку якога аб’ём тэкставага корпуса максімальна змяншаецца, але пры гэтым захоўваецца фанетычная паўната. 
У артыкуле разглядаюцца звесткі пра сутнасць і спосаб працы корпуснага метаду генерацыі гукавога сігналу ў сістэмах сінтэзу маўлення, прыводзіцца падрабязны агляд падыходаў да фарміравання тэкставых і маўленчых карпусоў, неабходных для генерацыі маўлення корпусным метадам. Другая палова працы прысвечана апісанню распрацаванага алгарытму фанетычнай мінімізацыі корпуса тэкстаў на беларускай мове, а таксама тэхнічных і лінгвістычных рэсурсаў, выкарыстаных для яго рэалізацыі. Прыводзяцца апісанні распрацаванага праграмнага прататыпа і шэрагу праведзеных аўтарам эксперыментаў па фанетычнай мінімізацыі.</p></abstract><trans-abstract xml:lang="en"><p>Most modern speech synthesis systems are based on the corpus-based method. Unlike the previously popular compilation method, the corpus-based method uses a natural speech database that consists not of separate, specially selected compilation elements but of a corpus of natural speech recordings. Achieving high-quality synthesized speech with this approach requires large amounts of text and corresponding audio data, which is a significant challenge for so-called under-resourced languages, including Belarusian. In such cases, phonetic minimization is commonly applied: a special selection of texts that reduces the size of the text corpus as much as possible while preserving its phonetic completeness. The article describes the nature and operation of the corpus-based method of sound signal generation in speech synthesis systems and gives a detailed overview of approaches to building the text and speech corpora required for speech generation by this method. The second half of the work describes the developed algorithm for phonetic minimization of a Belarusian text corpus, as well as the technical and linguistic resources used to implement it. 
The developed software prototype and a series of phonetic minimization experiments are described to demonstrate the efficiency of the algorithm.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>фанетычная мінімізацыя</kwd><kwd>беларуская мова</kwd><kwd>сінтэз маўлення</kwd><kwd>корпусны метад</kwd><kwd>корпус тэкстаў</kwd></kwd-group><kwd-group xml:lang="en"><kwd>phonetic minimization</kwd><kwd>the Belarusian language</kwd><kwd>speech synthesis</kwd><kwd>corpus-based method</kwd><kwd>text corpus</kwd></kwd-group><funding-group><funding-statement xml:lang="ru">Даследаванне выканана пры падтрымцы гранта НАН Беларусі № 2018-25-032 «Алгарытмы інтэрнэт-сінтэзатара беларускага маўлення і аўтаматызаванага стварэння лінгвістычных рэсурсаў», а таксама ў межах праекта БРФФД № Ф17МС-039 «Высакаякасны інтэрнэт-увод і інтэрнэт-вывад маўлення, захаванне і сістэматызацыя вялікіх аб’ёмаў (Big Data) маўлення».</funding-statement></funding-group></article-meta></front><back><ref-list><title>References</title>
<ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Safarik, R. Unified approach to development of ASR systems for East Slavic languages / R. Safarik, J. Nouza; ed. N. Camelin, Y. Estève, C. Martín-Vide // Proc. of 5th Intern. Conf. «Statistical Language and Speech Processing» (SLSP’2017), Le Mans, France, 23–25 Oct. 2017. – Springer, 2017. – P. 193–203.</mixed-citation><mixed-citation xml:lang="en">Safarik R., Nouza J., ed. Camelin N., Estève Y., Martín-Vide C. Unified approach to development of ASR systems for East Slavic languages. Proceedings of 5th International Conference "Statistical Language and Speech Processing" (SLSP’2017), Le Mans, France, 23–25 October 2017. Springer, 2017, pp. 193–203.</mixed-citation></citation-alternatives></ref>
<ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Гецэвіч, Ю. С. Аўтаматызаваная апрацоўка сімвальных выразаў у тэкстах для сістэмы сінтэзу беларускага маўлення / Ю. С. Гецэвіч // Информатика. – 2011. – № 4(32). – С. 82–93.</mixed-citation><mixed-citation xml:lang="en">Hetsevich Yu. S. Aŭtamatyzavanaja apracoŭka simvaĺnych vyrazaŭ u tekstach dlia sistemy sintezu bielaruskaha maŭliennia [Automated processing of symbolic expressions in texts for the Belarusian speech synthesis system]. Informatika [Informatics], 2011, no. 4(32), pp. 82–93 (in Belarusian).</mixed-citation></citation-alternatives></ref>
<ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Лысы, С. І. Генерацыя нацыянальнай транскрыпцыі тэкстаў на беларускай мове / С. І. Лысы, Ю. С. Гецэвіч // Информатика. – 2017. – № 2(54). – С. 84–92.</mixed-citation><mixed-citation xml:lang="en">Lysy S. I., Hetsevich Yu. S. Hienieracyja nacyjanaĺnaj transkrypcyi tekstaŭ na bielaruskaj movie [Generating the national transcription of texts in Belarusian]. Informatika [Informatics], 2017, no. 2(54), pp. 84–92 (in Belarusian).</mixed-citation></citation-alternatives></ref>
<ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Hunt, A. Unit selection in a concatenative speech synthesis system using a large speech database / A. Hunt, A. Black // Proc. of IEEE Intern. Conf. «Acoustic, Speech and Signal Processing» (ICASSP’96), Atlanta, USA, 7–10 May 1996. – Atlanta, 1996. – Vol. 1. – P. 373–376.</mixed-citation><mixed-citation xml:lang="en">Hunt A., Black A. Unit selection in a concatenative speech synthesis system using a large speech database. Proceedings of IEEE International Conference "Acoustic, Speech and Signal Processing" (ICASSP’96), Atlanta, USA, 7–10 May 1996. Atlanta, 1996, vol. 1, pp. 373–376.</mixed-citation></citation-alternatives></ref>
<ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Лобанов, Б. М. Компьютерный синтез и клонирование речи / Б. М. Лобанов, Л. И. Цирульник. – Минск : Беларус. навука, 2008. – 344 с.</mixed-citation><mixed-citation xml:lang="en">Lobanov B. M., Cirul'nik L. I. Komp'yuternyj sintez i klonirovanie rechi [Computer Synthesis and Speech Cloning]. Minsk, Belaruskaya navuka, 2008, 344 p. (in Russian).</mixed-citation></citation-alternatives></ref>
<ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Segment selection in the L&amp;H Realspeak laboratory TTS system / G. Coorman [et al.] // Proc. of 6th Intern. Conf. «Spoken Language Processing» (ICSLP’2000), Beijing, China, 16–20 Oct. 2000. – Beijing, 2000. – Vol. 2. – P. 395–398.</mixed-citation><mixed-citation xml:lang="en">Coorman G., Fackrell J., Rutten P., Van Coile B. Segment selection in the L&amp;H Realspeak laboratory TTS system. Proceedings of 6th International Conference "Spoken Language Processing" (ICSLP’2000), Beijing, China, 16–20 October 2000. Beijing, 2000, vol. 2, pp. 395–398.</mixed-citation></citation-alternatives></ref>
<ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Godfrey, J. Language Resources / J. Godfrey, A. Zampolli // Survey of the State of the Art in Human Language Technology. – Cambridge University Press, 1996. – Ch. 12. – P. 357–384.</mixed-citation><mixed-citation xml:lang="en">Godfrey J., Zampolli A. Language Resources. Survey of the State of the Art in Human Language Technology. Cambridge University Press, 1996, ch. 12, pp. 357–384.</mixed-citation></citation-alternatives></ref>
<ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Zinovieva, N. Phonetically sufficient allophonic database for concatenation synthesis of Russian speech / N. Zinovieva // Proc. of the 13th Section «Intern. Congress of Phonetic Sciences» (ICPhS’95), Stockholm, Sweden, 13–19 Aug. 1995. – Stockholm, 1995. – Vol. 2. – P. 358–361.</mixed-citation><mixed-citation xml:lang="en">Zinovieva N. Phonetically sufficient allophonic database for concatenation synthesis of Russian speech. Proceedings of the 13th Section "International Congress of Phonetic Sciences" (ICPhS’95), Stockholm, Sweden, 13–19 August 1995. Stockholm, 1995, vol. 2, pp. 358–361.</mixed-citation></citation-alternatives></ref>
<ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Fotinea, S.-E. Constructing a segment database for Greek time domain speech synthesis / S.-E. Fotinea, G. Tambouratzis, G. Carayannis // Proc. of 7th European Conf. «Speech Communication and Technology» (EUROSPEECH’2001), Aalborg, Denmark, 3–7 Sept. 2001. – Aalborg, 2001. – Vol. 3. – P. 2075–2078.</mixed-citation><mixed-citation xml:lang="en">Fotinea S.-E., Tambouratzis G., Carayannis G. Constructing a segment database for Greek time domain speech synthesis. Proceedings of 7th European Conference "Speech Communication and Technology" (EUROSPEECH’2001), Aalborg, Denmark, 3–7 September 2001. Aalborg, 2001, vol. 3, pp. 2075–2078.</mixed-citation></citation-alternatives></ref>
<ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Lambert, T. A database design for a TTS synthesis system using lexical diphones / T. Lambert, A. Breen // Proc. of 9th European Conf. «Speech Communication and Technology» (InterSpeech’2004), Jeju Island, Korea, 4–8 Oct. 2004. – Jeju Island, 2004. – P. 1381–1384.</mixed-citation><mixed-citation xml:lang="en">Lambert T., Breen A. A database design for a TTS synthesis system using lexical diphones. Proceedings of 9th European Conference "Speech Communication and Technology" (InterSpeech’2004), Jeju Island, Korea, 4–8 October 2004. Jeju Island, 2004, pp. 1381–1384.</mixed-citation></citation-alternatives></ref>
<ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Lyudovyk, T. Speech databases used for concatenative speech synthesis / T. Lyudovyk, M. Sazhok // Proc. of 7th All-Ukrainian Intern. Conf. on Signal/Image Processing and Pattern Recognition (UkrObraz’2004). – Kyjiv, 2004. – P. 111–114.</mixed-citation><mixed-citation xml:lang="en">Lyudovyk T., Sazhok M. Speech databases used for concatenative speech synthesis. Proceedings of 7th All-Ukrainian International Conference on Signal/Image Processing and Pattern Recognition (UkrObraz’2004). Kyjiv, 2004, pp. 111–114.</mixed-citation></citation-alternatives></ref>
<ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Закревский, А. Д. Основы логического проектирования / А. Д. Закревский, Ю. В. Поттосин, Л. Д. Черемисинова. – Минск : ОИПИ НАН Беларуси, 2004. – Кн. 1. Комбинаторные алгоритмы дискретной математики. – 226 с.</mixed-citation><mixed-citation xml:lang="en">Zakrevskij A. D., Pottosin Yu. V., Cheremisinova L. D. Osnovy logicheskogo proektirovaniya [Basics of logical design]. Kniga 1. Kombinatornye algoritmy diskretnoj matematiki [Book 1. Combinatorial algorithms of discrete mathematics]. Minsk, the United Institute of Informatics Problems of the National Academy of Sciences of Belarus, 2004, 226 p. (in Russian).</mixed-citation></citation-alternatives></ref>
<ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Introduction to Algorithms / T. H. Cormen [et al.]. – 3rd ed. – Cambridge : The MIT Press, 2009. – 1292 p.</mixed-citation><mixed-citation xml:lang="en">Cormen T. H., Leiserson Ch. E., Rivest R. L., Stein C. Introduction to Algorithms. 3rd ed. Cambridge, The MIT Press, 2009, 1292 p.</mixed-citation></citation-alternatives></ref>
<ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Hue, X. Genetic algorithms for optimization / X. Hue. – Edinburgh : Edinburgh Parallel Computing Centre Press, 1997. – 276 p.</mixed-citation><mixed-citation xml:lang="en">Hue X. Genetic Algorithms for Optimization. Edinburgh, Edinburgh Parallel Computing Centre Press, 1997, 276 p.</mixed-citation></citation-alternatives></ref>
<ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Matoušek, J. ARTIC: A new Czech text-to-speech system using statistical approach to speech segment database construction / J. Matoušek, J. Psutka // Proc. of the 6th Intern. Conf. on Spoken Language Processing (ICSLP’2000), Beijing, China, 16–20 Oct. 2000. – Beijing, 2000. – Vol. 4. – P. 612–615.</mixed-citation><mixed-citation xml:lang="en">Matoušek J., Psutka J. ARTIC: A new Czech text-to-speech system using statistical approach to speech segment database construction. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP’2000), Beijing, China, 16–20 October 2000. Beijing, 2000, vol. 4, pp. 612–615.</mixed-citation></citation-alternatives></ref>
<ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Barbot, N. Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora / N. Barbot, O. Boëffard, A. Delhay // Proc. of the Intern. Conf. on Language Resources and Evaluation (LREC’12). – Istanbul, 2012. – P. 969–974.</mixed-citation><mixed-citation xml:lang="en">Barbot N., Boëffard O., Delhay A. Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora. Proceedings of the International Conference on Language Resources and Evaluation (LREC’12). Istanbul, 2012, pp. 969–974.</mixed-citation></citation-alternatives></ref>
<ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Development of syllable-based text to speech synthesis system in Bengali / N. P. Narendra [et al.] // Intern. J. of Speech Technology. – 2011. – No. 14(3). – P. 167–181.</mixed-citation><mixed-citation xml:lang="en">Narendra N. P., Rao K. S., Ghosh K., Vempada R. R., Maity S. Development of syllable-based text to speech synthesis system in Bengali. International Journal of Speech Technology, 2011, no. 14(3), pp. 167–181.</mixed-citation></citation-alternatives></ref>
<ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Kayte, S. A review of unit selection speech synthesis / S. Kayte, M. Mundada, C. Kayte // Intern. J. of Advanced Research in Computer Science and Software Engineering. – 2015. – No. 5(10). – P. 475–479.</mixed-citation><mixed-citation xml:lang="en">Kayte S., Mundada M., Kayte C. A review of unit selection speech synthesis. International Journal of Advanced Research in Computer Science and Software Engineering, 2015, no. 5(10), pp. 475–479.</mixed-citation></citation-alternatives></ref>
<ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Corpus and voices for Catalan speech synthesis / A. Bonafonte [et al.] // Proc. of the Intern. Conf. on Language Resources and Evaluation (LREC’2008), Marrakech, Morocco, 26 May–1 June 2008. – Marrakech, 2008. – P. 3325–3329.</mixed-citation><mixed-citation xml:lang="en">Bonafonte A., Adell J., Esquerra I., Gallego S., Moreno A., Pérez J. Corpus and voices for Catalan speech synthesis. Proceedings of the International Conference on Language Resources and Evaluation (LREC’2008), Marrakech, Morocco, 26 May–1 June 2008. Marrakech, 2008, pp. 3325–3329.</mixed-citation></citation-alternatives></ref>
<ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Casademont, E. G. Building synthetic voices in the META-NET framework / E. G. Casademont, A. Bonafonte, M. Moreno // Proc. of the 8th Intern. Conf. on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 21–27 May 2012. – Istanbul, 2012. – P. 3322–3326.</mixed-citation><mixed-citation xml:lang="en">Casademont E. G., Bonafonte A., Moreno M. Building synthetic voices in the META-NET framework. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 21–27 May 2012. Istanbul, 2012, pp. 3322–3326.</mixed-citation></citation-alternatives></ref>
</ref-list><fn-group><fn fn-type="conflict"><p>The author declares no conflict of interest.</p></fn></fn-group></back></article>
