<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">inform</journal-id><journal-title-group><journal-title xml:lang="ru">Информатика</journal-title><trans-title-group xml:lang="en"><trans-title>Informatics</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1816-0301</issn><issn pub-type="epub">2617-6963</issn><publisher><publisher-name>UIIP NASB</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">inform-241</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ОБРАБОТКА СИГНАЛОВ, ИЗОБРАЖЕНИЙ, РЕЧИ, ТЕКСТА И РАСПОЗНАВАНИЕ ОБРАЗОВ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION</subject></subj-group></article-categories><title-group><article-title>ЛІНГВІСТЫЧНЫ АНАЛІЗ ДЛЯ БЕЛАРУСКАГА КОРПУСА ТЭКСТАЎ З ПРЫМЯНЕННЕМ МЕТАДАЎ АПРАЦОЎКІ НАТУРАЛЬНАЙ МОВЫ І МАШЫННАГА НАВУЧАННЯ</article-title><trans-title-group xml:lang="en"><trans-title>LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Гецэвіч</surname><given-names>Ю. С.</given-names></name><name name-style="western" xml:lang="en"><surname>Hetsevich</surname><given-names>Yu. 
S.</given-names></name></name-alternatives><bio xml:lang="ru"/><email xlink:type="simple">Yury.Hetsevich@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Рэентовіч</surname><given-names>I. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Reentovich</surname><given-names>I. V.</given-names></name></name-alternatives><bio xml:lang="ru"/><email xlink:type="simple">ivan.reentovich@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Аб'яднаны інстытут праблем інфарматыкі НАН Беларусі</institution></aff><aff xml:lang="en"><institution>United Institute of Informatics Problems, National Academy of Sciences of Belarus</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2017</year></pub-date><pub-date pub-type="epub"><day>12</day><month>12</month><year>2017</year></pub-date><volume>0</volume><issue>4(56)</issue><fpage>70</fpage><lpage>77</lpage><permissions><copyright-statement>Copyright &#x00A9; Гецэвіч Ю.С., Рэентовіч I.В., 2017</copyright-statement><copyright-year>2017</copyright-year><copyright-holder xml:lang="ru">Гецэвіч Ю.С., Рэентовіч I.В.</copyright-holder><copyright-holder xml:lang="en">Hetsevich Y.S., Reentovich I.V.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri 
xlink:href="https://inf.grid.by/jour/article/view/241">https://inf.grid.by/jour/article/view/241</self-uri><abstract><p> Аналізуюцца праблемы лакалізацыі розных марфалагічных, лексічных і сінтаксічных элементаў з дапамогай беларускага модуля праграмы NooJ. У тым ліку выпраўляюцца памылкі, якія сустракаюцца ў беларускіх тэкстах, будуюцца мадэлі мовы і тэгіравання часцін мовы. Праводзіцца апрацоўка беларускага корпуса тэкстаў на натуральнай мове з дапамогай распрацаванага алгарытму з выкарыстаннем машыннага навучання.</p></abstract><trans-abstract xml:lang="en"><p>The article addresses problems that arise in text-to-speech synthesis. Morphological, lexical and syntactic elements were located using the Belarusian module of the NooJ program. The types of errors that occur in Belarusian texts were analyzed and corrected. A language model and a part-of-speech tagging model were built. The Belarusian corpus was processed with a purpose-built algorithm based on machine learning. The developed machine learning models achieved a precision of 80–90%. The dictionary was enriched with new words for further use in Belarusian speech synthesis systems.</p></trans-abstract></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Kennedy, G. An Introduction to Corpus Linguistics / G. Kennedy. – London : Longman, 1998. – 315 p.</mixed-citation><mixed-citation xml:lang="en">Kennedy, G. An Introduction to Corpus Linguistics / G. Kennedy. – London : Longman, 1998. – 315 p.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Belarusian N-corpus [Electronic resource]. – 2015. – Mode of access : http://bnkorpus.info/. – Date of access : 22.06.2017.</mixed-citation><mixed-citation xml:lang="en">Belarusian N-corpus [Electronic resource]. – 2015. 
– Mode of access : http://bnkorpus.info/. – Date of access : 22.06.2017.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Барковіч, А.А. Беларускі корпус тэкстаў : інтэрнэт-дыскурс / А.А. Барковіч // Веснік Беларус. дзярж. ун-та. Сер. 4. Філалогія. Журналістыка. Педагогіка. – 2013. – № 2. – С. 26–29.</mixed-citation><mixed-citation xml:lang="en">Барковіч, А.А. Беларускі корпус тэкстаў : інтэрнэт-дыскурс / А.А. Барковіч // Веснік Беларус. дзярж. ун-та. Сер. 4. Філалогія. Журналістыка. Педагогіка. – 2013. – № 2. – С. 26–29.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">The First One-Million Corpus for the Belarusian NooJ Module / I. Reentovich [et al.] // Automatic Processing of Natural-Language Electronic Texts with NooJ : 9th Intern. Conf. «NooJ 2015». – Springer International Publishing, 2016. – P. 3–15.</mixed-citation><mixed-citation xml:lang="en">The First One-Million Corpus for the Belarusian NooJ Module / I. Reentovich [et al.] // Automatic Processing of Natural-Language Electronic Texts with NooJ : 9th Intern. Conf. «NooJ 2015». – Springer International Publishing, 2016. – P. 3–15.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Холоденко, А.Б. Использование лексических и синтаксических анализаторов в задачах распознавания для естественных языков / А.Б. Холоденко // Интеллектуальные системы. – 1999. – № 1–2. – С. 185–193.</mixed-citation><mixed-citation xml:lang="en">Холоденко, А.Б. Использование лексических и синтаксических анализаторов в задачах распознавания для естественных языков / А.Б. Холоденко // Интеллектуальные системы. – 1999. – № 1–2. – С. 
185–193.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Автоматическая обработка текстов на естественном языке и анализ данных / Е.И. Большакова [и др.]. – М. : Изд-во НИУ ВШЭ, 2017. – 269 с.</mixed-citation><mixed-citation xml:lang="en">Автоматическая обработка текстов на естественном языке и анализ данных / Е.И. Большакова [и др.]. – М. : Изд-во НИУ ВШЭ, 2017. – 269 с.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Silberztein, M. NooJ Manual / M. Silberztein [Electronic resource]. – 2003. – Mode of access : www.nooj4nlp.net. – Date of access : 22.06.2017.</mixed-citation><mixed-citation xml:lang="en">Silberztein, M. NooJ Manual / M. Silberztein [Electronic resource]. – 2003. – Mode of access : www.nooj4nlp.net. – Date of access : 22.06.2017.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Hetsevich, Yu. Overview of Belarusian and Russian Dictionaries and Their Adaptation for NooJ / Yu. Hetsevich, S. Hetsevich // Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the NooJ 2011 Intern. Conf. – Newcastle : Cambridge Scholars Publishing, 2012. – P. 29–40.</mixed-citation><mixed-citation xml:lang="en">Hetsevich, Yu. Overview of Belarusian and Russian Dictionaries and Their Adaptation for NooJ / Yu. Hetsevich, S. Hetsevich // Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the NooJ 2011 Intern. Conf. – Newcastle : Cambridge Scholars Publishing, 2012. – P. 29–40.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Kriesel, D. A Brief Introduction to Neural Networks / D. Kriesel [Electronic resource]. – 2005. – Mode of access : http://www.dkriesel.com. 
– Date of access : 22.06.2017.</mixed-citation><mixed-citation xml:lang="en">Kriesel, D. A Brief Introduction to Neural Networks / D. Kriesel [Electronic resource]. – 2005. – Mode of access : http://www.dkriesel.com. – Date of access : 22.06.2017.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Quinlan, J.R. Simplifying Decision Trees / J.R. Quinlan // Intern. J. of Man-Machine Studies. – 1987. – Vol. 27, no. 3. – P. 221–234.</mixed-citation><mixed-citation xml:lang="en">Quinlan, J.R. Simplifying Decision Trees / J.R. Quinlan // Intern. J. of Man-Machine Studies. – 1987. – Vol. 27, no. 3. – P. 221–234.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Cha, S.-H. A Genetic Algorithm for Constructing Compact Binary Decision Trees / S.-H. Cha, C.C. Tappert // J. of Pattern Recognition Research. – 2009. – Vol. 4, no. 1. – P. 1–13.</mixed-citation><mixed-citation xml:lang="en">Cha, S.-H. A Genetic Algorithm for Constructing Compact Binary Decision Trees / S.-H. Cha, C.C. Tappert // J. of Pattern Recognition Research. – 2009. – Vol. 4, no. 1. – P. 1–13.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Генератар парадыгмы слова // Лабараторыя распазнавання і сінтэзу маўлення [Электронны рэсурс]. – 2017. – Рэжым доступу : http://ssrlab.by/5047. – Дата доступу : 13.05.2017.</mixed-citation><mixed-citation xml:lang="en">Генератар парадыгмы слова // Лабараторыя распазнавання і сінтэзу маўлення [Электронны рэсурс]. – 2017. – Рэжым доступу : http://ssrlab.by/5047. – Дата доступу : 13.05.2017.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Oliveira, H.G. Towards the Automatic Enrichment of a Thesaurus with Information in Dictionaries / H.G. 
Oliveira, P. Gomes // Expert Systems. – 2013. – Vol. 30, no. 4. – P. 320–332.</mixed-citation><mixed-citation xml:lang="en">Oliveira, H.G. Towards the Automatic Enrichment of a Thesaurus with Information in Dictionaries / H.G. Oliveira, P. Gomes // Expert Systems. – 2013. – Vol. 30, no. 4. – P. 320–332.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">The Enrichment of Lexical Resources Through Incremental Parsebanking / V. Rosén [et al.] // Language Resources and Evaluation. – 2016. – Vol. 50, no. 2. – P. 291–319.</mixed-citation><mixed-citation xml:lang="en">The Enrichment of Lexical Resources Through Incremental Parsebanking / V. Rosén [et al.] // Language Resources and Evaluation. – 2016. – Vol. 50, no. 2. – P. 291–319.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Computer Treatment of Slavic and East European Languages / ed. R. Garabik // Third Intern. Seminar, Bratislava, Slovakia, 10–12 Nov. 2005. – Bratislava : VEDA, 2005. – 246 p.</mixed-citation><mixed-citation xml:lang="en">Computer Treatment of Slavic and East European Languages / ed. R. Garabik // Third Intern. Seminar, Bratislava, Slovakia, 10–12 Nov. 2005. – Bratislava : VEDA, 2005. – 246 p.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
