LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES
Abstract
The article addresses problems that arise in text-to-speech synthesis. Various morphological, lexical, and syntactic elements were located using the Belarusian module of the NooJ program. The types of errors that occur in Belarusian texts were analyzed and corrected. A language model and a part-of-speech tagging model were built. The Belarusian corpus was then processed with the developed algorithm, which relies on machine learning. The resulting machine learning models achieved a precision of 80–90%. The dictionary was enriched with new words for further use in Belarusian speech synthesis systems.
About the Authors
Yu. S. Hetsevich
Belarus
I. V. Reentovich
Belarus
For citations:
Hetsevich Yu.S., Reentovich I.V. LINGUISTIC ANALYSIS FOR THE BELARUSIAN CORPUS WITH THE APPLICATION OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING TECHNIQUES. Informatics. 2017;(4(56)):70-77. (In Russ.)