A model of homographs automatic identification for the Belarusian language
https://doi.org/10.37661/1816-0301-2023-20-4-87-100
Abstract
Objectives. A prototype system for the automated resolution of homonymy in Belarusian and Russian electronic texts is described. The work addresses the pressing problem of automatic text processing at the morphological level. This processing is complicated by the inflectional nature of Belarusian and its rich and diverse system of morphological categories for the parts of speech.
Methods. The work combines regular methods of homograph identification with knowledge-based methods.
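The abstract does not describe the identification step in detail. The sketch below shows one simple way a knowledge-based homograph finder can work, assuming a dictionary that maps each ambiguous written form to its stressed readings and a regular-expression tokenizer for Cyrillic text. The `HOMOGRAPHS` table, the `WORD_RE` pattern, and the `find_homographs` function are hypothetical illustrations, not the authors' implementation, and the well-known Russian homographs "замок" and "мука" serve only as toy entries.

```python
import re

# Hypothetical toy knowledge base (an assumption, not the authors' resource):
# each written form maps to its stressed readings; U+0301 marks the stressed vowel.
HOMOGRAPHS: dict[str, list[str]] = {
    "замок": ["за\u0301мок (castle)", "замо\u0301к (lock)"],
    "мука": ["му\u0301ка (torment)", "мука\u0301 (flour)"],
}

# Cyrillic words, including Belarusian і and ў, with inner hyphens or apostrophes.
WORD_RE = re.compile(r"[а-яёіў]+(?:['’-][а-яёіў]+)*", re.IGNORECASE)


def find_homographs(text: str) -> list[tuple[int, str, list[str]]]:
    """Return (character offset, token, candidate readings) for every
    token whose lowercased form is listed in HOMOGRAPHS."""
    hits = []
    for match in WORD_RE.finditer(text):
        token = match.group(0)
        readings = HOMOGRAPHS.get(token.lower())
        if readings:
            hits.append((match.start(), token, readings))
    return hits
```

For example, `find_homographs("Старый замок стоит на холме.")` returns one hit at offset 7 for "замок" with both candidate readings. Detection only flags the ambiguity; choosing a reading is a separate step.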
Results. Methods and approaches to designing systems for automatic homograph detection are proposed. An algorithm for identifying homographs based on a knowledge-based method has been developed. An efficient, fast prototype for resolving homographs in Russian and Belarusian texts has been implemented.
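How the prototype resolves a detected homograph is not specified here. A common knowledge-based approach uses context rules: if certain cue words appear near the ambiguous token, a particular reading is chosen. The following is a minimal sketch of that idea under stated assumptions. The `ContextRule` class, the cue lists, the default window of three tokens, and the `disambiguate` function are all hypothetical, and tokens with no matching cue are deliberately left unresolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical rule: if any cue word occurs within `window` tokens
    on either side of `homograph`, choose `reading`."""
    homograph: str
    cues: frozenset[str]
    reading: str
    window: int = 3


# Toy rules for illustration only; U+0301 marks the stressed vowel.
RULES = [
    ContextRule("замок", frozenset({"дверь", "двери", "ключ", "ключом"}), "замо\u0301к"),
    ContextRule("замок", frozenset({"король", "короля", "крепость", "башня"}), "за\u0301мок"),
    ContextRule("мука", frozenset({"хлеб", "тесто", "пшеничная"}), "мука\u0301"),
]


def disambiguate(tokens: list[str], rules: list[ContextRule]) -> list[str]:
    """Replace each homograph with the reading of the first rule whose cues
    occur in its context window; tokens without a match are left unchanged."""
    lowered = [t.lower() for t in tokens]
    result = list(tokens)
    for i, token in enumerate(lowered):
        for rule in rules:
            if rule.homograph != token:
                continue
            start = max(0, i - rule.window)
            context = lowered[start:i] + lowered[i + 1:i + 1 + rule.window]
            if rule.cues.intersection(context):
                result[i] = rule.reading
                break
    return result
```

With these toy rules, `disambiguate("открыл замок ключом".split(), RULES)` marks "замок" as "замо́к" (lock), while a sentence with no cue words keeps the token unchanged, so the output shows which cases still need a richer knowledge source.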
Conclusion. A working prototype for homograph search is presented. It is the first openly accessible resource for resolving ambiguity in the Belarusian language.
About the Authors
Yu. S. Hetsevich
Belarus
Yuras S. Hetsevich, Ph. D. (Eng.), Assoc. Prof., Head of the Speech Synthesis and Recognition Laboratory
st. Surganova, 6, Minsk, 220012
Ya. S. Zianouka
Belarus
Yauheniya S. Zianouka, Junior Researcher
st. Surganova, 6, Minsk, 220012
D. I. Latyshevich
Belarus
David I. Latyshevich, Junior Researcher Trainee
st. Surganova, 6, Minsk, 220012
A. A. Bakunovich
Belarus
Andrey A. Bakunovich, Junior Researcher
st. Surganova, 6, Minsk, 220012
A. Ya. Drahun
Belarus
Anastasia Ya. Drahun, Junior Researcher
st. Surganova, 6, Minsk, 220012
M. A. Kazlova
Belarus
Margarita A. Kazlova, Junior Researcher Trainee
st. Surganova, 6, Minsk, 220012
For citations:
Hetsevich Yu.S., Zianouka Ya.S., Latyshevich D.I., Bakunovich A.A., Drahun A.Ya., Kazlova M.A. A model of homographs automatic identification for the Belarusian language. Informatics. 2023;20(4):87-100. (In Bel.) https://doi.org/10.37661/1816-0301-2023-20-4-87-100