Informatics
Informatika

ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)

A model of homographs automatic identification for the Belarusian language

Yu. S. Hetsevich, Ya. S. Zianouka, D. I. Latyshevich, A. A. Bakunovich, A. Ya. Drahun, M. A. Kazlova

https://doi.org/10.37661/1816-0301-2023-20-4-87-100

Abstract

Objectives. A prototype system for the automated removal of homonyms in Belarusian and Russian electronic texts is described. The work is motivated by the pressing problem of automatic text processing at the morphological level, which is complicated by the inflectional nature of Belarusian and its rich, diverse system of morphological features across parts of speech.

Methods. The work uses regular homograph identification methods and knowledge-based methods.

Results. Methods and approaches for designing systems that automatically detect homographs are proposed. An algorithm for identifying homographs using a knowledge-based method has been developed. An efficient, fast prototype for removing them in Russian and Belarusian has been implemented.

Conclusion. A working prototype for homograph search is presented; it is the first open-access resource for removing ambiguity in the Belarusian language.

Keywords

homonymy, removal of homonyms, ambiguity, automatic processing of electronic texts, the Belarusian language, dictionary

About the Authors

Yu. S. Hetsevich

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

Yuras S. Hetsevich, Ph. D. (Eng.), Assoc. Prof., Head of the Speech Synthesis and Recognition Laboratory

st. Surganova, 6, Minsk, 220012

Ya. S. Zianouka

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

Yauheniya S. Zianouka, Junior Researcher

st. Surganova, 6, Minsk, 220012

D. I. Latyshevich

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

David I. Latyshevich, Trainee Junior Researcher

st. Surganova, 6, Minsk, 220012

A. A. Bakunovich

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

Andrey A. Bakunovich, Junior Researcher

st. Surganova, 6, Minsk, 220012

A. Ya. Drahun

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

Anastasia Ya. Drahun, Junior Researcher

st. Surganova, 6, Minsk, 220012

M. A. Kazlova

The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus

Margarita A. Kazlova, Trainee Junior Researcher

st. Surganova, 6, Minsk, 220012

For citations:

Hetsevich Yu.S., Zianouka Ya.S., Latyshevich D.I., Bakunovich A.A., Drahun A.Ya., Kazlova M.A. A model of homographs automatic identification for the Belarusian language. Informatics. 2023;20(4):87-100. (In Bel.) https://doi.org/10.37661/1816-0301-2023-20-4-87-100

This work is licensed under a Creative Commons Attribution 4.0 License.

Editor-in-Chief: S. V. Kruglikov
