Tonal spaces of vector language models
https://doi.org/10.37661/1816-0301-2026-23-1-88-104
Abstract
Objectives. Tonality, the positive or negative mood of a text, is a key parameter of text analysis and generation algorithms. Existing machine learning methods encode tonality in computationally expensive and uninterpretable ways, which hampers the development of the corresponding applications. This work aims to solve this problem for the Russian language.
Methods. Pre-trained vector language models are used to encode words as vectors in multidimensional spaces. In these spaces, tonality corresponds to a specific direction that optimally discriminates between positive and negative prototypes. The tonality of a word is then determined by its projection onto this direction. Adding multiples of the tonal vector to the vector of a key word defines a one-dimensional subspace that contains the word's positive and negative associations.
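The procedure described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-dimensional vectors, the word list, and all function names are hypothetical, and the tonal direction is taken as the normalized difference between the centroids of the positive and negative prototypes, which is a simplifying assumption standing in for whatever discriminant the paper actually uses. In practice the vectors would come from a pre-trained 300-dimensional model.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
vocab = {
    "good":  np.array([1.0, 0.2, 0.0]),
    "joy":   np.array([0.9, 0.0, 0.3]),
    "bad":   np.array([-1.0, 0.1, 0.0]),
    "grief": np.array([-0.9, 0.0, 0.2]),
    "gift":  np.array([0.6, 1.0, 0.0]),
    "debt":  np.array([-0.6, 1.0, 0.0]),
    "money": np.array([0.0, 1.0, 0.1]),
    "table": np.array([0.0, 0.1, 1.0]),
}


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def tonal_axis(pos_words, neg_words):
    """Return (axis, origin): the unit vector pointing from the negative
    to the positive prototype centroid, and the midpoint between them.
    Assumption: difference of centroids instead of a full discriminant."""
    pos_c = np.mean([vocab[w] for w in pos_words], axis=0)
    neg_c = np.mean([vocab[w] for w in neg_words], axis=0)
    return unit(pos_c - neg_c), (pos_c + neg_c) / 2


def tonality(word, axis, origin):
    """Signed projection of a word vector onto the tonal axis."""
    return float(np.dot(vocab[word] - origin, axis))


def nearest(target, exclude, k=1):
    """k vocabulary words with the highest cosine similarity to target."""
    scores = {
        w: float(np.dot(unit(v), unit(target)))
        for w, v in vocab.items()
        if w not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def tonal_associations(key, axis, step=1.0, k=1):
    """Shift the key word along the tonal axis in both directions and
    return the nearest words at the positive and negative ends."""
    v = vocab[key]
    return nearest(v + step * axis, {key}, k), nearest(v - step * axis, {key}, k)


axis, origin = tonal_axis(["good", "joy"], ["bad", "grief"])
pos_assoc, neg_assoc = tonal_associations("money", axis)

print(tonality("gift", axis, origin), tonality("debt", axis, origin))
print(pos_assoc, neg_assoc)
```

In this toy setup the key word "money" is shifted along the tonal axis in both directions, and the nearest remaining words are read off as its positive and negative associations. The `step` parameter, also an assumption, controls how far along the line the search is made.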
Results. The algorithm is tested on GloVe and FastText language models, which encode individual Russian words and morphemes as vectors in a 300-dimensional space. Commonly used verbs and nouns served as key words. The average reliability of the tonal associations found is estimated at 80%.
Conclusion. The results indicate that pre-trained vector language models are suitable for fast and interpretable processing of tonal information. The developed approach is applicable to aspect-based sentiment analysis, as well as to the machine generation of object-oriented texts with a required tonality. Generalizing the tonal axis to the triple of Osgood's semantic factors allows the method to be extended to the full range of affective-semantic information.
About the Authors
Kirill M. Chernikov
Russian Federation
Kirill M. Chernikov, Undergraduate Student, Department of Artificial Intelligence Technologies
av. Kronverkskiy, 49A, Saint Petersburg, 197101
Ilya A. Surov
Russian Federation
Ilya A. Surov, Cand. Sci. (Phys.-Math.), Assoc. Prof., Senior Researcher, Department of Artificial Intelligence Technologies
av. Kronverkskiy, 49A, Saint Petersburg, 197101
For citations:
Chernikov K.M., Surov I.A. Tonal spaces of vector language models. Informatics. 2026;23(1):88-104. (In Russ.) https://doi.org/10.37661/1816-0301-2026-23-1-88-104