Informatics

Research on the effectiveness of using ensemble methods of multidimensional text analysis in categorization tasks

https://doi.org/10.37661/1816-0301-2026-23-2-7-20

Abstract

Objectives. The aim of this work is to investigate experimentally how effective ensemble methods of multidimensional text analysis are in document categorization tasks, using authorship identification as the example task. Particular attention is paid to comparing classical machine learning algorithms, their ensembles, and the hybrid quantum-classical model developed by the authors.

Methods. The study uses support vector machines, logistic regression, and random forests, as well as an ensemble of these models and a hybrid model whose architecture was designed by the authors. The proposed hybrid approach combines syntactic analysis based on a support vector machine, semantic analysis using the BERT transformer model, and a variational quantum module. Experiments were conducted on several corpora of English texts with varying numbers of authors. Quality was assessed using the accuracy, recall, and F1-score metrics.
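
As a rough illustration of the ensemble and evaluation steps described above, the sketch below combines the label predictions of three classifiers by majority vote and scores the result. It is not the authors' implementation: the toy predictions, the function names `majority_vote` and `macro_scores`, the tie-breaking rule, and the choice of macro averaging are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard (majority) voting.

    Ties go to the model listed first (an assumption of this sketch).
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Take the first label, in model order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro-averaged recall, macro-averaged F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    recalls, f1s = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical author labels and per-model predictions (toy data only).
y_true = ["a", "a", "b", "b", "c", "c"]
svm    = ["a", "a", "b", "c", "c", "c"]
logreg = ["a", "b", "b", "b", "c", "a"]
forest = ["a", "a", "c", "c", "c", "c"]

ensemble = majority_vote([svm, logreg, forest])
accuracy, recall, f1 = macro_scores(y_true, ensemble)
print(ensemble, round(accuracy, 3), round(recall, 3), round(f1, 3))
```

In practice a library such as scikit-learn offers equivalent building blocks (for example `VotingClassifier` and `f1_score`); the hand-written version is used here only to make the voting and averaging steps explicit.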

Results. In the experiments with a small number of authors, all models showed high accuracy, with the hybrid model achieving the best results (F1-score up to 82.5%). In the experiments with a large number of authors, quality decreased as expected, but the hybrid model was more stable and outperformed the classical ensembles on all corpora. The largest gain in accuracy was recorded on a difficult corpus of short texts (blogs) with a large number of authors.

Conclusion. The hybrid quantum-classical model developed by the authors proved effective for authorship attribution and can be scaled to a wider range of document categorization tasks, especially those with high feature dimensionality and a large number of classes. The use of the quantum module made it possible to identify complex nonlinear dependencies in the data that are inaccessible to traditional approaches. The results open up prospects for practical use of the proposed approach in text analysis systems, including the processing of short messages and large author databases. Further research will focus on expanding the feature set, optimizing the architecture of the quantum circuits, and adapting the model to various application areas.

About the Authors

I. A. Trukhanovich
Belarusian State University of Informatics and Radioelectronics
Belarus

Ilya A. Trukhanovich, Applicant

st. P. Brovki, 6, Minsk, 220013



A. I. Paramonov
Belarusian State University of Informatics and Radioelectronics
Belarus

Anton I. Paramonov, Cand. Sci. (Eng.), Assoc. Prof., Head of the Department of Information Systems and Technologies of the Institute of Information Technologies

st. P. Brovki, 6, Minsk, 220013



For citations:


Trukhanovich I.A., Paramonov A.I. Research on the effectiveness of using ensemble methods of multidimensional text analysis in categorization tasks. Informatics. 2026;23(2):7-20. (In Russ.) https://doi.org/10.37661/1816-0301-2026-23-2-7-20

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)