Informatics

Speech transcription and translation system from Russian to Chinese

https://doi.org/10.37661/1816-0301-2025-22-3-25-34

Abstract

Objectives. The aim of the work is to develop the architecture of an information system for speech transcription and translation, implement its blocks, and test their operation.

Methods. Existing speech recognition methods are reviewed, and speech recognition and text translation models are compared. Speech transcription proceeds through several successive stages: acquisition and pre-processing of the audio signal, extraction of acoustic features, speech recognition proper, post-processing and text correction, and output of the result. At the pre-processing stage, a combination of specialized libraries prepares the data for subsequent analysis. The Librosa library normalizes the recording parameters by resampling the signal to a standard rate of 16 kHz and converting it to mono. The Demucs neural network model suppresses background noise and isolates the speech component, and a spectral subtraction algorithm then reduces residual noise. Voice activity detection (VAD) is performed with an energy detector from WebRTC, which automatically extracts speech fragments and removes pauses. The whisper-turbo (OpenAI) model was chosen for speech recognition because of its higher processing speed, which makes a streaming mode possible, and its lower requirements for computing power. The translation module of the developed intelligent system is built on the T5-large-1024 (Text-to-Text Transfer Transformer) model, adapted for multilingual tasks.
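The idea behind energy-based speech segmentation can be illustrated with a minimal sketch. This is not the WebRTC detector used in the work and not its API: the function names, the 30 ms frame length, and the energy threshold are all assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def speech_segments(samples, sample_rate=16000, frame_ms=30, threshold=1e-3):
    """Return (start_sec, end_sec) spans whose frame energy exceeds threshold.

    threshold and frame_ms are illustrative values, not taken from the paper.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None  # index of the first frame of the current active span
    for i, energy in enumerate(energies):
        if energy > threshold and seg_start is None:
            seg_start = i
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            seg_start = None
    if seg_start is not None:  # signal ended while still active
        segments.append((seg_start * frame_len / sample_rate,
                         len(energies) * frame_len / sample_rate))
    return segments

# Synthetic 16 kHz mono signal: 0.3 s silence, 0.6 s of a 440 Hz tone, 0.3 s silence.
sr = 16000
silence = [0.0] * (3 * sr // 10)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(6 * sr // 10)]
signal = silence + tone + silence

segments = speech_segments(signal, sr)
print(segments)  # [(0.3, 0.9)]
```

A fixed threshold is the simplest possible choice; a practical detector would adapt it to the noise floor and smooth decisions over neighbouring frames so that short pauses inside a phrase are not cut out.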

Results. A modular architecture for an intelligent speech recognition and translation system is proposed, a prototype is implemented, and its metrics are measured. The system showed the following results: for Russian-English translation, Cosine Similarity 0.6951, WER 0.529, BLEU Score 0.239; for cascade Russian-Chinese translation through English, Cosine Similarity 0.557, WER 0.748, BLEU Score 0.095. Compared with direct Russian-Chinese translation, cascade translation through English improves the quality of the final text by 32% according to the Cosine Similarity metric and by 25% according to BLEU Score. The results of the implemented prototype were satisfactory.
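For readers unfamiliar with the metrics, the sketch below shows how WER and a cosine similarity can be computed for a pair of sentences. It is an illustration under assumptions: WER is the standard word-level edit distance divided by the reference length, while the cosine similarity here is taken over simple word-count vectors, because the abstract does not state how the compared vectors were obtained (sentence embeddings are a common alternative). The function names and example strings are hypothetical.

```python
import math
from collections import Counter

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)

def cosine_similarity(text_a, text_b):
    """Cosine similarity of word-count vectors (an assumed, simplified variant)."""
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

reference = "a b c d"
hypothesis = "a x c"  # one substitution (b -> x) and one deletion (d)
example_wer = wer(reference, hypothesis)
example_cos = cosine_similarity(reference, hypothesis)
print(example_wer)  # 0.5
```

Lower WER is better, whereas higher Cosine Similarity and BLEU Score are better, which is why the cascade figures above show WER rising while the other two metrics fall relative to Russian-English translation.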

Conclusion. The proposed implementation of the speech recognition system solves the task with quality satisfactory for the described problem and, because it works without an Internet connection, without the risk of unauthorized access to data. When cascade translation through English is used, the quality of Russian-Chinese translation improves by 32% according to the Cosine Similarity metric (from 0.423 to 0.557) and by 25% according to BLEU Score (from 0.076 to 0.095). The proposed information system can be introduced into the educational process regardless of academic discipline and can also be used at exhibitions, conferences, and international forums. Parallel translation into several languages is possible, which would allow all participants of international forums to take an active part in their events.
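The reported percentages are relative gains over direct translation, and they can be checked from the before and after values quoted above. The helper name below is hypothetical.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100

# Direct vs. cascade Russian-Chinese translation, values from the abstract.
cos_gain = relative_gain(0.423, 0.557)   # about 31.7
bleu_gain = relative_gain(0.076, 0.095)  # about 25.0

print(round(cos_gain), round(bleu_gain))  # 32 25
```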

About the Authors

L. P. Kuzmenkov
Belarusian State University
Belarus

Leonid P. Kuzmenkov - Student, Belarusian State University.

Nezavisimosti av., 4, Minsk, 220030



V. A. Chuyko
Belarusian State University
Belarus

Vladislav A. Chuyko - M. Sc. (Phys.-Math.), Senior Lecturer, Belarusian State University.

Nezavisimosti av., 4, Minsk, 220030



A. I. Kazlova
Belarusian State University
Belarus

Alena I. Kazlova - Ph. D. (Phys.-Math.), Assoc. Prof., Belarusian State University.

Nezavisimosti av., 4, Minsk, 220030



References

1. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., …, Polosukhin I. Attention Is All You Need, 2017. Available at: https://arxiv.org/abs/1706.03762 (accessed 12.05.2025).

2. Papineni K., Roukos S., Ward T., Zhu W.-J. BLEU: a method for automatic evaluation of machine translation. 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311–318.

3. Tzoukermann E., Miller C. Evaluating automatic speech recognition in translation. Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, Boston, MA, March 2018, vol. 2: MT Users' Track, pp. 294–302.

4. Sperber M., Setiawan H., Gollan C., Nallasamy U., Paulik M. Consistent transcription and translation of speech. Transactions of the Association for Computational Linguistics (TACL), 2020, vol. 8, pp. 695–709.

5. Etchegoyhen T., Arzelus H., Gete H., Alvarez A., Torre I. G., …, Fernandez E. B. Cascade or direct speech translation? A case study. Applied Sciences, 2022, vol. 12, iss. 3, p. 1097.

6. Radford A., Kim J. W., Xu T., Brockman G., McLeavey C., Sutskever I. Robust Speech Recognition via Large-Scale Weak Supervision, 2022. Available at: https://arxiv.org/abs/2212.04356 (accessed 12.05.2025).

7. Kumar L. A., Renuka D. K., Chakravarthi B. R., Mandl T. Automatic Speech Recognition and Translation for Low Resource Languages. Wiley-Scrivener, 2024, 496 p.


For citations:


Kuzmenkov L.P., Chuyko V.A., Kazlova A.I. Speech transcription and translation system from Russian to Chinese. Informatics. 2025;22(3):25-34. (In Russ.) https://doi.org/10.37661/1816-0301-2025-22-3-25-34


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)