
Informatics


The peer-reviewed scientific journal “Informatics” has been published four times a year since 2004. The journal is included in the Higher Attestation Commission of the Republic of Belarus list of scientific editions that publish the results of thesis research. It is also included in the Science Index scientometric database. Since December 2017 it has been included in the Russian Science Citation Index database.

The "Informatics" journal aims to develop international scientific cooperation in the field of information technologies.

The target audience is local and foreign authors, information technology specialists and young scientists.

The journal publishes original and review articles on the results of fundamental and applied research by academic and university specialists in the field of computer science and information technologies. The main goal of the journal is to publish the most significant new results in this field.

Articles that present the final results of scientific projects and thesis research, or that open new areas of research at the intersection of computer science and other sciences, are welcome.

All materials submitted to the editorial office of the journal are reviewed. Articles are published in Russian, Belarusian and English.

Current issue

Vol 22, No 3 (2025)
View or download the full issue PDF (Russian)

SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION

7-24 240
Abstract

Objectives. The purpose of the work is the automatic detection of lung lesions (cavities, infiltrates, and nodules) on chest X-ray images. The possibility of spatially localizing these lesions in the image is also investigated.

Methods. Binary classification with deep convolutional neural networks and the Grad-CAM method are used.

Results. For the Xception model, the binary classification accuracy on the test dataset is 73.1% for cavities, 71.9% for infiltrates, and 72.8% for nodules. Heat maps with true positive outcomes for cavities and nodules are mostly understandable to radiologists. More research is needed to obtain heat maps for infiltrates that are understandable to experts.

Conclusion. The average classification accuracy of the Xception model across the three lesion types (cavities, infiltrates, and nodules) is 72.6%. Heat maps associated with pathological processes in the lungs and with lesion localization were constructed. The results are good but not excellent, so further investigation is needed to improve the classification accuracy and the quality of the heat maps.
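The Grad-CAM step named in the methods can be illustrated on toy data. The sketch below is not the authors' pipeline: it assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them are already available as nested lists, and it only shows the arithmetic (channel weights from averaged gradients, weighted sum, ReLU, normalisation). The names `grad_cam`, `activations`, and `gradients` are hypothetical.

```python
# Minimal Grad-CAM arithmetic on toy data (pure Python, no deep-learning framework).
# activations[k][i][j]: feature map k of the last convolutional layer (assumed given).
# gradients[k][i][j]:   d(class score) / d(activations[k][i][j])      (assumed given).

def grad_cam(activations, gradients):
    """Return a heat map normalised to the range [0, 1]."""
    n_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = global average of that channel's gradients.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_maps)
    ]

    # Weighted sum of the feature maps, followed by ReLU.
    heat = [
        [max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
         for j in range(width)]
        for i in range(height)
    ]

    # Scale so that the strongest location has value 1.
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[value / peak for value in row] for row in heat]
    return heat


# Toy example: two 2x2 feature maps.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel weight  0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel weight -1.0
]
heat_map = grad_cam(activations, gradients)
```

In practice the heat map is upsampled to the size of the X-ray image and overlaid on it, which is what makes it possible for a radiologist to judge whether the highlighted region matches the lesion.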

25-34 236
Abstract

Objectives. The aim of the work is to develop the architecture of an information system for transcription and translation of speech, implement its blocks, and test their operation.

Methods. Existing methods of speech recognition are considered, and a comparative analysis of speech recognition and text translation models is carried out. The speech transcription process includes several successive stages: collection and preliminary processing of the audio signal, extraction of acoustic features, speech recognition proper, post-processing and text correction, and output of the result. At the audio pre-processing stage, a combination of specialized libraries is used to prepare data for subsequent analysis. To normalize the recording parameters, the Librosa library is used, which resamples the signal to a standard frequency of 16 kHz and converts it to mono. To suppress background noise and isolate the speech component, the Demucs neural network model is used. A spectral subtraction algorithm additionally corrects residual noise. Voice activity detection (VAD) is performed using an energy detector from WebRTC, which automatically marks speech fragments and removes pauses. The whisper-turbo (OpenAI) model was chosen for the speech recognition system because of its higher processing speed, which makes a streaming mode possible, and its lower requirements for computing power. The translation module of the developed intelligent system is built on the T5-large-1024 (Text-to-Text Transfer Transformer) model, adapted for multilingual tasks.
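The idea behind energy-based voice activity detection can be sketched in a few lines. This is a simplified stand-in, not the WebRTC detector and not the authors' code: the 30 ms frame length, the mean-square threshold, and the function name `speech_segments` are assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import math

SAMPLE_RATE = 16_000      # Hz, the rate named in the abstract
FRAME_MS = 30             # frame length in milliseconds (assumption)
ENERGY_THRESHOLD = 0.01   # mean-square energy threshold (assumption)


def speech_segments(samples, sample_rate=SAMPLE_RATE,
                    frame_ms=FRAME_MS, threshold=ENERGY_THRESHOLD):
    """Return (start_sec, end_sec) spans whose frame energy exceeds the threshold."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    segments = []
    start = None  # index of the first frame of the current active run
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = f
        elif energy <= threshold and start is not None:
            segments.append((start * frame_ms / 1000, f * frame_ms / 1000))
            start = None
    if start is not None:  # signal ended while still active
        segments.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return segments


# Synthetic signal: 0.3 s of silence, 0.6 s of a 440 Hz tone, 0.3 s of silence.
silence = [0.0] * (3 * SAMPLE_RATE // 10)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(6 * SAMPLE_RATE // 10)]
segments = speech_segments(silence + tone + silence)
```

Only the detected spans would then be passed to the recognizer, which is how pauses are removed before transcription.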

Results. A method for creating an intelligent speech recognition system is proposed: a modular architecture for speech recognition and translation. A prototype was implemented and its metrics were measured. The system showed the following results: for Russian-English translation, Cosine Similarity 0.6951, WER 0.529, BLEU Score 0.239; for cascade Russian-Chinese translation through English, Cosine Similarity 0.557, WER 0.748, BLEU Score 0.095. Cascade translation through English improves the quality of the final text by 32% according to the Cosine Similarity metric and by 25% according to BLEU Score compared to direct translation. The results of the implemented prototype were satisfactory.
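For readers unfamiliar with the WER metric reported above, the following sketch shows the standard definition: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the abstract does not say how the authors tokenized or normalized text before scoring.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A WER of 0.529 therefore means that roughly one reference word in two needed an insertion, deletion, or substitution to match the system output.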

Conclusion. The proposed implementation of the speech recognition system solves the task with satisfactory quality and without the risk of unauthorized access to data, since it works without an Internet connection. With cascade translation through English, the quality of Russian-Chinese translation improves by 32% according to the Cosine Similarity metric (from 0.423 to 0.557) and by 25% according to BLEU Score (from 0.076 to 0.095). The proposed information system can be used in the educational process regardless of the academic discipline, as well as at exhibitions, conferences, and international forums. Parallel translation into different languages is possible, which will allow all participants of international forums to take an active part in their events.
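The percentages quoted in the conclusion follow from the before and after values as relative gains. The short check below uses only the four numbers given in the abstract; the helper name `relative_gain` is hypothetical.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


# Direct vs. cascade Russian-Chinese translation, values from the abstract.
cosine_gain = relative_gain(0.423, 0.557)
bleu_gain = relative_gain(0.076, 0.095)
```

Both gains are relative to the direct-translation baseline, not absolute differences in the metric.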

35-44 222
Abstract

Objectives. The objectives of the study are to collect data, develop an algorithm for automatic extraction of microexpressions from video recordings, and formulate rules for combinations of motor units (action units) from which basic human emotions are determined.

Methods. Human facial microexpressions are brief, involuntary reactions that may appear when a person attempts to hide their true emotions. Microexpressions play a key role in lie detection and are an important indicator that truthful information is being concealed. In this article, Action Units (AUs, the movement units of the Facial Action Coding System, FACS), obtained with the py-feat library, were used to analyse facial expressions.

Results. A dataset of video recordings of a specific group of people was collected. Rules based on combinations of action units and their intensities were developed to determine basic emotions. An algorithm for detecting and extracting microexpressions from video recordings was also formulated. Evaluation of the algorithm showed a negative correlation between the emotion of joy and lying.

Conclusion. The results make it possible to expand the information base for neural network lie detection from a video sequence of facial images by detecting and analysing microexpressions in them.
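A rule of the kind described above, together with a duration filter for microexpressions, might look like the following sketch. It does not reproduce the authors' rules or algorithm. The single rule shown (AU06 together with AU12 for happiness) is the commonly cited FACS combination; the intensity threshold, frame rate, half-second duration limit, and all names are assumptions made for illustration.

```python
FPS = 30               # assumed video frame rate
MAX_MICRO_SEC = 0.5    # assumed upper bound on microexpression duration
AU_THRESHOLD = 0.5     # assumed activation threshold on a 0..1 intensity scale

# Hypothetical rule table: an emotion fires when all listed AUs are active.
RULES = {"happiness": ("AU06", "AU12")}


def active_emotion(frame_aus, emotion):
    """True if every AU required by the rule reaches the threshold in this frame."""
    return all(frame_aus.get(au, 0.0) >= AU_THRESHOLD for au in RULES[emotion])


def micro_expressions(frames, emotion, fps=FPS, max_sec=MAX_MICRO_SEC):
    """Return (first_frame, last_frame) runs short enough to count as microexpressions."""
    spans = []
    start = None
    # The empty dict is a sentinel frame that closes a run reaching the end.
    for i, aus in enumerate(list(frames) + [{}]):
        on = active_emotion(aus, emotion)
        if on and start is None:
            start = i
        elif not on and start is not None:
            if (i - start) / fps <= max_sec:
                spans.append((start, i - 1))
            start = None
    return spans


neutral = {"AU06": 0.1, "AU12": 0.1}
smile = {"AU06": 0.8, "AU12": 0.9}
# A 6-frame flash (0.2 s) followed later by a 40-frame sustained smile (about 1.3 s).
frames = [neutral] * 10 + [smile] * 6 + [neutral] * 10 + [smile] * 40
spans = micro_expressions(frames, "happiness")
```

Only the short flash is reported; the sustained smile is treated as an ordinary expression and discarded.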

BIOINFORMATICS

45-58 182
Abstract

Objectives. Development of an algorithm for selecting reference microRNAs that takes into account their biological features and interconnection, for the classification of samples across pathologies and various biological processes.

Methods. Methods of linear algebra, principal component analysis, statistical binary regression models, and model performance metrics were used.

Results. A new algorithm, MDSeek, has been developed that selects reference microRNAs for normalizing quantitative polymerase chain reaction results, taking their coexpression into account. MDSeek demonstrates higher performance metrics than known reference gene selection approaches in subsequent classification tasks.

Conclusion. An original algorithm, MDSeek, is proposed for selecting reference microRNAs for the normalization of polymerase chain reaction results. It takes into account changes in microRNA expression when comparing different biological processes. After MDSeek was applied to an experimental set of samples, the normalized data were used for classification tasks, and the performance metrics were better than those obtained with other normalization algorithms.
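To show what reference microRNAs are used for once they have been selected, the sketch below applies the conventional delta-Ct normalization: each target's cycle threshold is expressed relative to the mean cycle threshold of the references. This is the generic normalization step only, not MDSeek itself, whose selection procedure is not described in the abstract. The microRNA names and Ct values are made up for illustration.

```python
from statistics import mean


def delta_ct(ct_values, reference_names):
    """Subtract the mean Ct of the reference microRNAs from every target Ct."""
    ref_mean = mean(ct_values[name] for name in reference_names)
    return {
        name: ct - ref_mean
        for name, ct in ct_values.items()
        if name not in reference_names
    }


# Hypothetical Ct values for one sample.
sample = {"miR-A": 24.0, "miR-B": 27.5, "miR-ref1": 20.0, "miR-ref2": 21.0}
normalized = delta_ct(sample, ["miR-ref1", "miR-ref2"])
```

Because every target is shifted by the same reference mean, a poorly chosen reference set distorts all normalized values at once, which is why the selection step matters for downstream classification.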

MATHEMATICAL MODELING

59-71 172
Abstract

Objectives. Construction of an analytical solution to the problem of shielding a low-frequency magnetic field by two thin non-intersecting spherical screens located on the surface of a sphere. Calculation of the shielding coefficient of the initial magnetic field by the spherical screens.

Methods. The method of addition theorems and the method of triple summation equations are used to solve the boundary value problem. The potential of the initial magnetic field is represented in terms of spherical harmonic functions. The secondary potentials of the magnetic field are represented as a superposition of spherical harmonic functions in a local coordinate system in three-dimensional space.

Results. The solution of the boundary value problem is reduced to the solution of a system of Fredholm integral equations of the second kind with respect to specially introduced functions. The influence of the geometric parameters of the problem on the value of the shielding coefficient is investigated numerically. The results of the calculations are presented in the form of graphs.

Conclusion. The proposed methodology and the developed software can find practical application in the development and design of screens in various fields of technology.
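A Fredholm integral equation of the second kind, phi(x) - ∫ K(x, t) phi(t) dt = f(x), is commonly solved numerically by replacing the integral with a quadrature rule and solving the resulting linear system (the Nyström method). The sketch below does this for a single scalar equation with a toy kernel; it is not the authors' system or software, and the kernel, right-hand side, and function names are assumptions chosen so that the exact solution, phi(x) = 1.5 x, is known.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; a is n x n, b has length n."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def nystrom(kernel, f, a, b, n):
    """Solve phi(x) - integral_a^b K(x, t) phi(t) dt = f(x) on n + 1 trapezoid nodes."""
    h = (b - a) / n
    nodes = [a + i * h for i in range(n + 1)]
    weights = [h] * (n + 1)
    weights[0] = weights[-1] = h / 2  # trapezoid rule end weights
    matrix = [
        [(1.0 if i == j else 0.0) - weights[j] * kernel(nodes[i], nodes[j])
         for j in range(n + 1)]
        for i in range(n + 1)
    ]
    rhs = [f(x) for x in nodes]
    return nodes, solve_linear(matrix, rhs)


# Toy problem: K(x, t) = x * t, f(x) = x on [0, 1]; exact solution phi(x) = 1.5 * x.
nodes, phi = nystrom(lambda x, t: x * t, lambda x: x, 0.0, 1.0, 50)
```

For a system of such equations the same discretization applies block by block, yielding one larger linear system.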

INFORMATION PROTECTION AND SYSTEM RELIABILITY

72-82 186
Abstract

Objectives. The article examines the features of using a two-layer artificial neural network to approximate binary functions of many binary variables. The choice of the initial values of the model weights and of the number of neurons in the hidden layer is studied.

Methods. The problem of approximating a binary function with an artificial neural network is reduced to the geometric problem of dividing the vertices of a multidimensional cube by hyperplanes. Combinatorial methods are used to prove lemmas on the ways a hypercube can be divided by a hyperplane and to construct a lower bound for the number of binary functions that can be approximated using one neuron in the hidden layer.

Results. The features of setting the initial values of the weights of an artificial neural network are considered. A lower bound is constructed for the number of binary functions that can be approximated using an artificial neural network with one neuron in the hidden layer. The algorithmic complexity of calculating this bound is found. Numerical results are presented for using two-layer artificial neural networks to approximate binary functions in information security problems.

Conclusion. The results of the article make it possible to choose the parameters of an artificial neural network so as to improve the accuracy of approximation of binary functions of many variables.
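The geometric reduction can be made concrete for small dimensions by brute force: each choice of weights and bias defines a hyperplane, and the truth table records which cube vertices fall on its positive side. The sketch below counts the distinct truth tables found over a small grid of integer weights and half-integer biases. Because the grid is finite, the count is only a lower bound on the number of functions a single neuron can realize; this enumeration is an illustration, not the combinatorial estimate derived in the article, and the function name and grid are assumptions.

```python
from itertools import product


def threshold_functions(n, max_weight=1):
    """Truth tables on {0,1}^n realised by step(w.x + b) over a small parameter grid.

    Weights are integers in [-max_weight, max_weight]; biases are half-integers,
    so w.x + b is never exactly zero and no vertex lies on the hyperplane.
    """
    vertices = list(product((0, 1), repeat=n))
    weight_range = range(-max_weight, max_weight + 1)
    bound = n * max_weight
    biases = [k + 0.5 for k in range(-bound - 1, bound + 1)]
    tables = set()
    for w in product(weight_range, repeat=n):
        for b in biases:
            table = tuple(
                int(sum(wi * xi for wi, xi in zip(w, v)) + b > 0)
                for v in vertices
            )
            tables.add(table)
    return tables


tables = threshold_functions(2)
# Truth tables list the values at (0,0), (0,1), (1,0), (1,1).
xor = (0, 1, 1, 0)
xnor = (1, 0, 0, 1)
```

For two variables the grid already finds every linearly separable function: 14 of the 16 binary functions, with XOR and its negation as the two that no single hyperplane can separate.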

83-94 203
Abstract

Objectives. Phishing web resources are among the most common tools of online fraud aimed at obtaining users' confidential information. The goal of this research was to develop a software module for the automatic detection of phishing websites using machine learning methods.

Methods. To achieve this goal, existing datasets containing phishing website URLs were analyzed, along with datasets for natural language processing (NLP). This made it possible to identify key features characteristic of fraudulent resources. Two datasets (18.9 MB and 1.08 GB), incorporating URL attributes and web page content, were created using a custom-developed parser. Machine learning algorithms such as SVM, Random Forest, Logistic Regression, and Multilayer Perceptron (MLP) were applied for website classification. The potential of the TinyBERT language model for analyzing textual content was also explored.

Results. The analysis revealed that the MLP model demonstrated the best performance for URL classification, while the TinyBERT model excelled in analyzing textual content. A software module was developed, consisting of a server-side application and a browser extension. The extension collects data from web resources and transmits them to the server, where trained machine learning models analyze the information. The server calculates the likelihood of phishing activity, and the results are displayed to the user via the extension's interface. The implementation used a technology stack including Python 3.12, Flask, Pickle, Langdetect, Re, NLTK, JavaScript, and the Google Chrome API.

Conclusion. The developed software module was tested and demonstrated high efficiency in phishing website classification tasks. The theoretical significance of the work lies in applying modern machine learning algorithms to the analysis of textual content and URLs. The practical significance lies in the creation of a ready-to-use solution for real-time phishing site detection.
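URL attributes of the kind fed to such classifiers can be extracted with the Python standard library. The feature set below is an assumption based on lexical cues often used in phishing detection (length, dots in the host, a raw IP address as host, an `@` sign, the scheme); the abstract does not list the attributes the authors actually used, and `url_features` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as 192.168.10.5 (no range check on the octets).
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def url_features(url):
    """Return a dictionary of simple lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_ip_host": bool(IPV4_PATTERN.fullmatch(host)),
        "has_at_sign": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


features = url_features("http://192.168.10.5/secure-login/update")
```

Each dictionary can be turned into a numeric vector (booleans as 0 or 1) and passed to any of the classifiers named above.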

SCIENTISTS OF BELARUS

INFORMATION



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.