
Informatics


Algorithm development for recognizing human emotions using a convolutional neural network based on audio data

https://doi.org/10.37661/1816-0301-2022-19-4-53-68

Abstract

Objectives. This article describes the development of an algorithm for recognizing a subject's emotional state and the experience gained in creating it.
Methods. Image processing methods are used.
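The abstract names image processing methods as the approach for audio data. One common way to make that possible, assumed here and not confirmed by the abstract, is to turn the waveform into a spectrogram: a 2D time-frequency array that a convolutional network can treat as an image. A minimal pure-Python sketch; the function name `log_spectrogram` and the frame length and hop size are hypothetical.

```python
import cmath
import math


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram as a list of frames, each a list of bins.

    frame_len and hop are illustrative defaults, not values from the article.
    """
    # Periodic Hann window: tapers frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # real input: bins above Nyquist mirror these
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for bin k; an FFT routine would be used in practice.
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(math.log(abs(coeff) + 1e-10))  # epsilon avoids log(0)
        spectrogram.append(row)
    return spectrogram
```

For example, a 1 kHz sine sampled at 8 kHz gives a bin spacing of 8000 / 256 = 31.25 Hz, so every frame of its spectrogram peaks in bin 32.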
Results. The proposed algorithm recognizes a subject's emotional states from an audio data set. The accuracy of the algorithm was improved by changing the data set fed to the input of the neural network.
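The abstract does not describe the network's layers. As a generic illustration of what a convolutional layer does with the 2D input supplied to the network, here is a pure-Python sketch of one convolution, ReLU, and 2x2 max-pooling block. The function names, kernel, and sizes are hypothetical and are not the authors' architecture. As in most deep learning frameworks, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation with 'valid' padding (no border) and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def conv_block(image, kernel, bias=0.0):
    """One hypothetical CNN stage: convolution + bias, ReLU, then pooling."""
    fmap = conv2d_valid(image, kernel)
    fmap = [[v + bias for v in row] for row in fmap]
    return max_pool2x2(relu(fmap))
```

In a trained network the kernel values are learned rather than set by hand, and many kernels run in parallel to produce several feature maps. With the edge-detecting kernel `[[-1, 1], [-1, 1]]`, a 5x5 input whose columns are `0, 0, 1, 1, 1` yields `[[2, 0], [2, 0]]`.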
The article describes the structure of the algorithm and the stages of training a convolutional neural network on a pre-prepared audio data set. To validate the network, a separate set of audio data that was not used in training was selected. Graphs demonstrating the accuracy of the proposed method were constructed from the results of the study.
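The abstract states that validation used audio data not involved in training. A minimal sketch of such a hold-out split, and of the accuracy metric that accuracy graphs typically plot, assuming the data is a list of clip identifiers. The function names, the 20% validation fraction, and the fixed seed are hypothetical.

```python
import random


def split_train_validation(items, val_fraction=0.2, seed=0):
    """Shuffle, then split so that no validation item appears in training."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

If several clips come from the same speaker, splitting by speaker rather than by clip keeps speaker identity from leaking into validation; the abstract does not say how the authors divided their data.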
After the initial results were obtained, the possibilities for improving the algorithm's ergonomics and accuracy were analyzed, and a strategy was developed to achieve better results and a more accurate algorithm. Based on the conclusions presented in the article, the choice of data set representation and of the software package needed to implement the software part of the algorithm is justified.
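The abstract mentions a justified choice of data set representation without naming it. One choice commonly weighed in speech emotion recognition, offered here only as an assumption, is whether to space frequency bands linearly or on the mel scale, m = 2595 * log10(1 + f / 700), which allots more bands to low frequencies. A minimal sketch; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)  # leave room for band edges at both ends
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]
```

On this scale 1000 Hz maps to roughly 1000 mel, and bands that are equally spaced in mel become progressively wider in Hz.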
Conclusion. The proposed algorithm achieves high accuracy and does not require large computational resources.
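The abstract gives no figures for computational cost. One standard way to quantify the cost of a convolutional layer with a k x k kernel is to count its parameters, (k^2 * C_in + 1) * C_out, and its multiply-accumulate operations, H_out * W_out * k^2 * C_in * C_out. The sketch below computes both; the function name and the layer sizes in the usage note are hypothetical.

```python
def conv_layer_cost(in_h, in_w, in_ch, out_ch, k, stride=1, padding=0):
    """Return (parameter count, multiply-accumulate count) for one 2D conv layer."""
    # Standard output-size formula for a square k x k kernel.
    out_h = (in_h + 2 * padding - k) // stride + 1
    out_w = (in_w + 2 * padding - k) // stride + 1
    params = (k * k * in_ch + 1) * out_ch  # weights plus one bias per output channel
    macs = out_h * out_w * k * k * in_ch * out_ch  # one MAC per weight per output pixel
    return params, macs
```

For a hypothetical 128x128 single-channel spectrogram, a 3x3 layer with 16 filters and padding 1 has 160 parameters and 2,359,296 multiply-accumulates; a following 3x3 layer from 16 to 32 channels on a 64x64 map has 4,640 parameters and 18,874,368 multiply-accumulates.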

About the Authors

V. V. Semenuk
Donetsk Technical School of Industrial Automation named after A. V. Zakharchenko
Ukraine

Viktoriya V. Semenuk, M. Sc. (Eng.), Teacher of Special Disciplines

st. Gorkogo, 163, Donetsk, 83000



M. V. Skladchikov
Donetsk Technical School of Industrial Automation named after A. V. Zakharchenko
Ukraine

Maxim V. Skladchikov, M. Sc. (Eng.), Teacher of Special Disciplines

st. Gorkogo, 163, Donetsk, 83000





For citation:


Semenuk V.V., Skladchikov M.V. Algorithm development for recognizing human emotions using a convolutional neural network based on audio data. Informatics. 2022;19(4):53-68. (In Russ.) https://doi.org/10.37661/1816-0301-2022-19-4-53-68



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)