References

Информатика

Informatics

ISSN 1816-0301, 2617-6963

UIIP NASB

10.37661/1816-0301-2022-19-4-53-68

Research Article

SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION

Algorithm development for recognizing human emotions using a convolutional neural network based on audio data

https://orcid.org/0000-0001-6531-1895

V. V. Semenuk

Viktoriya V. Semenuk, M. Sc. (Eng.), Teacher of Special Disciplines

163 Gorkogo St., Donetsk, 83000

semenuk.viktoriya@gmail.com

https://orcid.org/0000-0003-3070-6656

M. V. Skladchikov

Maxim V. Skladchikov, M. Sc. (Eng.), Teacher of Special Disciplines

163 Gorkogo St., Donetsk, 83000

maxsklad19981@yandex.ru

Donetsk Technical School of Industrial Automation named after A. V. Zakharchenko

2022

08.09.2022

Vol. 19, no. 4, pp. 53–68

2022

Semenuk V. V., Skladchikov M. V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/1211

Objectives. This article describes the development of an algorithm for recognizing a subject's emotional state and reviews the experience gained in building it.

Methods. Image processing methods are used.

Results. The proposed algorithm recognizes the emotional states of a subject from an audio data set. The accuracy of the algorithm was improved by changing the data set fed to the input of the neural network. The stages of training a convolutional neural network on a pre-prepared set of audio data are described, along with the structure of the algorithm. To validate the neural network, a different set of audio data that was not used in training was selected. Graphs were constructed that demonstrate the accuracy of the proposed method. After the initial results were obtained, possible improvements to the algorithm in terms of ergonomics and accuracy were analyzed, and a strategy was developed to achieve a better result and a more accurate algorithm. Based on the conclusions presented in the article, the choice of the data set representation and of the software package needed to implement the software part of the algorithm is justified.

Conclusion. The proposed algorithm is highly accurate and does not require large computational resources.

Keywords: neural network, human emotion recognition, convolutional neural network, sound fingerprinting, TensorFlow software library, Keras neural network library, Matlab software package

References

Mesaros A., Heittola T., Virtanen T. Acoustic scene classification: Overviews of DCASE 2017 challenge entries. 16th International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Tokyo, Japan, 17–20 September 2018. Tokyo, 2018, pp. 411–415.

Haitsma J., Kalker T. A highly robust audio fingerprinting system. 3rd International Conference on Music Information Retrieval, Paris, France, 13–17 October 2002. Paris, 2002, pp. 107–115.

Ilin E. P. Jemocii i chuvstva. Emotions and Feelings. Saint Petersburg, Piter, 2001, 752 p. (In Russ.).

Izard K. E. Psihologija jemocij. Psychology of Emotions. Saint Petersburg, Piter, 2012, 464 p. (In Russ.).

Karelina I. O. Razvitie ponimanija jemocij v period doshkol'nogo detstva: psihologicheskij rakurs. Developing an Understanding of Emotions during Preschool Childhood: A Psychological Perspective. Prague, Vědecko vydavatelské centrum "Sociosféra-CZ", 2017, 178 p. (In Russ.).

Orehova O. A. Cvetovaja diagnostika jemocij. Tipologija razvitija. Monografija. Color Diagnostics of Emotions. Typology of Development. Monograph. Saint Petersburg, Rech; Moscow, Sfera, 2008, 176 p. (In Russ.).

Shapoval J. A. Recognition of human emotions by image as part of an automated sign language translator. Molodezhnyj nauchno-tekhnicheskij vestnik [Youth Scientific and Technical Bulletin], 2017, no. 7, p. 55 (In Russ.).

Golubinskij A. N. Identification of a person's emotional state by a speech signal based on wavelet analysis. Vestnik Voronezhskogo instituta Ministerstva vnutrennih del Rossii [Bulletin of the Voronezh Institute of the Ministry of Internal Affairs of Russia], 2011, no. 3, pp. 144–153 (In Russ.).

Sidorov K. I., Filatova N. N. Automatic recognition of human emotions based on reconstructions of attractors of speech samples. Programmnye sistemy i vychislitel'nye metody [Software Systems and Computational Methods], 2012, no. 1, pp. 67–79 (In Russ.).

Galichij D. A., Afanaciev G. I., Nesterov U. G. Recognition of human emotions using modern methods of deep learning. E-SCIO, 2021, vol. 5, no. 56, pp. 316–329 (In Russ.).

Bredihin A. I. The use of wavelets in the task of recognizing a person's emotions by his speech. Sbornik izbrannyh statej nauchnoj sessii Tomskogo gosudarstvennogo universiteta sistem upravlenija i radiojelektroniki [Collection of Selected Articles of the Scientific Session of Tomsk State University of Control Systems and Radioelectronics], 2018, no. 1–3, pp. 115–119 (In Russ.).

Rumina E. V., Karpov A. A. Analytical review of emotion recognition methods based on human facial expressions. Nauchno-tekhnicheskij vestnik informacionnyh tekhnologij, mekhaniki i optiki [Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics], 2020, vol. 20, no. 2, pp. 163–176 (In Russ.). https://doi.org/10.17586/2226-1494-2020-20-2-163-176

Dvoinikova A., Verkholyak O., Karpov A. Emotion recognition and sentiment analysis of extemporaneous speech transcriptions in Russian. Lecture Notes in Computer Science, 2020, vol. 12335, pp. 136–144. https://doi.org/10.1007/978-3-030-60276-5_14

Devi J. S., Yarrammelle S., Nandyala S. P. Speaker emotion recognition based on speech features and classification techniques. International Journal of Image, Graphics, and Signal Processing, 2014, vol. 6, no. 7, pp. 61–77. https://doi.org/10.5815/ijigsp.2014.07.08

Liu Z. I., Xie Q., Wu M., Cao W. H., Mao J. W., Mei Y. Speech emotion recognition based on an improved brain emotion learning model. Neurocomputing, 2018, vol. 309, pp. 145–156. https://doi.org/10.1016/j.neucom.2018.05.005

Shirami A., Nilchi A. R. N. Speech emotion recognition based on SVM as both features selector and classifier. International Journal of Image, Graphics, and Signal Processing, 2016, vol. 8, no. 4, pp. 39–45. https://doi.org/10.5815/ijigsp.2016.04.05

Assuncao G., Menezes P. Intermediary fuzzyfication in speech emotion recognition. IEEE International Conference on Fuzzy System, Glasgow, United Kingdom, 19–24 July 2020. Glasgow, 2020, p. 9177699. https://doi.org/10.1109/FUZZ48607.2020.9177699

Zisad S. N., Hossain M. S., Andersson K. Speech emotion recognition in neurological disorders using convolutional neural network. Lecture Notes in Computer Science, 2020, vol. 12241, pp. 287–296. https://doi.org/10.1007/978-3-030-59277-6_26

Werner S., Petrenko G. K. Speech emotion recognition: humans vs machines. Discourse, 2019, vol. 5, no. 5, pp. 136–152. https://doi.org/10.32603/2412-8562-2019-5-5-136-152

Muppidi A., Radfar M. Speech emotion recognition using quaternion convolutional neural networks. IEEE International Conference of Acoustics, Speech and Signal Processing-Proceedings, Toronto, ON, Canada, 6–11 June 2021. Toronto, 2021, pp. 6309–6313. https://doi.org/10.1109/ICASSP39728.2021.9414248

Zheng W., Zong Y. Multi-scale discrepancy adversarial network for crosscorpus speech emotion recognition. Virtual Reality and Intelligent Hardware, 2021, vol. 3, no. 1, pp. 65–75. https://doi.org/10.1016/j.vrih.2020.11.006

Hazjan V., Kacic Z. Context-independent multilingual emotion recognition from speech signals. International Journal of Speech Technology, 2003, vol. 6, no. 3, pp. 311–320.

Zhang C., Xue L. Autoencoder with emotion embedding for speech emotion recognition. IEEE Access, 2021, vol. 9, pp. 51231–51241. https://doi.org/10.1109/ACCESS.2021.3069818

Kanwal S., Asghar S. Speech emotion recognition using clustering based GA-optimized feature set. IEEE Access, 2021, vol. 9, pp. 125830–125842. https://doi.org/10.1109/ACCESS.2021.3111659

Byoung C. K. A brief review of facial emotion recognition based on visual information. Sensors, 2018, vol. 18, iss. 2, p. 401. https://doi.org/10.3390/s18020401

Ouyang X., Kawaai S., Goh E. G. H., Shen S., Ding W., …, Huang D.-Y. Audio-visual emotion recognition using deep transfer learning and multiple temporal models. ICMI '17: Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, United Kingdom, 13–17 November 2017. Glasgow, 2017, pp. 577–582. https://doi.org/10.1145/3136755.3143012

Hassani B., Mahoor M. H. Facial expression recognition using enhanced deep 3D convolutional neural networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017. Honolulu, 2017, pp. 1955–1962. https://doi.org/10.1109/CVPRW.2017.282

The authors declare no conflicts of interest.