Informatics

ISSN 1816-0301, 2617-6963

UIIP NASB

10.37661/1816-0301-2024-21-3-48-62

inform-1306

Research Article

INTELLIGENT SYSTEMS

Development of an imitation learning method for a neural network motion control system of a mobile robot, using maze solving as an example

https://orcid.org/0000-0002-4126-6572

Kim

T. Yu.

Tatyana Yu. Kim, Junior Researcher, Laboratory of Robotic Systems No. 116

6 Surganova St., Minsk, 220012

tatyana_kim92@mail.ru

https://orcid.org/0000-0002-3412-9174

Prakapovich

R. A.

Ryhor A. Prakapovich, Ph.D. (Eng.), Associate Professor

6 Surganova St., Minsk, 220012

rprakapovich@robotics.by

United Institute of Informatics Problems of the National Academy of Sciences of Belarus

2024

30.09.2024

Vol. 21, no. 3, pp. 48–62

© 2024 Kim T. Yu., Prakapovich R. A.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/1306

Objectives

To develop a new method for training a mobile robot control system to find the exit from a maze, based on reinforcement learning and the right-hand algorithm.

Methods

Computer simulation in the MATLAB/Simulink environment was used.

Results

A new method is proposed for training a mobile robot control system to implement the right-hand algorithm for finding the exit from a maze. The method relies on two interacting agents. The first, the expert agent, directly executes the search algorithm and looks for the exit; the second, the student agent, follows it and tries to learn to find the exit by imitation learning. The expert agent implements a discrete maze-traversal algorithm, takes precise discrete steps, and moves almost independently of the student agent. The only constraint is its speed, which is directly proportional to the distance between the agents. The student agent tries to reduce its distance to the expert by trial and error. Training used reinforcement learning in imitation mode, with a purpose-built reward function that keeps the robot's center of mass in the middle of the corridor and, when necessary, makes the robot turn to follow the expert agent. The agents move through a virtual test ground made up of branching corridors that are wide enough for a variety of maneuvers.
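The right-hand algorithm executed by the expert agent can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB/Simulink environment used in the work, and it assumes a grid maze given as a list of strings; the names `right_hand_path`, `HEADINGS`, and `MAZE` are hypothetical and do not come from the article.

```python
# Headings as (row, col) steps, listed clockwise: north, east, south, west.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

# Toy maze (an assumption for illustration): '#' is a wall, '.' is free.
MAZE = [
    "#####",
    "#...#",
    "#.#.#",
    "#.#.#",
    "#####",
]


def right_hand_path(grid, start, heading, goal, max_steps=10_000):
    """Walk from start to goal, always preferring the turn to the right.

    heading is an index into HEADINGS. Returns the list of visited cells,
    or None if the goal is not reached within max_steps moves.
    """
    def is_free(cell):
        r, c = cell
        return 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#"

    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            break
        # Preference order: right, straight, left, back.
        for turn in (1, 0, -1, 2):
            h = (heading + turn) % 4
            nxt = (pos[0] + HEADINGS[h][0], pos[1] + HEADINGS[h][1])
            if is_free(nxt):
                pos, heading = nxt, h
                path.append(pos)
                break
        else:
            return None  # no free neighbour at all
    return path if pos == goal else None


# Start in the top-left free cell facing east; the exit is the cell (3, 3).
path = right_hand_path(MAZE, start=(1, 1), heading=1, goal=(3, 3))
```

On this toy maze the walker first enters and leaves the dead end at cell (3, 1) before reaching the exit, which is the kind of detour a purely rule-following expert makes.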

Conclusion

It was demonstrated that, with the proposed imitation learning method, the student agent not only adopts the required behavior from the expert agent (searching for the exit of a previously unknown maze with the right-hand algorithm) but also acquires new behaviors on its own (changing speed in turns, bypassing small dead-end corridors), which improve performance on the assigned task.
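To make the coupling between the two agents concrete, the following is a hypothetical Python sketch of the two quantities named in the Results: an expert speed that is directly proportional to the distance between the agents, and a reward that favors staying in the middle of the corridor and close to the expert. The functional forms, the gain, and the weights (`gain`, `w_center`, `w_follow`) are assumptions for illustration only; the abstract does not give the actual reward function.

```python
import math


def expert_speed(distance, gain=0.5):
    """Expert speed, directly proportional to the expert-student distance.

    gain is a hypothetical proportionality constant.
    """
    return gain * distance


def imitation_reward(robot_xy, expert_xy, lateral_offset,
                     w_center=1.0, w_follow=1.0):
    """Hypothetical shaped reward for the student agent.

    The reward is highest (zero) when the robot sits on the corridor
    centre line (lateral_offset == 0) and coincides with the expert;
    it decreases linearly with both deviations.
    """
    distance = math.dist(robot_xy, expert_xy)
    return -(w_center * abs(lateral_offset) + w_follow * distance)


# Example: robot at the origin, expert 5 units away, robot 0.5 units off centre.
r = imitation_reward((0.0, 0.0), (3.0, 4.0), lateral_offset=0.5)
v = expert_speed(math.dist((0.0, 0.0), (3.0, 4.0)))
```

A linear penalty is only one possible choice; the actual reward in the work may weight or shape these terms differently.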

Keywords: mobile robot, agent, reinforcement learning, right-hand algorithm, maze, imitation learning

This work was supported by BRFFR grant F22KITG-002 and by task T31 of the State Program for Scientific Research "Digital and Space Technologies, Security of Man, Society and the State" (2021–2025).

The authors declare no conflicts of interest.