inform

Informatics

ISSN 1816-0301; 2617-6963

UIIP NASB

10.37661/1816-0301-2020-17-2-7-16

inform-1056

Research Article

SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION

Object detection in high resolution images based on multiscale and block processing

https://orcid.org/0000-0002-6609-5810

Bohush

R. P.

Rykhard P. Bohush, Cand. Sci. (Eng.), Associate Professor, Head of the Department of Computer Systems and Networks, Faculty of Information Technologies

Novopolotsk

bogushr@mail.ru

https://orcid.org/0000-0001-7580-7878

Zakharava

I. Yu.

Iryna Yu. Zakharava, M. Sci. (Eng.), Postgraduate Student at the Department of Computer Systems and Networks, Faculty of Information Technologies

Novopolotsk

ira9992011@yandex.ru

https://orcid.org/0000-0001-9404-1206

Ablameyko

S. V.

Sergey V. Ablameyko, Academician of the National Academy of Sciences of Belarus, Dr. Sci. (Eng.), Professor, Professor of the Faculty of Mechanics and Mathematics

Minsk

ablameyko@bsu.by

Polotsk State University

Belarusian State University; The United Institute of Informatics Problems of the National Academy of Sciences of Belarus

2020

22.04.2020

Vol. 17, no. 2, pp. 7–16

2020

Bohush R.P., Zakharava I.Y., Ablameyko S.V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/1056

This paper proposes an algorithm for detecting objects in high-resolution images. The algorithm builds a multiscale (pyramid) representation of the image, splits each pyramid level into overlapping blocks, runs a convolutional neural network on every block, and merges the detected regions. The number of pyramid levels is determined by the size of the input image and the size of the network's input layer. Block splitting is performed on every level except the topmost one; the overlap between blocks improves the classification of objects that are split into fragments lying in neighboring blocks. Such fragments are merged into a single region when they belong to the same class and their intersection-over-union value is high. The test results presented in the paper confirm that this approach improves detection accuracy for small objects in high-resolution images.
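The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the factor-of-two pyramid, the greedy merging into a bounding union box, and all function names are assumptions made for illustration, and the network call itself is left out. Detections are assumed to be given already in full-image coordinates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)
Detection = Tuple[Box, str]               # (box, class label)


def pyramid_levels(width: int, height: int, net_size: int) -> int:
    """Count pyramid levels, halving the image until it fits the network input.

    Assumption: a factor-of-two pyramid; the paper may use a different rule.
    """
    levels = 1
    while max(width, height) > net_size:
        width, height = (width + 1) // 2, (height + 1) // 2
        levels += 1
    return levels


def split_axis(length: int, block: int, overlap: int) -> List[int]:
    """Block start offsets along one axis (requires overlap < block).

    The last block is shifted back so that it ends exactly at the border.
    """
    if length <= block:
        return [0]
    step = block - overlap
    starts = list(range(0, length - block, step))
    starts.append(length - block)
    return starts


def split_into_blocks(width: int, height: int, block: int, overlap: int) -> List[Box]:
    """Overlapping blocks that cover a width x height pyramid level."""
    return [(x, y, min(x + block, width), min(y + block, height))
            for y in split_axis(height, block, overlap)
            for x in split_axis(width, block, overlap)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def merge_detections(dets: List[Detection], threshold: float) -> List[Detection]:
    """Greedily merge same-class boxes whose IoU reaches the threshold.

    Assumption: merged boxes are replaced by their bounding union box.
    """
    merged: List[Detection] = []
    for box, label in dets:
        for i, (kept, kept_label) in enumerate(merged):
            if label == kept_label and iou(box, kept) >= threshold:
                merged[i] = ((min(box[0], kept[0]), min(box[1], kept[1]),
                              max(box[2], kept[2]), max(box[3], kept[3])), label)
                break
        else:
            merged.append((box, label))
    return merged
```

For example, with a hypothetical 512-pixel network input, `pyramid_levels(3840, 2160, 512)` counts the levels for a 4K frame, and `split_into_blocks(960, 540, 512, 64)` lists the overlapping blocks of one intermediate level. The 64-pixel overlap and any merge threshold passed to `merge_detections` are placeholder values, not figures from the paper.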

convolutional neural network, block processing, 4K resolution, object detection, multiscale image representation

The authors declare that there are no conflicts of interest present.