Performance enhancement of RGB image convolution using convolution kernel clustering algorithm for ARM64 processor architecture

DOI:

https://doi.org/10.18372/2073-4751.81.20144

Keywords:

convolution operation, NEON64, ARM64, SIMD optimization, vectorization, RGB images, convolution kernel clustering, digital image processing, sparse matrices, OpenCV

Abstract

The paper presents a method for improving the performance of the RGB image convolution operation on the ARM64 platform using an algorithm that clusters convolution kernel elements. The proposed approach vectorizes the computations with NEON64 SIMD instructions and groups non-zero kernel elements of the same sign, so that operations on zero elements can be skipped efficiently. A mathematical model of the vectorized convolution operation has been developed that takes into account the specifics of sparse convolution kernel matrices. An experimental study on the Orange Pi 5 Pro platform demonstrated a significant speedup over the cv::filter2D() function of the OpenCV library: 5.0–9.7 times for medium-sized kernels (7×7 to 11×11) and 1.7–5.5 times for large kernels (12×12 to 15×15). The proposed method is particularly effective for processing high-resolution images and can be applied in real-time systems on single-board computers with limited computational resources.
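The grouping idea described in the abstract can be illustrated with a small scalar sketch. This is not the authors' implementation: it uses plain C++ loops where the paper uses NEON64 intrinsics, and the names Tap, ClusteredKernel, clusterKernel and convolveRGB are hypothetical, introduced only for this illustration. It assumes an interleaved 8-bit RGB buffer, an integer kernel, output only where the kernel fits fully inside the image, and no kernel flip (like cv::filter2D(), it computes a correlation).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one non-zero kernel element, stored as its offset
// inside the kernel window and the magnitude of its coefficient.
struct Tap { int dy; int dx; std::uint16_t mag; };

// Hypothetical type: non-zero kernel elements grouped by sign.
struct ClusteredKernel {
    std::vector<Tap> positive;
    std::vector<Tap> negative;
};

// Split a row-major integer kernel into positive and negative groups.
// Zero elements are dropped here, so the convolution never visits them.
ClusteredKernel clusterKernel(const std::vector<int>& k, int rows, int cols) {
    ClusteredKernel out;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const int v = k[static_cast<std::size_t>(y) * cols + x];
            if (v > 0) out.positive.push_back({y, x, static_cast<std::uint16_t>(v)});
            else if (v < 0) out.negative.push_back({y, x, static_cast<std::uint16_t>(-v)});
        }
    }
    return out;
}

// Apply the clustered kernel to an interleaved RGB image (3 bytes per pixel).
// Each sign group is accumulated without sign handling; the two sums are
// subtracted once, divided by `divisor`, and clamped to [0, 255].
std::vector<std::uint8_t> convolveRGB(const std::vector<std::uint8_t>& src,
                                      int width, int height,
                                      const ClusteredKernel& ck,
                                      int krows, int kcols, int divisor) {
    assert(divisor != 0);
    const int ow = width - kcols + 1;
    const int oh = height - krows + 1;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(ow) * oh * 3);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            for (int c = 0; c < 3; ++c) {
                // 32-bit accumulators: adequate for small kernels.
                std::int32_t pos = 0, neg = 0;
                for (const Tap& t : ck.positive)
                    pos += t.mag * src[(static_cast<std::size_t>(y + t.dy) * width + (x + t.dx)) * 3 + c];
                for (const Tap& t : ck.negative)
                    neg += t.mag * src[(static_cast<std::size_t>(y + t.dy) * width + (x + t.dx)) * 3 + c];
                const std::int32_t r = (pos - neg) / divisor;
                dst[(static_cast<std::size_t>(y) * ow + x) * 3 + c] =
                    static_cast<std::uint8_t>(std::min<std::int32_t>(255, std::max<std::int32_t>(0, r)));
            }
        }
    }
    return dst;
}
```

For a 3×3 sharpening kernel {0, -1, 0, -1, 5, -1, 0, -1, 0}, clusterKernel keeps one positive and four negative elements, so the inner loops touch five pixels per output sample instead of nine. In a vectorized version the same two accumulations would presumably run over several pixels at once; how the paper maps them onto NEON64 registers is not stated in the abstract.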

Published

2025-06-01

Section

Articles