A method of video entropy coding based on the AVX-512 extended SIMD instruction set

Authors

DOI:

https://doi.org/10.18372/2073-4751.70.16841

Keywords:

SIMD, AV1, AVX-512, entropy coding, video, CODEC, video compression

Abstract

The purpose of this work is to reduce video entropy coding time on processors that support the AVX-512 extended instruction set, through parallelization and the use of SIMD instructions that are not available in AVX2 and SSE. The paper examines the AV1 entropy decoding algorithm in both its existing scalar version and its vectorized versions based on SSE and AVX2. It analyzes the drawbacks of these versions, which cost additional processor run time and therefore reduce performance.

These versions do not use all the capabilities of modern microarchitectures that support the AVX-512 SIMD instruction set. In this paper we show that AVX-512 makes it possible to solve certain problems efficiently, in particular left-packing, to reduce the number of vector and general-purpose registers the program uses, and to avoid unnecessary specialization of functions, which consumes additional resources such as stack memory.
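For readers unfamiliar with left-packing, the following is a minimal scalar sketch of the operation, not code from the paper. The function name `left_pack_u32` and the fixed width of 16 lanes are assumptions made for illustration. On AVX-512 hardware the same result for sixteen 32-bit lanes can be obtained with one compress instruction (`vpcompressd`, exposed in C as the intrinsic `_mm512_maskz_compress_epi32`), whereas AVX2 has no direct equivalent and typically needs a table lookup plus a permute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for left-packing (hypothetical helper, for illustration).
 * Copies the elements of src whose mask bit is set to the front of dst,
 * preserving their order, and zeroes the remaining elements of dst.
 * Returns the number of elements kept. */
size_t left_pack_u32(const uint32_t *src, uint16_t mask, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < 16; i++) {
        if (mask & (1u << i))
            dst[kept++] = src[i];   /* selected lane moves left */
    }
    for (size_t i = kept; i < 16; i++)
        dst[i] = 0;                 /* unused tail is cleared */
    return kept;
}
```

Calling `left_pack_u32(src, 0x8005, dst)` keeps lanes 0, 2 and 15 of `src` and places them in `dst[0]`, `dst[1]` and `dst[2]`.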

Testing of the proposed AVX-512-based entropy decoding method gave positive results in comparison with the scalar version and the vectorized SSE and AVX2 versions. The disadvantages of microarchitectures with the AVX-512 extended instruction set are considered, together with ways to address them; it is shown that these shortcomings can be mitigated by choosing appropriate entropy encoding parameters. The relative speedup is much greater for the part of entropy decoding that updates the probabilities, because it does not depend on the value of the decoded symbol.
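To make the remark about the probability update concrete, here is a simplified sketch of an adaptive cumulative distribution update of the kind used in AV1-style entropy coders. It is not the dav1d or libaom code: the name `update_cdf`, the layout of the `cdf` array and the fixed `rate` parameter are assumptions (in AV1 the adaptation rate is derived from a per-context counter). The point of the sketch is that every entry goes through the same arithmetic and only a comparison mask differs, so the loop contains no data-dependent branch and maps naturally onto SIMD lanes.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_TOP 32768  /* probabilities are 15-bit fixed point */

/* Adapt a cumulative distribution after decoding `symbol` (hypothetical helper).
 * cdf[i] holds P(value <= i) scaled to PROB_TOP for i in [0, n_symbols - 1);
 * the last entry is implicitly PROB_TOP and is not stored. */
void update_cdf(uint16_t *cdf, int n_symbols, int symbol, int rate)
{
    assert(cdf != 0 && n_symbols >= 2);
    assert(symbol >= 0 && symbol < n_symbols && rate > 0 && rate < 16);

    for (int i = 0; i < n_symbols - 1; i++) {
        int32_t c = cdf[i];
        /* all-ones when entry i is at or above the decoded symbol */
        int32_t up = -(int32_t)(i >= symbol);
        int32_t inc = (PROB_TOP - c) >> rate;  /* step toward PROB_TOP */
        int32_t dec = c >> rate;               /* step toward zero */
        /* select one of the two steps with the mask, no branch */
        cdf[i] = (uint16_t)(c + (up & inc) - (~up & dec));
    }
}
```

For example, starting from `{8192, 16384, 24576}` with four symbols, decoding symbol 1 at `rate` 4 lowers the first entry and raises the other two.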

Published

2022-06-24

Issue

Section

Articles