178 research outputs found

Performance engineering for HEVC transform and quantization kernel on GPUs

Author: Alen Duspara
Igor Piljić
Leon Dragić
Mario Kovač
Mate Čobrnić
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2020

Continuous growth of video traffic and video services, especially high-resolution and high-quality video content, places heavy demands on video coding and its implementations. The High Efficiency Video Coding (HEVC) standard doubles the compression efficiency of its predecessor, H.264/AVC, at the cost of high computational complexity. To address this computational burden, high-performance video processing takes advantage of heterogeneous multiprocessor platforms. In this paper, we present a highly performance-optimized HEVC transform and quantization kernel with all-zero-block (AZB) identification, designed for execution on a Graphics Processing Unit (GPU). The performance optimization strategy covers all three aspects of parallel design: exposing as much of the application's intrinsic parallelism as possible, exploiting high-throughput memory, and using instructions efficiently. It combines an efficient mapping of transform blocks to thread blocks with efficient vectorized access patterns to shared memory for all transform sizes supported by the standard. Two different GPUs of the same architecture were used to evaluate the proposed implementation. The achieved processing times are 6.03 ms and 23.94 ms for DCI 4K and 8K Full Format, respectively. Compared to CPU, cuBLAS, and AVX2 implementations, the speedup factors are up to 80, 19, and 4 times, respectively. The proposed implementation outperforms previous work by a factor of 1.22.
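To make the kernel's core operations concrete, here is a minimal CPU-side sketch in Python/NumPy of a 4x4 HEVC forward transform, scalar quantization, and an all-zero-block check. It assumes 8-bit video, the 4x4 integer DCT matrix defined by the standard, and HM-style shift, scale, and rounding-offset constants; it is not claimed to be bit-exact with HM, and the names `forward_transform_4x4`, `quantize`, and `is_all_zero_block` are hypothetical. The AZB check here simply inspects the quantized levels afterwards, whereas the paper's kernel may identify AZBs differently; on a GPU, each transform block would typically be mapped to a thread block instead of being processed one at a time.

```python
import numpy as np

# 4x4 integer core transform matrix defined by HEVC (scaled DCT-II approximation).
T4 = np.array([[64, 64, 64, 64],
               [83, 36, -36, -83],
               [64, -64, -64, 64],
               [36, -83, 83, -36]], dtype=np.int64)

# Quantization scale factors indexed by QP % 6, as used by the HM reference encoder.
QUANT_SCALES = [26214, 23302, 20560, 16384, 14564, 11651]


def forward_transform_4x4(residual, bit_depth=8):
    """Separable 2-D forward transform with HM-style intermediate rounding shifts."""
    log2_size = 2
    shift1 = log2_size + bit_depth - 9   # first-stage shift (1 for 8-bit video)
    shift2 = log2_size + 6               # second-stage shift
    tmp = T4 @ np.asarray(residual, dtype=np.int64)   # vertical 1-D transforms
    tmp = (tmp + (1 << (shift1 - 1))) >> shift1
    coef = tmp @ T4.T                                 # horizontal 1-D transforms
    return (coef + (1 << (shift2 - 1))) >> shift2


def quantize(coef, qp, bit_depth=8, log2_size=2, intra=True):
    """Scalar quantization with a dead-zone rounding offset (HM-style constants)."""
    transform_shift = 15 - bit_depth - log2_size
    q_bits = 14 + qp // 6 + transform_shift
    offset = (171 if intra else 85) << (q_bits - 9)
    level = (np.abs(coef) * QUANT_SCALES[qp % 6] + offset) >> q_bits
    return np.sign(coef) * level


def is_all_zero_block(levels):
    """A block is an AZB when every quantized level is zero."""
    return not np.any(levels)
```

With a flat residual of 1 at QP 37 the block quantizes to all zeros, while a flat residual of 10 at QP 22 keeps a single nonzero DC level. For a block flagged as AZB, the reconstructed residual is zero, so dequantization and the inverse transform can be skipped.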

HRČAK - Portal of Croatian Scientific and Professional Journals

3D high definition video coding on a GPU-based heterogeneous system

Author: Claver Jose M
De Cock Jan
Fernandez-Escribano Gerardo
Martinez Jose Luis
Pieters Bart
Rodriguez-Sanchez Rafael
Sanchez Jose L
Van de Walle Rik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

H.264/MVC is a standard that supports 3D perception by coding from 2 (stereo) to N views. H.264/MVC inherits many coding options from single-view H.264/AVC, so its complexity is even higher, mainly because more views must be processed. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences, namely inter prediction, and at its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load-balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the encoding time for different stereo high-definition sequences. Speed-up values of up to 90× were obtained when compared with the reference encoder on the same platform. Moreover, the proposed algorithm is also more energy efficient and hence requires less energy than the sequential reference algorithm.
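The abstract does not detail the load-balancing algorithm, so the following Python sketch only illustrates the general idea of dynamic CPU/GPU load balancing under stated assumptions: each frame's inter-prediction work is divided into macroblock rows (for example, 68 rows for a 1920x1088 coded frame), rows are split in proportion to each device's measured throughput, and throughput estimates are smoothed with an exponential moving average. The class `LoadBalancer` and its parameters are hypothetical and not taken from the paper.

```python
class LoadBalancer:
    """Hypothetical balancer: splits macroblock rows between a CPU and a GPU by throughput."""

    def __init__(self, total_rows, alpha=0.5):
        self.total_rows = total_rows
        self.alpha = alpha        # smoothing factor for the exponential moving average
        self.cpu_rate = 1.0       # rows per unit time; equal initial guess for both devices
        self.gpu_rate = 1.0

    def split(self):
        """Return (cpu_rows, gpu_rows) for the next frame."""
        share = self.gpu_rate / (self.cpu_rate + self.gpu_rate)
        gpu_rows = min(max(round(self.total_rows * share), 0), self.total_rows)
        return self.total_rows - gpu_rows, gpu_rows

    def update(self, cpu_rows, cpu_time, gpu_rows, gpu_time):
        """Blend the throughput observed on the last frame into the running estimates."""
        if cpu_rows > 0 and cpu_time > 0:
            observed = cpu_rows / cpu_time
            self.cpu_rate = (1 - self.alpha) * self.cpu_rate + self.alpha * observed
        if gpu_rows > 0 and gpu_time > 0:
            observed = gpu_rows / gpu_time
            self.gpu_rate = (1 - self.alpha) * self.gpu_rate + self.alpha * observed
```

Skipping the update for a device that received no rows avoids a division by zero. A real encoder would also have to handle the inter-view and inter-frame dependencies that the paper breaks, which this sketch ignores.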

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Repositori Institucional de la Universitat Jaume I

Performance Analysis of OpenCL and CUDA Programming Models for the High Efficiency Video Coding

Author: Bahba Asma
Bouaafia Soulef
Khemiri Randa
Nasr Maha
Sayadi Fatma Ezahra
Publication venue: 'IntechOpen'
Publication date: 19/10/2021

In motion estimation (ME), block-matching algorithms have great potential for parallelism. The best match is found by computing the similarity for each block position inside the search area using a similarity metric such as the Sum of Absolute Differences (SAD), which is used in various steps of motion estimation algorithms. Moreover, SAD can be parallelized on a Graphics Processing Unit (GPU), since the same computation is applied to the pixels of every block, which yields better performance. In this work, a fixed OpenCL code was first executed on several architectures, such as CPU and GPU; second, a parallel GPU implementation of the SAD process was proposed in CUDA and OpenCL for block sizes from 4x4 to 64x64. A comparative study of GPU execution times was carried out on the same video sequence. The experimental results indicated that OpenCL execution times on the GPU were better than CUDA times, with a performance ratio of up to two.
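A minimal Python/NumPy sketch of full-search block matching with SAD shows why the problem parallelizes well: the cost of every candidate displacement (dx, dy) is computed independently, so on a GPU each candidate (or each pixel difference) could be assigned to its own work-item or thread. The function names, the square search window of ±`radius` pixels, and the rule of skipping candidates that fall outside the reference frame are illustrative assumptions, not details from the chapter.

```python
import numpy as np


def sad(block_a, block_b):
    """Sum of Absolute Differences; widen to int32 so uint8 subtraction cannot wrap."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())


def full_search(cur, ref, bx, by, size, radius):
    """Exhaustively test every displacement in [-radius, radius]^2 and return (cost, dx, dy)."""
    block = cur[by:by + size, bx:bx + size]
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            # Skip candidates whose block would extend past the reference frame.
            if x < 0 or y < 0 or x + size > ref.shape[1] or y + size > ref.shape[0]:
                continue
            cost = sad(block, ref[y:y + size, x:x + size])
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))   # cur[y, x] == ref[y - 2, x + 3]
print(full_search(cur, ref, bx=16, by=16, size=8, radius=4))  # (0, 3, -2)
```

Here the current frame is a shifted copy of the reference, so the search recovers the shift with zero cost. Ties are broken by keeping the first candidate in raster order.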

IntechOpen

On the use of deep learning and parallelism techniques to significantly reduce the HEVC intra-coding time

Author: Galiano Vicente
López Granado Otoniel Mario
Martínez-Rach Miguel Onofre
Migallon Hector
Perez Malumbres Manuel
Publication venue: Springer
Publication date: 01/08/2022

It is well known that each new video coding standard significantly increases computational complexity with respect to previous standards, and this is particularly true for the HEVC and VVC video coding standards. The development of techniques for reducing the required complexity without affecting rate/distortion (R/D) performance is therefore always a topic of intense research interest. In this paper, we propose a combination of two powerful techniques, deep learning and parallel computing, to significantly reduce the complexity of the HEVC encoding engine. Our experimental results show that combining deep learning, to reduce CTU partitioning complexity, with parallel strategies based on frame partitioning achieves speedups of up to 26× when 16 threads are used. The R/D penalty in terms of the BD-BR metric depends on the video content, the compression rate, and the number of OpenMP threads, and was consistently between 0.35% and 10% for the video sequence test set used in our experiments.
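The abstract mentions parallel strategies based on frame partitioning without specifying the scheme. As one possible illustration, assuming HEVC tiles with uniform spacing, the following Python sketch derives tile sizes with the uniform-spacing rule of the HEVC specification and hands the tiles to a thread pool. `encode_tile` is a hypothetical placeholder that only counts CTUs, and Python threads stand in for the OpenMP threads used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def uniform_tile_sizes(pic_size_in_ctbs, num_tiles):
    """Tile widths (or heights) in CTBs using HEVC's uniform-spacing formula."""
    return [((i + 1) * pic_size_in_ctbs) // num_tiles - (i * pic_size_in_ctbs) // num_tiles
            for i in range(num_tiles)]


def make_tiles(width_ctbs, height_ctbs, cols, rows):
    """Return (x0, y0, w, h) rectangles, in CTB units, covering the whole picture."""
    tiles, y0 = [], 0
    for h in uniform_tile_sizes(height_ctbs, rows):
        x0 = 0
        for w in uniform_tile_sizes(width_ctbs, cols):
            tiles.append((x0, y0, w, h))
            x0 += w
        y0 += h
    return tiles


def encode_tile(tile):
    """Hypothetical placeholder for encoding one tile; here it only counts its CTUs."""
    _, _, w, h = tile
    return w * h


def encode_frame_parallel(tiles, num_threads):
    """Encode independent tiles concurrently and return per-tile results in tile order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(encode_tile, tiles))


# 1920x1080 with 64x64 CTUs: 30 CTB columns, 17 CTB rows (the last row is partial).
tiles = make_tiles(30, 17, cols=4, rows=4)
print(uniform_tile_sizes(30, 4))                          # [7, 8, 7, 8]
print(sum(encode_frame_parallel(tiles, num_threads=16)))  # 510
```

Tiles break intra-picture prediction and entropy-coding dependencies across their boundaries, which is what allows independent encoding and also one reason frame partitioning can cost some coding efficiency.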

RediUMH (Universidad Miguel Hernández)

Image and Video Coding Techniques for Ultra-low Latency

Author: Jääskeläinen Pekka
Mäkitalo Markku
Vanne Jarno
Žádník Jakub
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/02/2022

The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, and autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their trade-offs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches, such as still-image, mezzanine, or texture compression, in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we identified inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations.

Trepo - Institutional Repository of Tampere University

Information fusion based techniques for HEVC

Author: Botella Guillermo
Del Barrio A. A.
Fernández D. G.
Grecos Christos
Meyer-Baese Anke
Meyer-Baese Uwe
Publication venue: ScholarWorks@CWU
Publication date: 09/04/2017

Addressing the conflicting objectives of a multi-parameter H.265/HEVC encoder system, this paper analyzes a set of optimizations in order to improve the trade-off between quality, performance, and power consumption for different reliable and accurate applications. The method is based on Pareto optimization and has been tested at different resolutions on real-time encoders.
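Pareto optimization keeps only those encoder configurations that no other configuration beats on every objective at once. The Python sketch below illustrates this filtering under illustrative assumptions: each configuration is scored by quality (PSNR in dB, higher is better), encoding time, and power (both lower is better), and the configuration names and numbers are made up, not results from the paper.

```python
from collections import namedtuple

# Hypothetical encoder configuration scored on three objectives.
Config = namedtuple("Config", ["name", "psnr_db", "time_s", "power_w"])


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = a.psnr_db >= b.psnr_db and a.time_s <= b.time_s and a.power_w <= b.power_w
    better = a.psnr_db > b.psnr_db or a.time_s < b.time_s or a.power_w < b.power_w
    return no_worse and better


def pareto_front(configs):
    """Return the configurations not dominated by any other configuration."""
    return [c for c in configs if not any(dominates(o, c) for o in configs if o is not c)]


configs = [
    Config("A", 38.0, 10.0, 5.0),
    Config("B", 37.5, 6.0, 4.0),
    Config("C", 37.0, 8.0, 6.0),   # worse than B on all three objectives
    Config("D", 36.0, 3.0, 3.0),
]
print([c.name for c in pareto_front(configs)])  # ['A', 'B', 'D']
```

An encoder can then pick a point on this front according to the application's priority, for instance the fastest configuration whose PSNR stays above a target.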

ScholarWorks at Central Washington University

TRAP

The impact of Tiles on video coding performance: a case study on HEVC and AV1 video coding standards

Author: Πανάγου Ναταλία
Publication date: 01/01/2019

University of Thessaly Institutional Repository

Design And Implementation Of Fast Motion Estimation In Modern Video Compression On GPU

Author: Yi Zhaohua
Publication venue: eGrove
Publication date: 01/01/2015

Motion estimation is the most computationally expensive part of high-definition video compression, accounting for more than 50% of overall execution time. Therefore, improving the performance of motion estimation can have a significant impact on the overall performance of video compression. The performance of motion estimation can be improved in two aspects: algorithm and implementation. This thesis addresses both. We first propose an innovative motion estimation algorithm that replaces the traditional block-matching method, which compares blocks pixel by pixel, with a new method based on local binary pattern (LBP) codes. Our method first encodes the original video frames into LBP codes and then compares blocks using only the LBP codes. Our algorithm significantly reduces the amount of computation by avoiding many of the pixel-by-pixel comparisons present in traditional block-matching approaches. Using public benchmarks, our experiments show that the proposed motion estimation algorithm runs 5 times faster than a traditional algorithm. Furthermore, we accelerate the proposed algorithm on GPUs: the motion estimation processes of all blocks are offloaded to the GPU and run in parallel. Our GPU implementation runs 9 times faster than the CPU implementation.
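The abstract does not say how LBP codes are compared, so the following Python/NumPy sketch assumes the common 8-neighbor LBP (a bit is set when a neighbor is at least as bright as the center pixel) and uses the Hamming distance between code blocks as the matching cost. In this sketch each LBP code packs eight neighbor comparisons into one byte, so matching one pixel position costs a single XOR and a table-based population count. The function names and the full-search strategy are hypothetical; the thesis's actual search pattern may differ.

```python
import numpy as np

# Neighbor offsets (dy, dx) visited clockwise from the top-left; bit i encodes OFFSETS[i].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
# Lookup table: number of set bits in every 8-bit value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.int32)


def lbp_encode(frame):
    """8-bit LBP code per interior pixel; border pixels are left at 0."""
    f = frame.astype(np.int32)
    h, w = f.shape
    center = f[1:h - 1, 1:w - 1]
    codes = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = f[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes[1:h - 1, 1:w - 1] |= np.where(neighbor >= center, np.uint8(1 << bit), np.uint8(0))
    return codes


def lbp_search(cur, ref, bx, by, size, radius):
    """Full search matching blocks by Hamming distance of LBP codes; returns (cost, dx, dy)."""
    cur_codes, ref_codes = lbp_encode(cur), lbp_encode(ref)
    block = cur_codes[by:by + size, bx:bx + size]
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > ref.shape[1] or y + size > ref.shape[0]:
                continue
            diff = np.bitwise_xor(block, ref_codes[y:y + size, x:x + size])
            cost = int(POPCOUNT[diff].sum())
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))   # cur[y, x] == ref[y - 1, x - 2]
print(lbp_search(cur, ref, bx=16, by=16, size=8, radius=4))  # (0, -2, -1)
```

A side effect of LBP matching is insensitivity to uniform brightness changes between frames: adding a constant to every pixel (without clipping) leaves all neighbor-versus-center comparisons, and therefore all codes, unchanged.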

eGrove (Univ. of Mississippi)