Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA

Author: Jalba Andrei C.
Laan Wladimir J. van der
Roerdink Jos B.T.M.
Publication venue: IEEE (The Institute of Electrical and Electronics Engineers)
Publication date: 01/01/2009

ARTS repository - University of Groningen

Proceedings - University of Groningen

GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

Author: Aulí Llinàs Francesc
Bartrina Rapesta Joan
de Cea Dominguez Carlos
Moure López Juan Carlos
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or with expensive solutions to meet the real-time requirements of some environments. Contrary to this trend, this paper proposes an end-to-end codec architecture that runs on inexpensive Graphics Processing Units (GPUs) and is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed on a commodity Nvidia GPU, it achieves real-time processing of 12K video. The proposed software architecture uses four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The throughput analysis indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard.

Diposit Digital de Documents de la UAB

Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

Author: Bernabé Gregorio
Publication venue: University of Granada-University of Cadiz
Publication date: 01/01/2015

This paper presents several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping onto manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize the use of memory. We evaluate these proposals and compare a new Fermi Tesla C2050 with an Intel Core 2 Quad Q6700. The CUDA version gives the best results, with speedups over the CPU ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from 2x on small frame sizes to 3x on larger ones.
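
As a point of reference for what a 3D-FWT computes, the following is a minimal single-threaded sketch of one decomposition level, applied separably along x, y, and then time. The abstract does not name the wavelet filter, so the averaging Haar filter used here is an assumption, and the function names `haar_1d` and `fwt_3d_one_level` are hypothetical. It shows only the data flow, not the parallel CUDA, OpenCL, OpenMP, or Pthreads mappings that the paper evaluates.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform on an even-length list.

    Returns the approximation (pairwise means) followed by the
    detail (pairwise half-differences) coefficients.
    """
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def fwt_3d_one_level(volume):
    """One separable 3D decomposition level of volume[t][y][x].

    All three dimensions are assumed to be even. The input is not modified.
    """
    frames, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = [[row[:] for row in frame] for frame in volume]

    # Pass 1: transform every row (x direction).
    for t in range(frames):
        for y in range(height):
            out[t][y] = haar_1d(out[t][y])

    # Pass 2: transform every column (y direction).
    for t in range(frames):
        for x in range(width):
            column = haar_1d([out[t][y][x] for y in range(height)])
            for y in range(height):
                out[t][y][x] = column[y]

    # Pass 3: transform along time (t direction).
    for y in range(height):
        for x in range(width):
            line = haar_1d([out[t][y][x] for t in range(frames)])
            for t in range(frames):
                out[t][y][x] = line[t]

    return out
```

Each of the three passes touches independent rows, columns, or temporal lines, which is the kind of independence a parallel implementation can exploit.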

Portal de revistas de la Universidad de Granada

DIALNET

Discrete Wavelet Transformation Implementation in GPU through Register Based Strategy

Author: Hemkant Balasaheb Gangurde, M. U. Kharat
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 30/06/2017

The significant architectural changes that Nvidia introduced with the Kepler architecture in 2012 gave its GPUs a larger register memory and a richer instruction set that allows threads to communicate through registers. This created the potential for a new programming approach that uses registers, in place of shared memory, for sharing and reusing data. Such an approach can considerably improve the performance of applications that reuse data heavily. This work presents a register-based implementation of the Discrete Wavelet Transform (DWT) built with CUDA and OpenCV. The DWT is a data decorrelation technique used in image and video coding. Results indicate that this technique performs at least four times better than the best previous GPU implementation of the DWT. Experimental tests also show that its performance is close to the GPU's performance limits.
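
To illustrate why data sharing between neighbouring samples matters for the DWT, here is a minimal sequential sketch of one level of a lifting-based DWT. The abstract does not say which filter the authors use, so the reversible 5/3 lifting filter with symmetric boundary extension is an assumption, and `dwt53_forward` and `dwt53_inverse` are hypothetical names. The sketch does not model registers, threads, or CUDA; it only shows that each output needs values from adjacent positions.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting DWT on an even-length list of ints.

    Returns (low, high): approximation and detail coefficients.
    """
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    high = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirror at the right edge
        high.append(odd[i] - (even[i] + right) // 2)

    # Update step: each even sample plus a rounded quarter of the adjacent details.
    low = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]  # mirror at the left edge
        low.append(even[i] + (left + high[i] + 2) // 4)

    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the two lifting steps."""
    half = len(low)

    even = []
    for i in range(half):
        left = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - (left + high[i] + 2) // 4)

    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(high[i] + (even[i] + right) // 2)
    return x
```

Both lifting steps read a sample's immediate neighbours, so a parallel version in which each thread holds a few samples has to obtain neighbouring values from other threads. That exchange is the cost a register-based strategy aims to reduce.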

International Journal on Recent and Innovation Trends in Computing and Communication

Complexity scalable bitplane image coding with parallel coefficient processing

Author: Aulí Llinàs Francesc
Bartrina Rapesta Joan
de Cea Dominguez Carlos
Moure López Juan Carlos
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Very fast image and video codecs are a sought-after goal in both academia and industry. This paper presents a complexity scalable and parallel bitplane coding engine for wavelet-based image codecs. The proposed method processes the coefficients in parallel, suiting hardware architectures based on vector instructions. Our previous work is extended with a mechanism that provides complexity scalability to the system. Such a feature allows the coder to regulate the throughput achieved at the expense of slightly penalizing compression efficiency. Experimental results suggest that, when using the fastest speed, the method almost doubles the throughput of our previous engine while penalizing compression efficiency by about 10
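
For readers unfamiliar with bitplane coding, the following minimal sketch splits integer coefficients into sign bits and magnitude bitplanes, ordered from most to least significant. It is a generic illustration under stated assumptions, not the engine described in the abstract: it performs no entropy coding and does not model the complexity-scalability mechanism, and the names `bitplanes` and `decode` are hypothetical.

```python
def bitplanes(coeffs):
    """Split integer coefficients into sign flags and magnitude bitplanes.

    Planes are ordered from the most significant to the least significant bit.
    """
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    total = max(mags).bit_length() if mags else 0
    # Within one plane every coefficient is handled independently,
    # which is what makes the per-plane work easy to vectorise.
    planes = [[(m >> p) & 1 for m in mags] for p in range(total - 1, -1, -1)]
    return signs, planes


def decode(signs, planes, total_planes):
    """Rebuild coefficients from the first len(planes) bitplanes.

    Passing fewer planes than total_planes yields a coarser approximation.
    """
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        shift = total_planes - 1 - k
        for i, bit in enumerate(plane):
            mags[i] |= bit << shift
    return [-m if negative else m for m, negative in zip(mags, signs)]
```

Decoding all planes returns the original coefficients, while decoding only the leading planes returns the same values with their low-order bits cleared.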

Diposit Digital de Documents de la UAB