
6 research outputs found

Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

Author: Bernabé Gregorio
Publication venue: University of Granada-University of Cadiz
Publication date: 01/01/2015

In this paper we present several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping onto manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize memory use. We evaluate these proposals and compare a new Fermi Tesla C2050 with an Intel Core 2 Quad Q6700. The CUDA version achieves the best speedups over the CPU execution times, ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from 2x on small frame sizes to 3x on larger ones.
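
As a rough illustration of the transform this abstract refers to, the sketch below applies one level of a separable 3D wavelet transform to a small volume. It assumes the simple averaging Haar filter and a `volume[z][y][x]` nested-list layout; the filter, the layout and the names `haar_1d` and `fwt_3d` are illustrative assumptions and do not come from the paper, which targets CUDA, OpenCL, OpenMP and Pthreads.

```python
def haar_1d(signal):
    """One level of the averaging Haar transform on an even-length list.

    Returns the approximation half followed by the detail half.
    """
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def fwt_3d(volume):
    """Apply one Haar level along x, then y, then z of volume[z][y][x]."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])

    # Pass along x: every row is transformed independently.
    out = [[haar_1d(row) for row in plane] for plane in volume]

    # Pass along y: every column of every plane is transformed independently.
    for z in range(nz):
        for x in range(nx):
            column = haar_1d([out[z][y][x] for y in range(ny)])
            for y in range(ny):
                out[z][y][x] = column[y]

    # Pass along z: every line through the frames is transformed independently.
    for y in range(ny):
        for x in range(nx):
            line = haar_1d([out[z][y][x] for z in range(nz)])
            for z in range(nz):
                out[z][y][x] = line[z]

    return out


if __name__ == "__main__":
    frames = [[[1, 3], [5, 7]], [[2, 4], [6, 8]]]
    print(fwt_3d(frames))
```

Within each pass the 1D transforms do not depend on one another, which is the kind of independence a parallel CPU or GPU implementation can exploit. This sequential version makes no attempt at tiling or blocking.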

Portal de revistas de la Universidad de Granada

DIALNET

Compression of image sequences in interactive medical teleconsultations

Author: Czekierda Łukasz
Malawski Filip
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 01/01/2017

Interactive medical teleconsultations are an important tool in modern medical practice. Their applications include remote diagnostics, conferences, workshops and classes for students. In many cases standard medium or low-end machines are employed, and the teleconsultation systems must provide a high-quality user experience with very limited resources. Particularly problematic are large datasets consisting of image sequences, which need to be accessed fluently. The main issue is insufficient internal memory, so proper compression methods are crucial. However, a scenario in which image sequences are kept in a compressed format in internal memory and decompressed on the fly when displayed is difficult to implement because of performance issues. In this paper we present methods for both lossy and lossless compression of medical image sequences that require only compatibility with the Pixel Shader 2.0 standard, which is present even on relatively old, low-end devices. Based on an evaluation of quality, size reduction and performance, the methods are shown to be suitable and beneficial for medical teleconsultation applications.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów

Crossref

Massively parallel non-stationary EEG data processing on GPGPU platforms with Morlet continuous wavelet transform

Author: A Klein
C Tenllado
C Torrence
J Franco
J Nickolls
J Polygiannakis
JP Lachaux
K Gurley
M Fligge
MP SouzaEcher
MT Akhtar
P Kumar
P Pioft
RW Johnson
S Erol
SG Park
TT Wong
WJ Laan van der
X Li
Publication venue: 'Springer Science and Business Media LLC'

Crossref

GPU implementation of bitplane coding with parallel coefficient processing for high performance image compression

Author: Aulí Llinàs Francesc
Enfedaque Montes Pablo
Moure López Juan Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays. Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, tailored for massively parallel architectures and called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements for high resolution (4K) digital cinema in real time, yielding speedups of 30x with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40x less energy for equivalent performance than state-of-the-art methods.
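
To give a sense of what "bitplane" means in this abstract, the sketch below splits integer wavelet coefficients into bitplanes, most significant first, and rebuilds them. It is only an assumed illustration of bitplane scanning: it is not BPC-PaCo, it omits context modelling and entropy coding, and the names `bitplanes` and `from_bitplanes` are hypothetical.

```python
def bitplanes(coefficients, num_planes):
    """Split coefficient magnitudes into bitplanes, most significant first.

    Each bit depends only on its own coefficient, so the bits of one
    plane can in principle be extracted for all coefficients at once.
    """
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(abs(c) >> p) & 1 for c in coefficients])
    return planes


def from_bitplanes(planes, signs):
    """Rebuild signed coefficients from bitplanes and a list of sign flags."""
    magnitudes = [0] * len(signs)
    for plane in planes:
        # Shift in one more bit per plane, most significant plane first.
        magnitudes = [(m << 1) | b for m, b in zip(magnitudes, plane)]
    return [-m if negative else m for m, negative in zip(magnitudes, signs)]


if __name__ == "__main__":
    coeffs = [5, -3, 0, 6]
    planes = bitplanes(coeffs, num_planes=3)
    signs = [c < 0 for c in coeffs]
    print(planes)
    print(from_bitplanes(planes, signs))
```

Decoding only the first few planes yields a coarser approximation of the magnitudes, which is why bitplane-ordered data suits progressive coding.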

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Diposit Digital de Documents de la UAB

Implementation of the DWT in a GPU through a register-based strategy

Author: Aulí Llinàs Francesc
Enfedaque Montes Pablo
Moure López Juan Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

The release of the CUDA Kepler architecture in March 2012 provided Nvidia GPUs with a larger register memory space and instructions for communicating registers among threads. This facilitates a new programming strategy that uses registers, rather than shared memory, for data sharing and reuse. Such a programming strategy can significantly improve the performance of applications that reuse data heavily. This paper presents a register-based implementation of the Discrete Wavelet Transform (DWT), the prevailing data decorrelation technique in the field of image coding. Experimental results indicate that the proposed method is at least four times faster than the best GPU implementation of the DWT found in the literature. Furthermore, theoretical analysis coincides with experimental tests in showing that the execution times achieved by the proposed implementation are close to the GPU's performance limits.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Diposit Digital de Documents de la UAB

Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs

Author: Mallat
Owens
Tenllado
Wong
Publication venue: 'Elsevier BV'

Crossref