2 research outputs found

The Parallel Algorithm for the 2-D Discrete Wavelet Transform

Authors: Barina David, Kleparnik Petr, Kula Michal, Najman Pavel, Zemcik Pavel
Publication date: 26/09/2017

The discrete wavelet transform lies at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) has mostly been computed using a separable lifting scheme. Because the lifting scheme consists of a small number of operations, it is preferred on single-core CPUs. For parallel processing on multi-core processors, however, this scheme is inappropriate because of its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged, and these points often form a performance bottleneck. Our approach rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, our scheme consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors. Comment: accepted for publication at ICGIP 201
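
To make the notion of lifting "steps" concrete, here is a minimal sketch of one level of a 1-D lifting transform. The abstract does not name a wavelet, so the CDF 5/3 wavelet, the symmetric boundary handling, and the function names `lifting_53_forward` and `lifting_53_inverse` are assumptions chosen for illustration, not the authors' implementation. The update step can only start once every result of the predict step is available, which is the kind of data-exchange point the abstract refers to.

```python
def lifting_53_forward(signal):
    """One level of CDF 5/3 lifting on an even-length sequence.

    Returns (approx, detail). Illustrative sketch only.
    """
    n = len(signal)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    even = [float(x) for x in signal[0::2]]
    odd = [float(x) for x in signal[1::2]]
    half = n // 2

    # Step 1 (predict): subtract the mean of the two even neighbours from
    # each odd sample. Symmetric extension at the right edge.
    detail = [odd[i] - 0.5 * (even[i] + even[min(i + 1, half - 1)])
              for i in range(half)]

    # Step 2 (update): depends on ALL outputs of step 1, so parallel workers
    # would have to synchronise here before continuing.
    approx = [even[i] + 0.25 * (detail[max(i - 1, 0)] + detail[i])
              for i in range(half)]
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo lifting_53_forward by running the steps in reverse order."""
    half = len(approx)
    even = [approx[i] - 0.25 * (detail[max(i - 1, 0)] + detail[i])
            for i in range(half)]
    odd = [detail[i] + 0.5 * (even[i] + even[min(i + 1, half - 1)])
           for i in range(half)]
    out = [0.0] * (2 * half)
    out[0::2] = even
    out[1::2] = odd
    return out


if __name__ == "__main__":
    approx, detail = lifting_53_forward([0, 1, 2, 3, 4, 5, 6, 7])
    print(approx, detail)
    print(lifting_53_inverse(approx, detail))
```

In this sketch a linear ramp produces zero detail coefficients everywhere except at the mirrored right boundary, and the inverse recovers the input. A separable 2-D transform would apply both steps along rows and then along columns, doubling the number of synchronisation points.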

arXiv.org e-Print Archive

Crossref

Accelerating discrete wavelet transforms on parallel architectures

Authors: Bařina David, Kula Michal, Matýšek Michal, Zemčík Pavel
Publication venue: Václav Skala - UNION Agency
Publication date: 01/01/2017

The 2-D discrete wavelet transform (DWT) lies at the heart of many image-processing algorithms. Several recent studies have compared the performance of this transform on various shared-memory parallel architectures, especially on graphics processing units (GPUs). All of these studies, however, considered only separable calculation schemes. We show that corresponding separable parts can be merged into non-separable units, which halves the number of steps. In addition, we introduce an optional optimization that reduces the number of arithmetic operations. The discussed schemes were adapted to the OpenCL framework and to pixel shaders, and then evaluated on GPUs from the two biggest vendors. We demonstrate the performance of the proposed non-separable methods by comparison with existing separable schemes. The non-separable schemes outperform their separable counterparts on numerous setups, especially with pixel shaders.
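
The idea of merging separable parts into one non-separable unit can be sketched as follows. This is an illustration under stated assumptions, not the scheme from the paper: it uses the predict step of the CDF 5/3 wavelet (the abstract names no wavelet), whole-sample symmetric borders, and hypothetical helper names. The horizontal and vertical predict passes (two steps, with a synchronisation between them) are replaced by a single pass that reads only the original image through a 2-D stencil formed as the product of the two 1-D tap sets.

```python
def mirror(i, n):
    """Whole-sample symmetric extension of index i into range(n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def predict_taps(i):
    """1-D predict taps at position i (assumed CDF 5/3 predict step).

    Odd positions subtract the mean of their neighbours; even ones pass through.
    """
    return {-1: -0.5, 0: 1.0, 1: -0.5} if i % 2 else {0: 1.0}


def predict_rows(img):
    """Separable step 1: horizontal predict along every row."""
    h, w = len(img), len(img[0])
    return [[sum(k * img[r][mirror(c + d, w)]
                 for d, k in predict_taps(c).items())
             for c in range(w)] for r in range(h)]


def predict_cols(img):
    """Separable step 2: vertical predict along every column."""
    h, w = len(img), len(img[0])
    return [[sum(k * img[mirror(r + d, h)][c]
                 for d, k in predict_taps(r).items())
             for c in range(w)] for r in range(h)]


def predict_merged(img):
    """Non-separable single step: one pass over the ORIGINAL image.

    Each output uses a stencil of up to 3x3 inputs whose weights are
    products of the horizontal and vertical 1-D taps.
    """
    h, w = len(img), len(img[0])
    return [[sum(kr * kc * img[mirror(r + dr, h)][mirror(c + dc, w)]
                 for dr, kr in predict_taps(r).items()
                 for dc, kc in predict_taps(c).items())
             for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    image = [[5, 1, 4, 2], [0, 3, 7, 6], [9, 8, 2, 1], [4, 4, 6, 0]]
    two_steps = predict_cols(predict_rows(image))
    one_step = predict_merged(image)
    print(two_steps == one_step)
```

The merged pass does more arithmetic per pixel than the two separable passes, but it needs only one synchronisation point instead of two. That is the trade-off the abstract describes, and it is also why a further reduction in arithmetic operations is offered there as an optional optimization.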

arXiv.org e-Print Archive

University of West Bohemia Digital Library

DSpace at University of West Bohemia