The Parallel Algorithm for the 2-D Discrete Wavelet Transform

Authors: Barina David, Kleparnik Petr, Kula Michal, Najman Pavel, Zemcik Pavel
Publication date: 26/09/2017

The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, our scheme consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors. Comment: accepted for publication at ICGIP 201
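To illustrate what a lifting "step" is, here is a minimal sketch of one level of a 1-D lifting transform. It assumes the integer CDF 5/3 wavelet with symmetric border extension; the abstract does not say which wavelets the paper uses, and the function names are hypothetical. The update step reads the results of the predict step, and that dependency is the kind of data-exchange point the abstract describes.

```python
def lifting_53_forward(signal):
    """One level of a 1-D integer CDF 5/3 lifting transform (illustrative sketch)."""
    n = len(signal)
    if n < 2 or n % 2 != 0:
        raise ValueError("signal length must be even and at least 2")
    even, odd = signal[0::2], signal[1::2]
    half = n // 2
    # Step 1 (predict): subtract from each odd sample the mean of its two
    # even neighbours; the right border is mirrored.
    detail = [
        odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
        for i in range(half)
    ]
    # Step 2 (update): needs the finished detail values from step 1, so a
    # parallel implementation must synchronise between the two steps.
    approx = [
        even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
        for i in range(half)
    ]
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo lifting_53_forward by running the steps in reverse order."""
    half = len(approx)
    even = [
        approx[i] - ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
        for i in range(half)
    ]
    odd = [
        detail[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
        for i in range(half)
    ]
    out = [0] * (2 * half)
    out[0::2] = even
    out[1::2] = odd
    return out


samples = [3, 7, 1, 8, 2, 9, 4, 4]
low, high = lifting_53_forward(samples)
restored = lifting_53_inverse(low, high)
```

Because each step only adds or subtracts a value computed from the other half of the samples, the inverse reproduces the input exactly. This sketch shows the conventional scheme only, not the rearranged scheme the paper proposes.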

arXiv.org e-Print Archive

Crossref

Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

Author: Bernabé Gregorio
Publication venue: University of Granada-University of Cadiz
Publication date: 01/01/2015

We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping on manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and renowned techniques like tiling and blocking are exploited to optimize the use of memory. We evaluate these proposals and make a comparison between a new Fermi Tesla C2050 and an Intel Core 2 Quad Q6700. The CUDA version achieves the best speedups over the CPU execution times, ranging from 5.3x to 7.4x for different image sizes, and up to 81 times faster when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from a factor of 2x on small frame sizes to 3x on larger ones.

Portal de revistas de la Universidad de Granada

DIALNET

Compressing Inertial Motion Data in Wireless Sensing Systems – An Initial Experiment

Authors: Cheng L, Cheng Z, Fan FY, Hailes S, Hang D, Yang Y
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2008

The use of wireless inertial motion sensors, such as accelerometers, for supporting medical care and sports training has been under investigation in recent years. As the number of sensors (or their sampling rates) increases, compressing data at the source (i.e. at the sensors), and thereby reducing the quantity of data that needs to be transmitted between the on-body sensors and the remote repository, becomes essential, especially in a bandwidth-limited wireless environment. This paper presents a set of compression experiment results on a set of inertial motion data collected during running exercises. As a starting point, we selected a set of common compression algorithms to experiment with. Our results show that conventional lossy compression algorithms would achieve a desirable compression ratio with an acceptable time delay. The results also show that the quality of the decompressed data is within an acceptable range.

UCL Discovery

Near real-time early cancer detection using a graphics processing unit

Author: Helms Jason
Publication venue: EWU Digital Commons
Publication date: 01/01/2016

Automatically detecting early cancer using medical images is challenging, yet crucial to help save millions of lives in the early stages of cancer. In this work, we improved a method that was originally developed by Yamaguchi et al. from Saga University in Saga, Japan. The original method would first decompose the endoscopic image into four color elements: red, green, blue and luminance (RGBL). Next, each component is further decomposed into non-overlapping blocks of smaller images. Each smaller image undergoes two phases of DWT, and finally the Fractal Dimension (FD) is calculated per smaller image so that abnormal regions become detectable. Our proposed method not only used GPU technology to speed up processing but also applied edge enhancement via Gaussian Fuzzy Edge Enhancement. After edge enhancement, multiple thresholds (or tuning variables) were identified and adjusted to reduce computational requirements, decrease false positives and increase the accuracy of detecting early cancer. Most lesions that a physician had manually marked as possible areas of concern were detected quickly, in less than four seconds, which is roughly 25x faster than the existing work. The false positive rate was reduced but still needs improvement. In the future, a Support Vector Machine (SVM) would be an ideal solution to reduce the false positive rate while also helping to increase detection, and SVM technology has already been implemented on the GPU. Once a technology such as an SVM is implemented with better results, video processing will be nearing the final step toward 'Near Real Time Automatic Detection of Early Esophageal Cancer from an Endoscopic Image' --Leaf iv
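The first stages of the pipeline described above (RGBL decomposition, non-overlapping blocks, two DWT phases per block) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the luminance weights, the Haar wavelet, the block size and every function name are hypothetical choices, and the Fractal Dimension and edge-enhancement stages are omitted.

```python
def to_rgbl(pixels):
    """Split rows of (r, g, b) tuples into red, green, blue and luminance planes."""
    planes = {"R": [], "G": [], "B": [], "L": []}
    for row in pixels:
        planes["R"].append([r for r, g, b in row])
        planes["G"].append([g for r, g, b in row])
        planes["B"].append([b for r, g, b in row])
        # Assumption: Rec. 601 luma weights; the abstract gives no formula.
        planes["L"].append([0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row])
    return planes


def split_blocks(plane, size):
    """Cut a 2-D plane into non-overlapping size x size blocks (partial edges dropped)."""
    rows, cols = len(plane), len(plane[0])
    return [
        [row[c:c + size] for row in plane[r:r + size]]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]


def haar_level(block):
    """One 2-D Haar level (averaging form): approximation plus three detail subbands."""
    n = len(block) // 2
    approx = [[0.0] * n for _ in range(n)]
    d_cols = [[0.0] * n for _ in range(n)]  # difference between columns
    d_rows = [[0.0] * n for _ in range(n)]  # difference between rows
    d_diag = [[0.0] * n for _ in range(n)]  # diagonal difference
    for i in range(n):
        for j in range(n):
            a, b = block[2 * i][2 * j], block[2 * i][2 * j + 1]
            c, d = block[2 * i + 1][2 * j], block[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 4
            d_cols[i][j] = (a - b + c - d) / 4
            d_rows[i][j] = (a + b - c - d) / 4
            d_diag[i][j] = (a - b - c + d) / 4
    return approx, d_cols, d_rows, d_diag


def two_level_dwt(block):
    """Apply the Haar level twice, the second time to the first approximation."""
    approx1, *details1 = haar_level(block)
    approx2, *details2 = haar_level(approx1)
    return approx2, details2, details1


# Hypothetical 8x8 test image of a single colour, cut into 4x4 blocks.
image = [[(10, 20, 30)] * 8 for _ in range(8)]
planes = to_rgbl(image)
red_blocks = split_blocks(planes["R"], 4)
approx2, details2, details1 = two_level_dwt(red_blocks[0])
```

Each block is processed independently of the others, which is what makes this stage a natural fit for the GPU parallelism the abstract mentions.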

Eastern Washington University: EWU Digital Commons

FPGA-based module for SURF extraction

Authors: H Bay, Jan Šváb, K Mikolajczyk, Libor Přeučil, Petr Čížek, Sol Pedre, Tomáš Krajník
Publication venue: Springer Science and Business Media LLC
Publication date: 27/02/2014

We present a complete hardware and software solution of an FPGA-based computer vision embedded module capable of carrying out the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows running programs specifically tailored for particular applications. The module is based on a Virtex-5 FXT FPGA which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, the interest point detection. The module's overall performance is evaluated and compared to CPU and GPU based solutions. Results show that the embedded module achieves distinctiveness comparable to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. Thus, it allows the SURF algorithm to be used in applications with power and spatial constraints, such as autonomous navigation of small mobile robots.

University of Lincoln Institutional Repository

Crossref

ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

Authors: Elsner Franz, Wandelt Benjamin D.
Publication venue: EDP Sciences
Publication date: 01/01/2011

We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications involve modeling of instrumental beam shapes in terms of compact kernels, computation of fine-scale wavelet transformations, and optimal filtering for the detection of point sources. Our algorithm works for any pixelization where pixels are grouped into isolatitude rings. Even for kernels that are not bandwidth limited, ringing features are completely absent on an ECP grid. We demonstrate that they can be highly suppressed on the popular HEALPix pixelization, for which we develop a freely available implementation of the algorithm. As an example application, we show that running on a high-end consumer graphics card our method speeds up beam convolution for simulations of a characteristic Planck high frequency instrument channel by two orders of magnitude compared to the commonly used HEALPix implementation on one CPU core, while typically maintaining a fractional RMS accuracy of about 1 part in 10^5. Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics. Replaced to match published version. Code can be downloaded at https://github.com/elsner/arkco

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

HAL-INSU