4,356 research outputs found

    The Parallel Algorithm for the 2-D Discrete Wavelet Transform

    The discrete wavelet transform lies at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) has mostly been computed using a separable lifting scheme. Because the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. For parallel processing on multi-core processors, however, this scheme is inappropriate because of its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged, and these points often form a performance bottleneck. Our approach rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, it consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors. Comment: accepted for publication at ICGIP 201
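To make the notion of lifting "steps" concrete, here is a minimal sketch of one level of a conventional 1-D lifting transform. It is not the rearranged scheme the abstract proposes. The abstract does not name a wavelet, so the CDF 5/3 filter, the periodic boundary extension, and the function names `lifting_53_forward` and `lifting_53_inverse` are all assumptions chosen for illustration. Each step reads neighbouring values produced by the previous step, which is why a parallel implementation has to exchange data between steps.

```python
def lifting_53_forward(x):
    """One level of a 1-D CDF 5/3 lifting transform (assumed wavelet),
    using periodic extension at the boundaries (also an assumption)."""
    if len(x) == 0 or len(x) % 2 != 0:
        raise ValueError("signal length must be even and non-zero")
    s = [float(v) for v in x[0::2]]  # even samples
    d = [float(v) for v in x[1::2]]  # odd samples
    n = len(s)
    # Step 1 (predict): every odd sample needs its two even neighbours.
    d = [d[i] - 0.5 * (s[i] + s[(i + 1) % n]) for i in range(n)]
    # Step 2 (update): every even sample needs the two details from step 1,
    # so step 1 must be complete everywhere before step 2 starts.
    s = [s[i] + 0.25 * (d[(i - 1) % n] + d[i]) for i in range(n)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo the two lifting steps in reverse order and interleave."""
    n = len(s)
    s = [s[i] - 0.25 * (d[(i - 1) % n] + d[i]) for i in range(n)]
    d = [d[i] + 0.5 * (s[i] + s[(i + 1) % n]) for i in range(n)]
    x = [0.0] * (2 * n)
    x[0::2] = s
    x[1::2] = d
    return x


if __name__ == "__main__":
    signal = [3, 7, 1, 8, 2, 9, 4, 6]
    approx, detail = lifting_53_forward(signal)
    print(approx, detail)
    print(lifting_53_inverse(approx, detail))
```

A separable 2-D transform would apply such a 1-D pass along rows and then along columns, so the number of dependent steps grows with both the number of lifting steps and the number of dimensions.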

    Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

    In this paper we present several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping onto manycore devices. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize memory use. We evaluate these proposals and compare a new Fermi Tesla C2050 with an Intel Core 2 Quad Q6700. The CUDA version gives the best results, with speedups over the CPU execution times ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from about 2x on small frame sizes to 3x on larger ones.

    Compressing Inertial Motion Data in Wireless Sensing Systems – An Initial Experiment

    The use of wireless inertial motion sensors, such as accelerometers, to support medical care and sports training has been under investigation in recent years. As the number of sensors (or their sampling rates) increases, compressing data at the source (i.e. at the sensors), and so reducing the quantity of data that needs to be transmitted between the on-body sensors and the remote repository, becomes essential, especially in a bandwidth-limited wireless environment. This paper presents the results of a set of compression experiments on inertial motion data collected during running exercises. As a starting point, we selected a set of common compression algorithms to experiment with. Our results show that conventional lossy compression algorithms can achieve a desirable compression ratio with an acceptable time delay. The results also show that the quality of the decompressed data is within an acceptable range.
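The quantities the abstract reports (compression ratio and quality of the decompressed data) can be illustrated with a small sketch. The abstract does not say which algorithms were tested, so everything here is a hypothetical stand-in: uniform quantization followed by delta coding and `zlib`, a synthetic sine wave in place of real accelerometer data, and an assumed raw format of 32-bit floats. The function names are illustrative only.

```python
import math
import struct
import zlib


def compress_lossy(samples, step):
    """Quantize to multiples of `step`, delta-encode, then deflate.
    Assumes a non-empty input whose quantized deltas fit in 16 bits."""
    q = [round(v / step) for v in samples]
    deltas = [q[0]] + [b - a for a, b in zip(q, q[1:])]
    payload = struct.pack(f"<{len(deltas)}h", *deltas)
    return zlib.compress(payload, 9)


def decompress_lossy(blob, step):
    """Invert compress_lossy up to the quantization error."""
    payload = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(payload) // 2}h", payload)
    out, acc = [], 0
    for delta in deltas:
        acc += delta
        out.append(acc * step)
    return out


def compression_ratio(samples, blob, bytes_per_sample=4):
    """Raw size over compressed size; 4 bytes per raw sample is an assumption."""
    return (bytes_per_sample * len(samples)) / len(blob)


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    # Synthetic, periodic "acceleration" trace; not real sensor data.
    trace = [2.0 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
    blob = compress_lossy(trace, step=0.01)
    restored = decompress_lossy(blob, step=0.01)
    print("ratio:", compression_ratio(trace, blob))
    print("max error:", max_abs_error(trace, restored))
```

The quantization step trades accuracy for size: a larger `step` shrinks the compressed output but raises the worst-case reconstruction error, which is bounded by half a step.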

    Near real-time early cancer detection using a graphics processing unit

    Automatically detecting early cancer in medical images is challenging, yet crucial for saving millions of lives while cancer is still in its early stages. In this work, we improved a method originally developed by Yamaguchi et al. of Saga University in Saga, Japan. The original method first decomposes the endoscopic image into four color elements: red, green, blue and luminance (RGBL). Next, each component is further decomposed into non-overlapping blocks of smaller images. Each smaller image undergoes two phases of DWT, and finally the Fractal Dimension (FD) is calculated per smaller image, from which abnormal regions can be detected. Our proposed method not only uses GPU technology to speed up processing but also applies edge enhancement via Gaussian Fuzzy Edge Enhancement. After edge enhancement, multiple thresholds (or tuning variables) were identified and adjusted to reduce computational requirements, decrease false positives and increase the accuracy of detecting early cancer. Most lesions that a physician had manually marked as possible areas of concern were detected quickly, in less than four seconds, which is roughly 25x faster than the existing work. The false positive rate was reduced but still needs improvement. In the future, a Support Vector Machine (SVM) would be an ideal solution for reducing the false positive rate while also helping to increase detection, and SVM technology has already been implemented on the GPU. Once a technology such as an SVM is implemented with better results, video processing will be close to the final step toward 'Near Real Time Automatic Detection of Early Esophageal Cancer from an Endoscopic Image'. --Leaf iv
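The first stages of the pipeline described above (RGBL decomposition, non-overlapping blocks, two DWT phases) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not define luminance, so Rec. 601 luma weights are assumed; it does not name the wavelet or say which subbands are used, so an averaging Haar approximation is used as a stand-in; the fractal dimension and edge enhancement steps are omitted. All function names are hypothetical.

```python
def split_rgbl(image):
    """image: list of rows, each row a list of (r, g, b) tuples."""
    return {
        "R": [[p[0] for p in row] for row in image],
        "G": [[p[1] for p in row] for row in image],
        "B": [[p[2] for p in row] for row in image],
        # Assumption: Rec. 601 luma weights.
        "L": [[0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2] for p in row]
              for row in image],
    }


def split_blocks(channel, size):
    """Cut a 2-D channel into non-overlapping size x size blocks,
    dropping any partial blocks at the right and bottom edges."""
    height, width = len(channel), len(channel[0])
    result = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            result.append([row[left:left + size]
                           for row in channel[top:top + size]])
    return result


def haar_approx(block):
    """Approximation subband of one 2-D Haar level (averaging variant,
    an assumed wavelet): the mean of each 2x2 cell. Needs even sides."""
    half_h, half_w = len(block) // 2, len(block[0]) // 2
    return [[(block[2 * i][2 * j] + block[2 * i][2 * j + 1]
              + block[2 * i + 1][2 * j] + block[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(half_w)]
            for i in range(half_h)]


if __name__ == "__main__":
    image = [[(10, 20, 30)] * 8 for _ in range(8)]  # flat synthetic image
    channels = split_rgbl(image)
    for name, channel in channels.items():
        for block in split_blocks(channel, 4):
            two_level = haar_approx(haar_approx(block))  # two DWT phases
            print(name, two_level)
```

Every block is processed independently of the others, which is the property that makes this kind of pipeline a natural fit for a GPU.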

    FPGA-based module for SURF extraction

    We present a complete hardware and software solution for an FPGA-based embedded computer vision module capable of running the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows it to run programs tailored to particular applications. The module is based on a Virtex-5 FXT FPGA, which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, interest point detection. The module's overall performance is evaluated and compared to CPU-based and GPU-based solutions. Results show that the embedded module achieves distinctiveness comparable to that of the SURF software implementation running on a standard CPU, while being faster and consuming significantly less power and space. It therefore makes the SURF algorithm usable in applications with power and space constraints, such as autonomous navigation of small mobile robots.

    ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

    We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications include modeling instrumental beam shapes in terms of compact kernels, computing fine-scale wavelet transformations, and optimal filtering for the detection of point sources. Our algorithm works for any pixelization in which pixels are grouped into isolatitude rings. Even for kernels that are not bandwidth limited, ringing features are completely absent on an ECP grid. We demonstrate that they can be highly suppressed on the popular HEALPix pixelization, for which we develop a freely available implementation of the algorithm. As an example application, we show that, running on a high-end consumer graphics card, our method speeds up beam convolution for simulations of a characteristic Planck high frequency instrument channel by two orders of magnitude compared to the commonly used HEALPix implementation on one CPU core, while typically maintaining a fractional RMS accuracy of about 1 part in 10^5. Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics. Replaced to match published version. Code can be downloaded at https://github.com/elsner/arkco