19 research outputs found

    Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

    We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for efficient mapping onto manycore architectures. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize memory use. We evaluate these proposals and compare a new Fermi Tesla C2050 GPU with an Intel Core 2 Quad Q6700 CPU. The CUDA version achieves the best results, with speedups over the CPU execution times ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communication costs are neglected. Meanwhile, OpenCL obtains solid gains, ranging from 2x on small frame sizes to 3x on larger ones.
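    The 3D-FWT is separable: one decomposition level applies a 1D wavelet transform along x, then y, then z, and every 1D line within a pass is independent, which is the parallelism that CPU threads or GPU threads can divide among themselves. The sketch below is a minimal pure-Python illustration under stated assumptions: the Haar filter stands in for whatever filter bank the paper uses (the abstract does not name one), the volume is a nested list indexed vol[z][y][x] with even dimensions, and the helper names haar_1d and fwt3d_level are hypothetical.

```python
import math

def haar_1d(line):
    """One level of the orthonormal Haar transform: returns [approximations..., details...]."""
    s = 1 / math.sqrt(2)
    half = len(line) // 2
    approx = [(line[2 * i] + line[2 * i + 1]) * s for i in range(half)]
    detail = [(line[2 * i] - line[2 * i + 1]) * s for i in range(half)]
    return approx + detail

def fwt3d_level(vol):
    """One 3D-FWT level: haar_1d along x, then y, then z. vol[z][y][x], even dims (assumption)."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[list(row) for row in plane] for plane in vol]  # leave the input untouched
    # Pass 1 (x): every row is independent, so rows can be processed in parallel.
    for z in range(nz):
        for y in range(ny):
            out[z][y] = haar_1d(out[z][y])
    # Pass 2 (y): transform each column.
    for z in range(nz):
        for x in range(nx):
            col = haar_1d([out[z][y][x] for y in range(ny)])
            for y in range(ny):
                out[z][y][x] = col[y]
    # Pass 3 (z): transform each "pillar" running through the slices.
    for y in range(ny):
        for x in range(nx):
            pil = haar_1d([out[z][y][x] for z in range(nz)])
            for z in range(nz):
                out[z][y][x] = pil[z]
    return out
```

    For a 2x2x2 volume of ones, all the energy ends up in the approximation coefficient out[0][0][0] = 2*sqrt(2) and every detail coefficient is zero; because this Haar filter is orthonormal, the sum of squares of any input volume is preserved.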

    Acquisition and processing of electromyographic signals for real-time control of a virtual vehicle

    This work presents the recording and classification of electromyographic (EMG) signals from the lower extremities, specifically the vastus muscle, in order to control a virtual vehicle designed in Blender. The system has 4 channels and a graphical interface that allows control of the virtual vehicle. The signals were processed with several mathematical tools, such as Fourier analysis and wavelet analysis, in order to compress the data, extract characteristic patterns from each set of signals, and perform digital filtering. The car is controlled with 4 commands (accelerate, stop, turn right, and turn left), which are the basic instructions for operating a real car. The results showed that biological signals can be used to perform virtual controls (a video game). It was also verified that the parameterization found for each group of EMG signals was satisfactory, since the error rate for the 4 variables studied was 0.04% over a total of 400 executions. This error rate indicates that the system has great potential for future applications.
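    As a hedged illustration of wavelet-based filtering of one EMG channel, the sketch below performs a single-level Haar decomposition, zeroes the detail coefficients whose magnitude falls below a threshold, and reconstructs the signal; it also computes an RMS amplitude of the kind often used as a simple per-channel feature. The Haar wavelet, hard thresholding, the RMS feature, and the function names are all assumptions for illustration; the abstract does not specify which wavelet, thresholding rule, or features the authors used.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

def haar_forward(x):
    """Single-level orthonormal Haar split into approximation and detail (len(x) even)."""
    a = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """Exact inverse of haar_forward."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) * SQRT_HALF, (ai - di) * SQRT_HALF])
    return x

def denoise(x, threshold):
    """Hard-threshold the detail band: small high-frequency wiggles are treated as noise."""
    a, d = haar_forward(x)
    d = [di if abs(di) > threshold else 0.0 for di in d]
    return haar_inverse(a, d)

def rms(x):
    """Root-mean-square amplitude of a window (a simple per-channel feature)."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

    With a threshold of 0 the signal is reconstructed exactly; raising the threshold replaces each sample pair with its mean wherever the pair differs only slightly, e.g. [1.0, 1.2, 3.0, 3.1] becomes [1.1, 1.1, 3.05, 3.05] at threshold 0.5.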

    Framework for 4D medical data compression

    This work presents a novel framework for compressing four-dimensional (4D) medical data. The framework is based on various procedures and algorithms that detect temporal and spatial (frequency) redundancy in recorded 4D medical data. Motion over time is analyzed through motion fields, which provide the input parameters for the neural network used for motion estimation. Segmentation, block matching, and motion field prediction are combined with expert knowledge to improve performance. Frequency analysis is performed by extending the one-dimensional wavelet transform to three dimensions. For still volumetric objects, different wavelet packets with different filter banks can be constructed, providing a wide range of frequency analyses. By removing both temporal and spatial redundancies, very high compression ratios can be achieved.
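    Block matching, one of the components the framework combines, can be sketched as an exhaustive search: for a block in the current frame, try every displacement within a search radius in the previous frame and keep the displacement with the smallest sum of absolute differences (SAD). This is a minimal pure-Python sketch; the SAD criterion, full search, and the names sad and block_match are assumptions, and the neural-network motion estimation and motion field prediction described above are not modelled.

```python
def sad(prev, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of prev displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - prev[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def block_match(prev, cur, bx, by, n, radius):
    """Full search within +/- radius; returns (dx, dy, cost) of the best match."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

    The returned (dx, dy) is the motion vector pointing from the current block to its best match in the previous frame; computing it for every block yields a motion field.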

    Discreteness Effects in Lambda Cold Dark Matter Simulations: A Wavelet-Statistical View

    The effects of particle discreteness in N-body simulations of Lambda Cold Dark Matter (LambdaCDM) are still intensely debated. In this paper we explore such effects, taking into account the scatter caused by the randomness of the initial conditions and focusing on the statistical properties of the cosmological density field. For this purpose, we run large sets of LambdaCDM simulations and analyse them using a wide variety of diagnostics, including new and powerful wavelet statistics. Among other findings, we point out (1) that dynamical evolution does not propagate discreteness noise up from the small scales at which it is introduced, and (2) that one should aim to satisfy the condition epsilon ~ 2d, where epsilon is the force resolution and d is the interparticle distance. We clarify what this condition means and how to implement it in modern cosmological codes.
    Comment: ApJ, in press. Minor changes to match the accepted version.
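    To get a feel for the epsilon ~ 2d condition, the interparticle distance d of N particles in a cubic box of side L can be taken as the mean spacing L / N^(1/3). The sketch below applies that definition; the function names and the example box are hypothetical, and treating epsilon as a plain length in the box's units is an assumption, since codes parametrize force softening differently and the paper itself discusses how to implement the condition.

```python
def mean_interparticle_distance(box_size, n_particles):
    """Mean spacing d = L / N^(1/3) for N particles in a cubic box of side L."""
    return box_size / n_particles ** (1.0 / 3.0)

def suggested_force_resolution(box_size, n_particles, factor=2.0):
    """Force resolution epsilon ~ factor * d (factor = 2 per the condition above)."""
    return factor * mean_interparticle_distance(box_size, n_particles)

# Hypothetical example: a 100 Mpc/h box with 128^3 particles.
d = mean_interparticle_distance(100.0, 128 ** 3)      # about 0.78125 Mpc/h
eps = suggested_force_resolution(100.0, 128 ** 3)     # about 1.5625 Mpc/h
```

    Doubling the particle number per dimension halves d, so under this rule the force resolution should shrink in step with the mass resolution.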

    Compression of image sequences in interactive medical teleconsultations

    Interactive medical teleconsultations are an important tool in modern medical practice. Their applications include remote diagnostics, conferences, workshops, and classes for students. In many cases, standard medium- or low-end machines are used, and teleconsultation systems must provide a high-quality user experience with very limited resources. Large datasets consisting of image sequences, which need to be accessed fluently, are particularly problematic. The main issue is insufficient internal memory, so proper compression methods are crucial. However, a scenario in which image sequences are kept compressed in internal memory and decompressed on the fly when displayed is difficult to implement because of performance issues. In this paper, we present methods for both lossy and lossless compression of medical image sequences that require only compatibility with the Pixel Shader 2.0 standard, which is supported even on relatively old, low-end devices. Based on an evaluation of quality, size reduction, and performance, the methods are shown to be suitable and beneficial for medical teleconsultation applications.

    An investigation into combining both facial detection and landmark localisation into a unified procedure using GPU computing

    This thesis describes the design and implementation of a unified framework for face detection and landmark alignment in arbitrary in-the-wild images. Traditionally, these two problems have been addressed separately in the literature, with impressive results recently reported in both fields. However, a pipeline consisting of a state-of-the-art face detector followed by a state-of-the-art facial landmark localisation algorithm would not perform well enough to be used in higher-level tasks such as face recognition and facial expression analysis, because the face detector's output is not accurate enough to initialise the landmark localisation algorithm. To address this limitation, this thesis proposes an approach that combines both tasks into a single unified algorithm that can run in real time by exploiting the parallel computing architecture of the graphics processing unit (GPU). This is done by applying a Cascaded-Regression (CR) algorithm in a sliding-window fashion. The proposed system exploits the CR algorithm's ability to compute the 2D pose of a face from rough initial estimates in order to generate a Hough-transform voting scheme that detects candidate faces and filters out irrelevant background. The resulting detection surface is then refined using an SVM to yield both face detections and the locations of their parts. The system is built within the MATLAB environment, with a MEX-file providing the interface to the proposed CUDA algorithm, and its results are tested against current state-of-the-art methods for both face detection and landmark localisation.
We evaluate performance on the most widely used face detection datasets, namely Annotated Faces in the Wild (AFW) (Zhu and Ramanan, 2012), the Face Detection Dataset and Benchmark (FDDB) (Jain and Learned-Miller, 2010), and Caltech Occluded Faces in the Wild (COFW) (Burgos-Artizzu, Perona and Dollár, 2013). The empirical results demonstrate that the proposed unified framework achieves state-of-the-art performance in both face detection and facial alignment, and that our detector clearly outperforms all commercial and published methods by a margin of over 10% in detection accuracy on the AFW dataset.
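    The Hough-style voting step can be illustrated roughly as follows: each sliding-window regression predicts where a face center would be, those predictions vote into a coarse accumulator grid, and cells that collect enough votes become candidate faces, while scattered background votes are discarded. This is a hypothetical simplification: the function name hough_vote, the grid cell size, and the vote threshold are assumptions, and the thesis's actual scheme (which votes using the 2D pose computed by the CR algorithm and is refined with an SVM) is not reproduced here.

```python
from collections import Counter

def hough_vote(predicted_centers, cell_size, min_votes):
    """Accumulate predicted face centers (x, y) into grid cells and return
    (x, y, votes) for the center of every cell with at least min_votes votes,
    strongest first."""
    acc = Counter((int(x // cell_size), int(y // cell_size)) for x, y in predicted_centers)
    peaks = [(votes, cell) for cell, votes in acc.items() if votes >= min_votes]
    peaks.sort(reverse=True)
    return [((cx + 0.5) * cell_size, (cy + 0.5) * cell_size, votes)
            for votes, (cx, cy) in peaks]
```

    A hard grid can split one face's votes across neighbouring cells; smoothing the accumulator or using overlapping cells is a common remedy.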

    GPU implementation of bitplane coding with parallel coefficient processing for high performance image compression

    Fast image compression is a requirement in many applications, such as TV production, teleconferencing, and digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High-performance implementations of such algorithms often require specialized hardware such as field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grained parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, tailored to massively parallel architectures, called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high-performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm guides its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and efficient cooperation mechanisms that enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements of high-resolution (4K) digital cinema in real time, yielding speedups of 30x with respect to the fastest implementations of current compression standards. In addition, a power consumption evaluation shows that our implementation consumes 40x less energy than state-of-the-art methods for equivalent performance.
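    Bitplane coding processes the quantized wavelet coefficients one bit level at a time, from the most significant bitplane down. The sketch below shows only the sign-magnitude bitplane decomposition that such coders start from, and its inverse; within each plane every coefficient's bit is computed independently, which hints at why per-coefficient parallelism suits GPUs. It is not BPC-PaCo itself: the context modelling, arithmetic coding, and thread cooperation the paper describes are omitted, and the function names are hypothetical.

```python
def bitplanes(coeffs):
    """Split integer coefficients into a sign list and magnitude bitplanes, MSB first."""
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    n_planes = max(mags).bit_length() if mags else 0
    planes = []
    for p in range(n_planes - 1, -1, -1):
        # Each coefficient's bit in plane p is independent of all others.
        planes.append([(m >> p) & 1 for m in mags])
    return signs, planes

def reconstruct(signs, planes):
    """Rebuild the coefficients by shifting in one bitplane at a time, MSB first."""
    mags = [0] * len(signs)
    for plane in planes:
        mags = [(m << 1) | b for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]
```

    For example, coefficients [5, -3, 0, 12] need four planes (12 is 0b1100), and the most significant plane is [0, 0, 0, 1].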

    Implementation of the DWT in a GPU through a register-based strategy

    The release of the CUDA Kepler architecture in March 2012 provided Nvidia GPUs with a larger register memory space and instructions for communicating register values among threads. This enables a new programming strategy that uses registers for data sharing and reuse at the expense of shared memory. Such a strategy can significantly improve the performance of applications that reuse data heavily. This paper presents a register-based implementation of the Discrete Wavelet Transform (DWT), the prevailing data decorrelation technique in the field of image coding. Experimental results indicate that the proposed method is at least four times faster than the best GPU implementation of the DWT found in the literature. Furthermore, theoretical analysis and experimental tests agree in showing that the execution times achieved by the proposed implementation are close to the GPU's performance limits.
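    The data reuse that makes registers attractive shows up clearly in a lifting-scheme DWT: each output depends only on a few neighbouring samples, so on Kepler-class GPUs a thread can fetch a neighbour's value from another thread's registers with warp shuffle intrinsics instead of staging it through shared memory. The sketch below is a sequential, pure-Python version of one level of the reversible CDF 5/3 (LeGall) integer lifting transform with symmetric extension at the borders. The choice of the 5/3 filter and the function names are assumptions for illustration, since the abstract does not say which filter bank the paper evaluates, and the GPU thread and register mapping is not modelled.

```python
def lift53_forward(x):
    """One level of integer CDF 5/3 lifting on an even-length list; returns (s, d)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: detail = odd - floor(mean of even neighbours); mirror at the right edge.
    d = [odd[i] - (even[i] + even[i + 1 if i + 1 < half else i]) // 2 for i in range(half)]
    # Update: smooth = even + floor((left detail + right detail + 2) / 4); mirror at the left edge.
    s = [even[i] + (d[i - 1 if i > 0 else 0] + d[i] + 2) // 4 for i in range(half)]
    return s, d

def lift53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    even = [s[i] - (d[i - 1 if i > 0 else 0] + d[i] + 2) // 4 for i in range(half)]
    odd = [d[i] + (even[i] + even[i + 1 if i + 1 < half else i]) // 2 for i in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x
```

    Because lift53_inverse undoes exactly the same integer rounding, reconstruction is lossless; for a linear ramp, every detail coefficient away from the right border is zero.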