
    Secret Key Cryptography Using Graphics Cards

    One frequently cited reason for the lack of wide deployment of cryptographic protocols is the (perceived) poor performance of the algorithms they employ and their impact on the rest of the system. Although high-performance dedicated cryptographic accelerator cards have been commercially available for some time, market penetration remains low. We take a different approach, seeking to exploit existing system resources, such as Graphics Processing Units (GPUs), to accelerate cryptographic processing. We exploit the ability of GPUs to simultaneously process large quantities of pixels to offload cryptographic processing from the main processor. We demonstrate the use of GPUs for stream ciphers, which can achieve 75% of the performance of a fast CPU. We also investigate the use of GPUs for block ciphers, discuss operations that make certain ciphers unsuitable for use with a GPU, and compare the performance of an OpenGL-based implementation of AES with implementations utilizing general-purpose CPUs. In addition to offloading work from the main processor, the ability to perform encryption and decryption within the GPU has potential applications in image processing by limiting exposure of the plaintext to within the GPU.
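
    The paper works through the fixed-function OpenGL pipeline of its time rather than a compute API, but the core idea, mapping each byte/pixel of the stream-cipher operation to an independent GPU work item, can be illustrated with a minimal CUDA sketch. The kernel below only applies a precomputed keystream; the keystream generator, buffer names, and launch configuration are illustrative assumptions, not the authors' implementation.

    // Minimal sketch: per-byte keystream application on the GPU.
    #include <cstdint>
    #include <cuda_runtime.h>

    __global__ void xor_keystream(const uint8_t* plaintext,
                                  const uint8_t* keystream,
                                  uint8_t* ciphertext,
                                  size_t n)
    {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) {
            // Each thread encrypts one byte, mirroring the per-pixel
            // parallelism the paper exploits through the graphics pipeline.
            ciphertext[i] = plaintext[i] ^ keystream[i];
        }
    }

    // Host-side launch (error handling omitted):
    // xor_keystream<<<(n + 255) / 256, 256>>>(d_pt, d_ks, d_ct, n);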

    Parallelization of Non-Rigid Image Registration

    Non-rigid image registration finds use in a wide range of medical applications, from diagnostics to minimally invasive image-guided interventions. Automatic non-rigid image registration algorithms are computationally intensive: they can take hours to register two images. Although hierarchical volume subdivision-based algorithms are inherently faster than other non-rigid registration algorithms, they can still take a long time to register two images. We present a parallel implementation of one such previously reported and well-tested algorithm on a cluster of thirty-two processors, which reduces the registration time from hours to a few minutes. Mutual information (MI) is one of the most commonly used image similarity measures in medical image registration and is also used in the aforementioned algorithm. In addition to the parallel implementation, we propose a new concept based on bit-slicing to accelerate the computation of MI on the cluster and, more generally, on any parallel computing platform such as graphics processing units (GPUs). GPUs are becoming increasingly common for general-purpose computing in medical imaging, as they can execute algorithms faster by leveraging the parallel processing power they offer. However, the standard implementation of MI does not map well to the GPU architecture, leading earlier investigators to compute only an inexact version of MI on the GPU to achieve a speedup. The proposed bit-slicing technique enables us to demonstrate an exact implementation of MI on the GPU without adversely affecting the speedup.
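
    As a point of reference for why MI is awkward on GPUs, the standard formulation needs a joint intensity histogram, which on a GPU means many threads incrementing shared bins. The hedged CUDA sketch below shows that conventional atomic-based histogram step, the source of the contention the paper's bit-slicing technique is designed to avoid; the bin count, image layout, and names are illustrative assumptions, not the authors' code.

    // Conventional joint-histogram step for mutual information (not the
    // paper's bit-sliced method): heavy atomicAdd contention on popular bins.
    #include <cstdint>
    #include <cuda_runtime.h>

    #define BINS 64

    __global__ void joint_histogram(const uint8_t* fixed_img,
                                    const uint8_t* moving_img,
                                    unsigned int* hist,   // BINS * BINS counters
                                    size_t n)
    {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) {
            // Quantize 8-bit intensities into BINS bins and bump the joint counter.
            unsigned int fb = fixed_img[i]  * BINS / 256;
            unsigned int mb = moving_img[i] * BINS / 256;
            atomicAdd(&hist[fb * BINS + mb], 1u);
        }
    }

    // MI is then computed from the normalized joint histogram:
    // MI = sum_{a,b} p(a,b) * log( p(a,b) / (p(a) * p(b)) ).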

    Efficient parallelization on GPU of an image smoothing method based on a variational model

    Medical imaging is fundamental for improvements in diagnostic accuracy. However, noise frequently corrupts acquired images, which can lead to erroneous diagnoses. Fortunately, image preprocessing algorithms can enhance corrupted images, particularly through noise smoothing and removal. In the medical field, time is always a critical factor, so implementations need to be fast and, if possible, real-time. This study presents and discusses a highly efficient implementation of an image noise smoothing algorithm based on general-purpose computing on graphics processing units (GPGPU) techniques. The use of these techniques enables the quick and efficient smoothing of images corrupted by noise, even on large data sets. This is particularly relevant since GPU cards are becoming more affordable, powerful, and common in medical environments.
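
    The abstract does not spell out the variational model, so as a rough illustration of how such a smoothing step maps to the GPU, the CUDA sketch below performs one explicit gradient-descent iteration of a simple model, min_u (lambda/2)*||u - f||^2 + (1/2)*||grad u||^2, i.e. damped diffusion with a data-fidelity term. The model and discretization used in the paper may differ; parameter names and border handling here are assumptions.

    // One smoothing iteration: each thread updates one pixel from its
    // 4-neighbourhood (5-point Laplacian) plus a pull toward the noisy input.
    #include <cuda_runtime.h>

    __global__ void smooth_step(const float* f,     // noisy input image
                                const float* u,     // current estimate
                                float* u_next,      // updated estimate
                                int w, int h,
                                float lambda, float tau)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        int i = y * w + x;
        float c = u[i];
        // Clamped (Neumann-like) borders.
        float up    = u[(y > 0     ? y - 1 : y) * w + x];
        float down  = u[(y < h - 1 ? y + 1 : y) * w + x];
        float left  = u[y * w + (x > 0     ? x - 1 : x)];
        float right = u[y * w + (x < w - 1 ? x + 1 : x)];
        float laplacian = up + down + left + right - 4.0f * c;

        // Gradient step: diffuse while staying close to the data term f.
        u_next[i] = c + tau * (laplacian - lambda * (c - f[i]));
    }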

    Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO

    The recent introduction of powerful embedded graphics processing units (GPUs) has allowed for unforeseen improvements in real-time computer vision applications. It has enabled algorithms to run onboard, well above standard video rates, yielding not only higher information processing capability but also reduced latency. This work focuses on the applicability of efficient, low-level, GPU hardware-specific instructions to improving existing computer vision algorithms in the field of visual-inertial odometry (VIO). While most steps of a VIO pipeline operate on visual features, detection and tracking still rely on image data, and both steps are well suited for parallelization. In particular, non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency. Our work first revisits the problem of non-maxima suppression for feature detection specifically on GPUs, and proposes a solution that selects local response maxima, imposes spatial feature distribution, and extracts features simultaneously. Our second contribution introduces an enhanced FAST feature detector that applies the aforementioned non-maxima suppression method. Finally, we compare our method to other state-of-the-art CPU and GPU implementations and outperform all of them in feature tracking and detection, resulting in over 1000 fps throughput on an embedded Jetson TX2 platform. Additionally, we demonstrate our work integrated in a VIO pipeline achieving metric state estimation at ~200 fps.
    Comment: IEEE International Conference on Intelligent Robots and Systems (IROS), 2020. Open-source implementation available at https://github.com/uzh-rpg/vili
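
    To make the grid-based non-maxima suppression idea concrete, here is a hedged CUDA sketch in the same spirit: one thread block scans one cell of the corner-response map and keeps only the strongest response, which both suppresses non-maxima and enforces an even spatial feature distribution. Cell size, threshold, data layout, and the reduction scheme are illustrative assumptions, not the authors' released kernels.

    // Grid-cell NMS: one block per cell, block-wide max reduction.
    // Launch with a power-of-two block size of at most 256 threads.
    #include <cfloat>
    #include <cuda_runtime.h>

    struct Keypoint { int x; int y; float score; };

    __global__ void cell_nms(const float* response, int w, int h,
                             int cell, float thresh, Keypoint* out)
    {
        // Block (blockIdx.x, blockIdx.y) owns the cell starting at (x0, y0).
        int x0 = blockIdx.x * cell;
        int y0 = blockIdx.y * cell;

        float best = -FLT_MAX;
        int bestX = -1, bestY = -1;

        // Each thread scans a strided subset of the cell's pixels.
        for (int i = threadIdx.x; i < cell * cell; i += blockDim.x) {
            int x = x0 + i % cell;
            int y = y0 + i / cell;
            if (x < w && y < h) {
                float r = response[y * w + x];
                if (r > best) { best = r; bestX = x; bestY = y; }
            }
        }

        // Block-wide max reduction in shared memory.
        __shared__ float sBest[256];
        __shared__ int   sX[256], sY[256];
        sBest[threadIdx.x] = best; sX[threadIdx.x] = bestX; sY[threadIdx.x] = bestY;
        __syncthreads();
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s && sBest[threadIdx.x + s] > sBest[threadIdx.x]) {
                sBest[threadIdx.x] = sBest[threadIdx.x + s];
                sX[threadIdx.x]    = sX[threadIdx.x + s];
                sY[threadIdx.x]    = sY[threadIdx.x + s];
            }
            __syncthreads();
        }

        if (threadIdx.x == 0) {
            Keypoint kp { sX[0], sY[0], sBest[0] };
            if (kp.score < thresh) { kp.x = -1; kp.y = -1; }   // empty cell
            out[blockIdx.y * gridDim.x + blockIdx.x] = kp;
        }
    }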

    GPU implementation of bitplane coding with parallel coefficient processing for high performance image compression

    The fast compression of images is a requisite in many applications such as TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential, and high-performance implementations of such algorithms often require specialized hardware like field-programmable gate arrays (FPGAs). Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grained parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, called bitplane coding with parallel coefficient processing (BPC-PaCo), which is tailored for massively parallel architectures. This paper introduces the first high-performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements for real-time, high-resolution (4K) digital cinema, yielding speedups of 30x with respect to the fastest implementations of current compression standards. A power consumption evaluation also shows that our implementation consumes 40x less energy than state-of-the-art methods for equivalent performance.
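
    BPC-PaCo itself involves multiple coding passes and a parallel arithmetic coder, but the flavour of its thread-to-data mapping and inter-thread cooperation can be hinted at with a small CUDA sketch: one thread per wavelet coefficient, with a single warp vote collecting the significance bits of a bitplane. This only illustrates the cooperation mechanism under assumed names and layouts; it is not the published codec.

    // Pack one bitplane's significance bits: one thread per coefficient,
    // one __ballot_sync per warp instead of shared memory or atomics.
    #include <cstdint>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void significance_bits(const int* coeff,     // quantized coefficients
                                      uint32_t* warp_bits,  // one 32-bit word per warp
                                      int bitplane, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int lane = threadIdx.x & 31;

        // Is this coefficient's bit set in the current bitplane?
        int bit = 0;
        if (i < n) {
            bit = (abs(coeff[i]) >> bitplane) & 1;
        }

        // One warp-wide vote gathers 32 bits without shared memory or atomics.
        uint32_t mask = __ballot_sync(0xffffffffu, bit);

        if (lane == 0 && i < n) {
            warp_bits[i / 32] = mask;   // lane 0 emits the packed word
        }
    }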

    Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

    Though the GPGPU concept is well known in image processing, much work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant performance gain over the standard pixel-wise mapping scheme. With in-depth performance comparisons across the two mapping schemes, we analyze the impact of the level of parallelism on GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform.
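
    The contrast between the two mapping schemes can be sketched with a toy stand-in for the facet fit. In a pixel-wise mapping, one thread would process a pixel's whole window; in the fine-grained mapping sketched below, a warp shares one pixel, one lane per window sample, and accumulates with a shuffle reduction. A zero-order facet (local mean over a 5x5 window) stands in for the full robust facet model, and all names and sizes are illustrative assumptions.

    // Fine-grained computation-to-core mapping: one warp per output pixel,
    // one lane per window sample, warp-shuffle reduction for the accumulation.
    #include <cuda_runtime.h>

    #define WIN 5   // 5x5 window => 25 samples, fits in one 32-lane warp

    __global__ void facet_mean_warp(const float* img, float* out, int w, int h)
    {
        int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
        int lane   = threadIdx.x & 31;
        int px = warpId % w;
        int py = warpId / w;
        if (py >= h) return;   // whole warp exits together

        // Each lane loads at most one window sample (clamped at the borders).
        float v = 0.0f;
        if (lane < WIN * WIN) {
            int dx = lane % WIN - WIN / 2;
            int dy = lane / WIN - WIN / 2;
            int x = min(max(px + dx, 0), w - 1);
            int y = min(max(py + dy, 0), h - 1);
            v = img[y * w + x];
        }

        // Warp-level sum, then lane 0 writes the window mean.
        for (int s = 16; s > 0; s >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, s);
        if (lane == 0)
            out[py * w + px] = v / (WIN * WIN);
    }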

    GPU acceleration of brain image processing

    In recent years, GPUs have repeatedly demonstrated the high computational power they offer for solving certain problems. At the same time, there are fields that cannot fully benefit from the improvements achieved by researchers, mainly because application execution times become extremely long. Medical image registration is one such case. Although speedups have been achieved for image registration, its use in clinical practice is still limited, due in part to the performance attained. The goal of this project is therefore to improve the execution times of an application dedicated to medical image registration, in order to help alleviate this problem.