
    Fast algorithms and efficient GPU implementations for the Radon transform and the back-projection operator represented as convolution operators

    The Radon transform and its adjoint, the back-projection operator, can both be expressed as convolutions in log-polar coordinates. Hence, fast algorithms for applying the operators can be constructed using the FFT, provided the data is resampled at log-polar coordinates. Radon data is typically measured on an equally spaced grid in polar coordinates, and reconstructions are represented (as images) in Cartesian coordinates. Therefore, in addition to the FFT, several interpolation steps are needed in order to apply the Radon transform and the back-projection operator by means of convolutions. Both the interpolation and the FFT operations can be implemented efficiently on graphics processing units (GPUs). For the interpolation, it is possible to exploit the fact that linear interpolation is hard-wired on GPUs, meaning that it has the same computational cost as direct memory access. Cubic-order interpolation schemes can be constructed by combining linear interpolation steps, which provides a substantial computational speedup. We provide details about how the Radon transform and the back-projection can be implemented efficiently as convolution operators on GPUs. For large data sizes, speedups of about 10 times are obtained relative to the computational times of other software packages based on GPU implementations of the Radon transform and the back-projection operator. Moreover, speedups of more than 1000 times are obtained against the CPU implementations provided in the MATLAB image processing toolbox.
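
    The remark about building cubic interpolation from linear interpolation steps can be illustrated in one dimension. The sketch below is not taken from the paper: it assumes cubic B-spline weights, and `lerp_fetch` is a hypothetical software stand-in for a hardware linear fetch. It shows that two linear fetches at suitably shifted positions reproduce the usual four-tap cubic sum.

```python
import math

def bspline_weights(t):
    """Cubic B-spline weights for a fractional offset t in [0, 1)."""
    w0 = (1.0 - t) ** 3 / 6.0
    w1 = (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0
    w2 = (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0
    w3 = t**3 / 6.0
    return w0, w1, w2, w3

def clamp_index(c, j):
    """Clamp an index to the valid range of c (assumed boundary handling)."""
    return min(max(j, 0), len(c) - 1)

def lerp_fetch(c, p):
    """Hypothetical stand-in for a hardware linear fetch at position p."""
    j = math.floor(p)
    f = p - j
    return (1.0 - f) * c[clamp_index(c, j)] + f * c[clamp_index(c, j + 1)]

def cubic_direct(c, x):
    """Reference: four-tap weighted sum around x."""
    i = math.floor(x)
    w = bspline_weights(x - i)
    return sum(wk * c[clamp_index(c, i - 1 + k)] for k, wk in enumerate(w))

def cubic_two_lerps(c, x):
    """Same value from two linear fetches instead of four direct reads."""
    i = math.floor(x)
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3      # combined weight of each tap pair
    h0 = (i - 1) + w1 / g0         # fetch position between taps i-1 and i
    h1 = (i + 1) + w3 / g1         # fetch position between taps i+1 and i+2
    return g0 * lerp_fetch(c, h0) + g1 * lerp_fetch(c, h1)
```

    On a GPU the two fetches would be texture reads with linear filtering enabled. In two dimensions the same idea reduces 16 reads to 4 bilinear fetches.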

    GPU Prefilter for Accurate Cubic B-spline Interpolation

    Achieving accurate interpolation is an important requirement for many signal-processing applications. While nearest-neighbor and linear interpolation methods are popular due to their native GPU support, they unfortunately result in severe undesirable artifacts. Better interpolation methods are known but lack native GPU support. A particularly attractive one is prefiltered cubic-spline interpolation. The signal it reconstructs from discrete samples has a much higher fidelity to the original data than what is achievable with nearest-neighbor and linear interpolation. At the same time, its computational load is moderate, provided a sequence of two operations is applied: first, prefilter the samples, and only then reconstruct the signal with the help of a B-spline basis. It has already been established in the literature that the reconstruction step can be implemented efficiently on a GPU. This article focuses on an efficient GPU implementation of the prefilter, on how to apply it to multidimensional samples (e.g. RGB color images), and on its performance aspects.
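
    The two-step scheme described above (prefilter, then reconstruct) can be sketched sequentially in one dimension. The code below is a CPU illustration and not the article's GPU implementation. It assumes the standard recursive formulation of the cubic B-spline prefilter with pole sqrt(3) - 2 and mirror boundary conditions, and the function name `prefilter_cubic` is hypothetical.

```python
import math

POLE = math.sqrt(3.0) - 2.0  # single pole of the cubic B-spline prefilter

def prefilter_cubic(samples):
    """Return cubic B-spline coefficients for 1D samples (mirror boundaries)."""
    n = len(samples)
    if n < 2:
        return list(samples)
    z = POLE
    gain = (1.0 - z) * (1.0 - 1.0 / z)  # equals 6 for the cubic case
    c = [gain * s for s in samples]

    # Causal initialisation: exact sum over the mirror-extended signal.
    zn, iz = z, 1.0 / z
    z2n = z ** (n - 1)
    total = c[0] + z2n * c[n - 1]
    z2n = z2n * z2n * iz
    for k in range(1, n - 1):
        total += (zn + z2n) * c[k]
        zn *= z
        z2n *= iz
    c[0] = total / (1.0 - zn * zn)

    # Causal recursion, left to right.
    for k in range(1, n):
        c[k] += z * c[k - 1]

    # Anticausal initialisation, then recursion right to left.
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1])
    for k in range(n - 2, -1, -1):
        c[k] = z * (c[k + 1] - c[k])
    return c
```

    Reconstructing at the sample positions with the B-spline weights 1/6, 4/6, 1/6 should then return the original samples, which is what makes the interpolation exact at the grid points. For images, the same filter would be applied along each axis in turn, and to each color channel independently.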

    Scalable GPU acceleration of b-spline signal processing operations

    B-Splines are a useful tool in signal processing, and are widely used in the analysis of two- and three-dimensional images. B-Splines provide a continuous representation of the signal, image, or volume, which is useful for interpolation, resampling, noise removal, and differentiation, all important steps in many signal processing algorithms. These splines are defined entirely by an array of coefficients that is roughly the same size as the original signal, with values of the same order of magnitude, making storage and representation trivial. What is not trivial, however, is the quick calculation and processing of those coefficients, especially for very large data. As technology improves in fields such as medical imaging, algorithms that use B-Splines will need to process increasingly high-resolution images and voxel volumes. New implementations are needed to make use of modern parallel architectures to keep these algorithms practical. This thesis presents a library for performing many common B-Spline operations in CUDA, the parallel programming framework for NVIDIA GPUs, and analyzes the considerations necessary when implementing a large-scale parallel version of such a well-established sequential algorithm. This library is meant to be used both by C++ programs and by algorithms implemented in MATLAB without requiring significant changes. Significant speedups are obtained using this library to perform various common B-Spline image processing operations (as much as 30x for some), and the scalability limitations of the GPU implementation are addressed.

    Distributed-memory large deformation diffeomorphic 3D image registration

    We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the solution of a highly nonlinear problem that involves pseudo-differential operators, biharmonic operators, and pure advection operators both forward and backward in time. A key issue is the time to solution, which poses the demand for efficient optimization methods as well as an effective utilization of high performance computing resources. To address this problem we use a preconditioned, inexact, Gauss-Newton-Krylov solver. Our algorithm integrates several components: a spectral discretization in space, a semi-Lagrangian formulation in time, analytic adjoints, different regularization functionals (including volume-preserving ones), a spectral preconditioner, a highly optimized distributed Fast Fourier Transform, and a cubic interpolation scheme for the semi-Lagrangian time-stepping. We demonstrate the scalability of our algorithm on images with resolution of up to $1024^3$ on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center (TACC). The critical problem in the medical imaging application domain is strong scaling, that is, solving registration problems of a moderate size of $256^3$, a typical resolution for medical images. We are able to solve the registration problem for images of this size in less than five seconds on 64 x86 nodes of TACC's "Maverick" system. Comment: accepted for publication at SC16 in Salt Lake City, Utah, USA; November 201

    Fast hyperbolic Radon transform represented as convolutions in log-polar coordinates

    The hyperbolic Radon transform is a commonly used tool in seismic processing, for instance in seismic velocity analysis, data interpolation, and multiple removal. A direct implementation by summation of traces with different moveouts is computationally expensive for large data sets. In this paper we present a new method for fast computation of hyperbolic Radon transforms. It is based on a log-polar sampling with which the main computational parts reduce to computing convolutions. This allows for fast implementations by means of the FFT. In addition to the FFT operations, interpolation procedures are required for switching between coordinates in the time-offset, Radon, and log-polar domains. Graphics processing units (GPUs) are a suitable computational platform for this purpose, due to their hardware-supported interpolation routines as well as optimized routines for the FFT. Performance tests show large speedups for the proposed algorithm. Hence, it is suitable for use in iterative methods, and we provide examples for data interpolation and multiple removal using this approach. Comment: 21 pages, 10 figures, 2 tables
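
    For context, the direct summation that the abstract calls expensive can be sketched as follows. This is an illustration of the baseline, not the paper's fast method. The parameterisation t = sqrt(tau^2 + q * x^2), the data layout `data[ix][it]`, and the function name are assumptions made for the example.

```python
import math

def hyperbolic_radon_direct(data, dt, offsets, taus, qs):
    """Sum traces along t = sqrt(tau^2 + q * x^2) for every (tau, q) pair.

    data[ix][it] holds the trace at offset offsets[ix], sampled every dt.
    Returns out[i][j] for taus[i] and qs[j].
    """
    nt = len(data[0])
    out = [[0.0] * len(qs) for _ in taus]
    for i, tau in enumerate(taus):
        for j, q in enumerate(qs):
            acc = 0.0
            for ix, x in enumerate(offsets):
                # Fractional sample index on the moveout curve.
                t = math.sqrt(tau * tau + q * x * x) / dt
                k = math.floor(t)
                if 0 <= k < nt - 1:
                    f = t - k
                    # Linear interpolation in time along the trace.
                    acc += (1.0 - f) * data[ix][k] + f * data[ix][k + 1]
            out[i][j] = acc
    return out
```

    The triple loop costs on the order of len(taus) * len(qs) * len(offsets) operations, which is the cost the convolution-based formulation aims to avoid.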