Fast algorithms and efficient GPU implementations for the Radon transform and the back-projection operator represented as convolution operators
The Radon transform and its adjoint, the back-projection operator, can both
be expressed as convolutions in log-polar coordinates. Hence, fast algorithms
for applying the operators can be constructed using the FFT, provided the data
are resampled onto a log-polar grid. Radon data are typically measured on an
equally spaced grid in polar coordinates, and reconstructions are represented
(as images) in Cartesian coordinates. Therefore, in addition to the FFT, several
interpolation steps are needed in order to apply the Radon transform and the
back-projection operator by means of convolutions.
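Convolution structure pays off because the FFT evaluates a convolution in O(n log n) operations instead of O(n²). The sketch below shows only that final step, in one dimension: a plain radix-2 FFT and a zero-padded linear convolution, checked against direct summation. It does not perform the log-polar resampling, and the function names (`fft`, `fft_convolve`, `direct_convolve`) are illustrative, not taken from the paper's code.

```python
import cmath
import math
from typing import List, Sequence


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the inverse transform is returned WITHOUT the 1/n factor.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        term = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + term
        out[k + n // 2] = even[k] - term
    return out


def fft_convolve(f: Sequence[float], g: Sequence[float]) -> List[float]:
    """Linear convolution of f and g via zero-padded FFTs, O(n log n)."""
    m = len(f) + len(g) - 1
    n = 1
    while n < m:  # pad to a power of two >= m so circular == linear
        n *= 2
    F = fft([complex(x) for x in f] + [0j] * (n - len(f)))
    G = fft([complex(x) for x in g] + [0j] * (n - len(g)))
    prod = fft([a * b for a, b in zip(F, G)], invert=True)
    return [z.real / n for z in prod[:m]]


def direct_convolve(f: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference O(len(f) * len(g)) convolution, used only for checking."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


if __name__ == "__main__":
    print(fft_convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5]))
```

In the setting of the abstract, the same idea applies in two dimensions (log-radius and angle), with the 1D FFT replaced by a 2D FFT such as cuFFT on the GPU.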
Both the interpolation and the FFT operations can be implemented efficiently
on Graphics Processing Units (GPUs). For the interpolation, one can exploit the
fact that linear interpolation is hard-wired on GPUs, so that it has the same
computational cost as a direct memory access. Cubic interpolation schemes can
be constructed by combining linear interpolation steps, which provides a
significant computational speedup.
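One well-known way to build a cubic B-spline filter out of linear fetches (described by Sigg and Hadwiger in GPU Gems 2) merges the four cubic weights into two pairs, so that each pair becomes a single linearly interpolated read at a shifted coordinate. The sketch below assumes this is the kind of combination the abstract refers to. It emulates the hardware fetch in Python: `linear_fetch` is a hypothetical stand-in for a filtered texture read with clamp-to-edge addressing, and the two-fetch result is checked against the direct four-tap sum.

```python
import math
from typing import Sequence, Tuple


def linear_fetch(data: Sequence[float], x: float) -> float:
    """Emulates a linearly filtered texture read with clamp-to-edge.

    Sample k sits at coordinate x = k here (real GPUs use texel centers k + 0.5).
    Requires len(data) >= 2.
    """
    n = len(data)
    x = min(max(x, 0.0), n - 1.0)
    i = min(math.floor(x), n - 2)
    t = x - i
    return (1.0 - t) * data[i] + t * data[i + 1]


def bspline_weights(t: float) -> Tuple[float, float, float, float]:
    """Cubic B-spline weights for taps i-1, i, i+1, i+2 at fraction t in [0, 1)."""
    w0 = (1.0 - t) ** 3 / 6.0
    w1 = (3.0 * t ** 3 - 6.0 * t ** 2 + 4.0) / 6.0
    w2 = (-3.0 * t ** 3 + 3.0 * t ** 2 + 3.0 * t + 1.0) / 6.0
    w3 = t ** 3 / 6.0
    return w0, w1, w2, w3


def cubic_direct(coeffs: Sequence[float], x: float) -> float:
    """Four explicit taps with clamped indices (the 'expensive' version)."""
    n = len(coeffs)
    i = math.floor(x)
    w = bspline_weights(x - i)
    return sum(wk * coeffs[min(max(i - 1 + k, 0), n - 1)] for k, wk in enumerate(w))


def cubic_via_two_linear(coeffs: Sequence[float], x: float) -> float:
    """Same result with only two linear fetches at shifted coordinates."""
    i = math.floor(x)
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3   # both are >= 1/6, so no division by zero
    h0 = i - 1 + w1 / g0        # lies in [i-1, i]
    h1 = i + 1 + w3 / g1        # lies in [i+1, i+2]
    return g0 * linear_fetch(coeffs, h0) + g1 * linear_fetch(coeffs, h1)
```

Halving the number of memory reads is where the speedup comes from. Note that on real hardware the texture-filtering weights are stored in low-precision fixed point (8 fractional bits, per NVIDIA's CUDA Programming Guide), so the two-fetch version is somewhat less accurate than this floating-point emulation suggests.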
We describe how the Radon transform and the back-projection can be
implemented efficiently as convolution operators on GPUs. For large data
sizes, speedups of about 10 times are obtained relative to other software
packages based on GPU implementations of the Radon transform and the
back-projection operator. Moreover, speedups of more than 1000 times are
obtained relative to the CPU implementations provided in the MATLAB Image
Processing Toolbox.
GPU Prefilter for Accurate Cubic B-spline Interpolation
Achieving accurate interpolation is an important requirement for many signal-processing applications. While nearest-neighbor and linear interpolation methods are popular due to their native GPU support, they unfortunately produce severe artifacts. Better interpolation methods are known but lack native GPU support. Among these, a particularly attractive one is prefiltered cubic-spline interpolation. The signal it reconstructs from discrete samples has much higher fidelity to the original data than what is achievable with nearest-neighbor and linear interpolation. At the same time, its computational load is moderate, provided a sequence of two operations is applied: first, prefilter the samples, and only then reconstruct the signal with the help of a B-spline basis. It has already been established in the literature that the reconstruction step can be implemented efficiently on a GPU. This article focuses on an efficient GPU implementation of the prefilter, on how to apply it to multidimensional samples (e.g., RGB color images), and on its performance aspects.
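The cubic B-spline prefilter can be written as a pair of first-order recursive filters, one causal and one anti-causal, sharing the single pole z = sqrt(3) - 2. The following is a serial Python sketch of that recursion, following the widely used formulation by Unser and Thévenaz with mirror (whole-sample symmetric) boundaries assumed; it is not the article's GPU kernel, and `cubic_prefilter` and `reconstruct_at_integers` are illustrative names. The check verifies the defining property: evaluating the spline at the sample positions, (c[k-1] + 4c[k] + c[k+1]) / 6, returns the original samples.

```python
import math
from typing import List, Sequence

POLE = math.sqrt(3.0) - 2.0  # single pole of the cubic B-spline prefilter


def cubic_prefilter(samples: Sequence[float]) -> List[float]:
    """Converts samples s[k] into cubic B-spline coefficients c[k].

    Mirror boundaries are assumed: c[-1] = c[1] and c[n] = c[n-2].
    """
    n = len(samples)
    if n == 1:
        return [float(samples[0])]
    z = POLE
    gain = (1.0 - z) * (1.0 - 1.0 / z)  # overall gain, equals 6 for the cubic case
    c = [gain * s for s in samples]

    # Causal initial value: exact sum over one period (2n - 2) of the mirrored signal.
    zn, iz = z, 1.0 / z
    z2n = z ** (n - 1)
    acc = c[0] + z2n * c[n - 1]
    z2n = z2n * z2n * iz
    for k in range(1, n - 1):
        acc += (zn + z2n) * c[k]
        zn *= z
        z2n *= iz
    c[0] = acc / (1.0 - zn * zn)

    # Causal recursion, left to right.
    for k in range(1, n):
        c[k] += z * c[k - 1]

    # Anti-causal initial value, then recursion right to left.
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1])
    for k in range(n - 2, -1, -1):
        c[k] = z * (c[k + 1] - c[k])
    return c


def reconstruct_at_integers(c: Sequence[float]) -> List[float]:
    """Evaluates the cubic spline at x = 0..n-1 (mirror boundaries, n >= 2)."""
    n = len(c)
    out = []
    for k in range(n):
        left = c[k - 1] if k > 0 else c[1]
        right = c[k + 1] if k < n - 1 else c[n - 2]
        out.append((left + 4.0 * c[k] + right) / 6.0)
    return out
```

For a multidimensional array such as an RGB image, the same 1D filter is typically applied separably along each axis and independently per color channel. Every row or column is an independent recursion, which is the parallelism a GPU implementation can exploit.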
Scalable GPU acceleration of b-spline signal processing operations
B-splines are a useful tool in signal processing and are widely used in the analysis of two- and three-dimensional images. B-splines provide a continuous representation of the signal, image, or volume, which is useful for interpolation, resampling, noise removal, and differentiation, all important steps in many signal processing algorithms. These splines are defined entirely by an array of coefficients whose size and range of values are roughly the same as those of the original signal, making storage and representation trivial. What is not trivial, however, is the fast computation and processing of those coefficients, especially for very large data. As technology improves in fields such as medical imaging, algorithms that use B-splines will need to process increasingly high-resolution images and voxel volumes. New implementations are needed that exploit modern parallel architectures to keep these algorithms practical. This thesis presents a library for performing many common B-spline operations in CUDA, the parallel programming framework for NVIDIA GPUs, and analyzes the considerations necessary when implementing a large-scale parallel version of such a well-established sequential algorithm. The library is meant to be used both by C++ programs and by algorithms implemented in MATLAB without requiring significant changes. Significant speedups (up to 30x for some operations) are obtained when using this library for various common B-spline image processing operations, and the scalability limitations of the GPU implementation are addressed.
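To illustrate the "continuous representation" point, the sketch below evaluates a cubic B-spline model f(x) = sum_k c[k] * beta3(x - k), and its derivative, directly from a coefficient array. It is a serial Python sketch rather than the thesis's CUDA library; the names `bspline3`, `bspline3_deriv`, and `spline_eval` are hypothetical, and coefficients outside the array are simply treated as zero. The check relies on two properties of the cubic B-spline: constant coefficients reproduce a constant, and coefficients c[k] = k reproduce the line f(x) = x with derivative 1.

```python
import math
from typing import Sequence


def bspline3(x: float) -> float:
    """Centered cubic B-spline basis function beta3(x), supported on (-2, 2)."""
    a = abs(x)
    if a < 1.0:
        return 2.0 / 3.0 - a * a + 0.5 * a ** 3
    if a < 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0


def bspline3_deriv(x: float) -> float:
    """Derivative of beta3; continuous and piecewise quadratic."""
    a = abs(x)
    s = 1.0 if x >= 0.0 else -1.0
    if a < 1.0:
        return s * (-2.0 * a + 1.5 * a * a)
    if a < 2.0:
        return -s * 0.5 * (2.0 - a) ** 2
    return 0.0


def spline_eval(coeffs: Sequence[float], x: float, deriv: bool = False) -> float:
    """Returns f(x) (or f'(x) if deriv=True) for f(x) = sum_k c[k] beta3(x - k).

    Only the four coefficients with |x - k| < 2 can contribute.
    """
    basis = bspline3_deriv if deriv else bspline3
    i = math.floor(x)
    total = 0.0
    for k in range(i - 1, i + 3):
        if 0 <= k < len(coeffs):  # coefficients outside the array count as zero
            total += coeffs[k] * basis(x - k)
    return total
```

Because each output point touches only four coefficients and points are independent, resampling an image onto a new grid maps naturally onto one GPU thread per output sample.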
Distributed-memory large deformation diffeomorphic 3D image registration
We present a parallel distributed-memory algorithm for large deformation
diffeomorphic registration of volumetric images that produces large isochoric
deformations (locally volume preserving). Image registration is a key
technology in medical image analysis. Our algorithm uses a partial differential
equation constrained optimal control formulation. Finding the optimal
deformation map requires the solution of a highly nonlinear problem that
involves pseudo-differential operators, biharmonic operators, and pure
advection operators both forward and backward in time. A key issue is the
time to solution, which demands efficient optimization methods as well as
effective use of high-performance computing resources. To address this
problem, we use a preconditioned, inexact Gauss-Newton-Krylov
solver. Our algorithm integrates several components: a spectral discretization
in space, a semi-Lagrangian formulation in time, analytic adjoints, different
regularization functionals (including volume-preserving ones), a spectral
preconditioner, a highly optimized distributed Fast Fourier Transform, and a
cubic interpolation scheme for the semi-Lagrangian time-stepping. We
demonstrate the scalability of our algorithm on images with resolution of up to
on the "Maverick" and "Stampede" systems at the Texas Advanced
Computing Center (TACC). The critical problem in the medical imaging
application domain is strong scaling, that is, solving registration problems of
a moderate size of ---a typical resolution for medical images. We are
able to solve the registration problem for images of this size in less than
five seconds on 64 x86 nodes of TACC's "Maverick" system.
Comment: accepted for publication at SC16 in Salt Lake City, Utah, USA;
November 201
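To make the semi-Lagrangian step concrete: for pure advection u_t + v * u_x = 0, each grid point is traced back along its characteristic to the departure point x - v * dt, and the old field is interpolated there. The abstract does not say which cubic scheme the solver uses, so the sketch below assumes a four-point cubic Lagrange stencil on a 1D periodic grid with constant velocity, which is far simpler than the distributed 3D setting described above. The names `cubic_lagrange_periodic` and `semi_lagrangian_step` are illustrative.

```python
import math
from typing import List, Sequence


def cubic_lagrange_periodic(u: Sequence[float], x: float) -> float:
    """Interpolates periodic unit-spaced data u at position x using the
    four-point cubic Lagrange stencil u[i-1], u[i], u[i+1], u[i+2], i = floor(x)."""
    n = len(u)
    i = math.floor(x)
    t = x - i
    # Lagrange basis polynomials for nodes -1, 0, 1, 2 evaluated at t.
    w = (
        -t * (t - 1.0) * (t - 2.0) / 6.0,
        (t + 1.0) * (t - 1.0) * (t - 2.0) / 2.0,
        -(t + 1.0) * t * (t - 2.0) / 2.0,
        (t + 1.0) * t * (t - 1.0) / 6.0,
    )
    return sum(wk * u[(i - 1 + k) % n] for k, wk in enumerate(w))


def semi_lagrangian_step(u: Sequence[float], velocity: float, dt: float) -> List[float]:
    """One step of u_t + velocity * u_x = 0 on a periodic grid with spacing 1:
    follow each characteristic back to its departure point and interpolate."""
    shift = velocity * dt
    return [cubic_lagrange_periodic(u, j - shift) for j in range(len(u))]


if __name__ == "__main__":
    n = 64
    u = [math.sin(2.0 * math.pi * j / n) for j in range(n)]
    for _ in range(256):  # 256 steps of 0.25 cells = one full period
        u = semi_lagrangian_step(u, 0.25, 1.0)
    print(max(abs(a - math.sin(2.0 * math.pi * j / n)) for j, a in enumerate(u)))
```

Because the update is an interpolation rather than a finite-difference stencil, the time step is not restricted by a CFL condition for stability, which is one commonly cited motivation for semi-Lagrangian schemes; the price is one interpolation per grid point per step, which is why an efficient (and, in the distributed case, communication-aware) interpolation kernel matters.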
Fast hyperbolic Radon transform represented as convolutions in log-polar coordinates
The hyperbolic Radon transform is a commonly used tool in seismic processing,
for instance in seismic velocity analysis, data interpolation, and multiple
removal. A direct implementation by summation of traces with different moveouts
is computationally expensive for large data sets. In this paper we present a
new method for fast computation of the hyperbolic Radon transform. It is based
on log-polar sampling, with which the main computational parts reduce to
computing convolutions. This allows for fast implementations by means of FFT.
In addition to the FFT operations, interpolation procedures are required for
switching between coordinates in the time-offset, Radon, and log-polar domains.
Graphics Processing Units (GPUs) are a suitable computational platform for
this purpose, owing to their hardware-supported interpolation routines as well
as optimized FFT routines. Performance tests show large speedups for the
proposed algorithm. Hence, it is suitable for use in iterative methods, and we
provide examples of data interpolation and multiple removal using this
approach.
Comment: 21 pages, 10 figures, 2 tables
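For reference, the "direct implementation by summation of traces" that the abstract calls expensive can be sketched as follows. Each hyperbola is parameterized as t(x) = sqrt(tau^2 + p * x^2), where p plays the role of a squared slowness; this parameterization is an assumption, since conventions vary. Each output sample sums one linearly interpolated value per trace, so the cost grows with the product of the numbers of tau values, p values, and offsets. The function name and argument layout are hypothetical, and this is the baseline, not the paper's log-polar method.

```python
import math
from typing import List, Sequence


def hyperbolic_radon_direct(
    data: Sequence[Sequence[float]],  # data[ix][it]: trace ix sampled at times it * dt
    offsets: Sequence[float],
    taus: Sequence[float],
    ps: Sequence[float],
    dt: float,
) -> List[List[float]]:
    """m[itau][ip] = sum over traces of d(x, sqrt(tau^2 + p * x^2)).

    Time samples are linearly interpolated; times beyond the record are skipped.
    Cost: O(len(taus) * len(ps) * len(offsets)).
    """
    nt = len(data[0])
    model = [[0.0] * len(ps) for _ in taus]
    for itau, tau in enumerate(taus):
        for ip, p in enumerate(ps):
            acc = 0.0
            for trace, x in zip(data, offsets):
                t = math.sqrt(tau * tau + p * x * x) / dt  # fractional sample index
                k = math.floor(t)
                if 0 <= k < nt - 1:
                    frac = t - k
                    acc += (1.0 - frac) * trace[k] + frac * trace[k + 1]
                elif k == nt - 1 and t == k:  # exactly on the last sample
                    acc += trace[k]
            model[itau][ip] = acc
    return model
```

The triple loop makes the expense explicit; the log-polar approach in the abstract replaces this summation by resampling plus FFT-based convolution.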