GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization
X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical
concern in most image guided radiation therapy procedures. It is the goal of
this paper to develop a fast GPU-based algorithm to reconstruct high quality
CBCT images from undersampled and noisy projection data so as to lower the
imaging dose. For this purpose, we have developed an iterative tight frame (TF)
based CBCT reconstruction algorithm. A condition that a real CBCT image has a
sparse representation under a TF basis is imposed in the iteration process as
regularization to the solution. To speed up the computation, a multi-grid
method is employed. Our GPU implementation has achieved high computational
efficiency and a CBCT image of resolution 512×512×70 can be
reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom
and a physical Catphan phantom. It is found that our TF-based algorithm is able
to reconstruct CBCT images in the context of undersampling and low mAs levels. We have
also quantitatively analyzed the reconstructed CBCT image quality in terms of
modulation-transfer-function and contrast-to-noise ratio under various scanning
conditions. The results confirm the high CBCT image quality obtained from our
TF algorithm. Moreover, our algorithm has also been validated in a real
clinical context using a head-and-neck patient case. Comparisons of the
developed TF algorithm and the current state-of-the-art TV algorithm have also
been made in various cases studied in terms of reconstructed image quality and
computation efficiency.
Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio
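As a rough illustration of the regularization idea in this abstract, the sketch below alternates a data-fidelity gradient step with soft-thresholding of tight-frame coefficients on a 1D toy signal. It is not the paper's algorithm: the single-level undecimated Haar frame, the sampling mask standing in for the CBCT projection operator, and every name and parameter (`analysis`, `synthesis`, `soft`, `tf_reconstruct`, `step`, `thresh`) are assumptions made for this example.

```python
def analysis(x):
    """Single-level undecimated Haar analysis with periodic boundary (assumed frame)."""
    n = len(x)
    low = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    high = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return low, high


def synthesis(low, high):
    """Adjoint of analysis(); because the frame is tight, it also inverts analysis()."""
    n = len(low)
    # Index i - 1 wraps around to the last element when i == 0 (periodic boundary).
    return [(low[i] + low[i - 1]) / 2 + (high[i] - high[i - 1]) / 2 for i in range(n)]


def soft(c, t):
    """Soft-thresholding (shrinkage) of a single coefficient."""
    if c > t:
        return c - t
    if c < -t:
        return c + t
    return 0.0


def tf_reconstruct(b, mask, iters=200, step=1.0, thresh=0.01):
    """Toy reconstruction from masked samples.

    b    : measured values (0.0 where nothing was measured)
    mask : 1 where a sample was measured, 0 elsewhere (hypothetical forward model)
    """
    x = list(b)
    for _ in range(iters):
        # Gradient step on the data-fidelity term 0.5 * ||mask * x - b||^2.
        x = [xi - step * m * (m * xi - bi) for xi, m, bi in zip(x, mask, b)]
        # Impose sparsity by shrinking the detail coefficients only.
        low, high = analysis(x)
        high = [soft(h, thresh) for h in high]
        x = synthesis(low, high)
    return x
```

A real CBCT problem would replace the mask with the X-ray projection operator and the 1D frame with a 3D one; the alternating structure is the only point of the sketch.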
CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data
This paper presents a novel method for ground segmentation in Velodyne point
clouds. We propose an encoding of sparse 3D data from the Velodyne sensor
suitable for training a convolutional neural network (CNN). This general
purpose approach is used for segmentation of the sparse point cloud into ground
and non-ground points. The LiDAR data are represented as a multi-channel 2D
signal where the horizontal axis corresponds to the rotation angle and the
vertical axis indexes the channels (i.e. laser beams). Multiple topologies of
relatively shallow CNNs (i.e. 3-5 convolutional layers) are trained and
evaluated using a manually annotated dataset we prepared. The results show
significant improvement of performance over the state-of-the-art method by
Zhang et al. in terms of speed and also minor improvements in terms of
accuracy.
Comment: ICRA 2018 submissio
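The encoding described in this abstract can be sketched as follows: each point is placed in a 2D grid whose row is the laser (ring) index and whose column is the quantized rotation angle. This is only a minimal sketch. The choice of channels (range and height), the grid width, and the function name `encode_scan` are assumptions for illustration, not details taken from the paper.

```python
import math


def encode_scan(points, num_rings, num_cols):
    """Project a sparse point cloud onto a 2D multi-channel grid.

    points : iterable of (x, y, z, ring), where ring is the laser beam index
    Returns grid[ring][col] = (range, height), or None where no point landed.
    """
    grid = [[None] * num_cols for _ in range(num_rings)]
    for x, y, z, ring in points:
        azimuth = math.atan2(y, x)  # rotation angle in [-pi, pi]
        # Map the angle to a column index; the modulo folds +pi onto column 0.
        col = int((azimuth + math.pi) / (2 * math.pi) * num_cols) % num_cols
        rng = math.sqrt(x * x + y * y + z * z)
        # Assumed channels: distance to the sensor and point height.
        grid[ring][col] = (rng, z)
    return grid
```

The resulting grid is dense enough to be fed to ordinary 2D convolutions, with empty cells handled by whatever fill value the network expects.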
Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging
Many graphics and vision problems can be expressed as non-linear least
squares optimizations of objective functions over visual data, such as images
and meshes. The mathematical descriptions of these functions are extremely
concise, but their implementation in real code is tedious, especially when
optimized for real-time performance on modern GPUs in interactive applications.
In this work, we propose a new language, Opt (available under
http://optlang.org), for writing these objective functions over image- or
graph-structured unknowns concisely and at a high level. Our compiler
automatically transforms these specifications into state-of-the-art GPU solvers
based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate
different variations of the solver, so users can easily explore tradeoffs in
numerical precision, matrix-free methods, and solver approaches. In our
results, we implement a variety of real-world graphics and vision applications.
Their energy functions are expressible in tens of lines of code, and produce
highly-optimized GPU solver implementations. These solvers have performance
competitive with the best published hand-tuned, application-specific GPU
solvers, and orders of magnitude beyond a general-purpose auto-generated
solver.
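To make the solver family mentioned in this abstract concrete, here is a minimal Gauss-Newton sketch in plain Python. It does not use Opt's syntax or its generated GPU code, neither of which appears in this listing; the toy residuals, the hand-written Jacobian, and the names `residuals`, `jacobian`, and `gauss_newton` are all assumptions chosen for illustration.

```python
def residuals(x, y):
    """Toy residual vector; all three residuals vanish at (x, y) = (2, 1)."""
    return [x * x + y * y - 5.0, x - y - 1.0, x * y - 2.0]


def jacobian(x, y):
    """Analytic Jacobian of residuals(), one row per residual."""
    return [[2 * x, 2 * y], [1.0, -1.0], [y, x]]


def gauss_newton(x, y, iters=20):
    """Minimize the sum of squared residuals with plain Gauss-Newton steps."""
    for _ in range(iters):
        r = residuals(x, y)
        J = jacobian(x, y)
        # Normal equations (J^T J) d = -J^T r for the 2x2 case.
        a = sum(row[0] * row[0] for row in J)
        b = sum(row[0] * row[1] for row in J)
        c = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a * c - b * b
        # Solve the 2x2 system with the explicit inverse.
        dx = -(c * g0 - b * g1) / det
        dy = -(a * g1 - b * g0) / det
        x, y = x + dx, y + dy
    return x, y
```

Levenberg-Marquardt differs mainly by adding a damping term to the diagonal of the normal-equation matrix before solving.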
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various different algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 201
VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron
General purpose graphical processing units (GPUs) offer high processing
speeds for certain classes of highly parallelizable computations, such as
matrix operations and Fourier transforms, that lie at the heart of
first-principles electronic structure calculations. Inclusion of exact-exchange
increases the cost of density functional theory by orders of magnitude,
motivating the use of GPUs. Porting the widely used electronic density
functional code VASP to run on a GPU results in a 5-20 fold performance boost
of exact-exchange compared with a traditional CPU. We analyze performance
bottlenecks and discuss classes of problems that will benefit from the GPU. As
an illustration of the capabilities of this implementation, we calculate the
lattice stability of α- and β-rhombohedral boron structures utilizing
exact-exchange. Our results confirm the energetic preference for
symmetry-breaking partial occupation of the β-rhombohedral structure at
low temperatures, but do not resolve the stability of α relative to
β.
Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and more
recently, learning, are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feed can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
demands on the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows an
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate the current
flow estimate forward in time. Experimental validation of
the algorithm is provided using a high-speed ground truth image
dataset as well as real-life video data at 300 Hz.
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
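The definition of structure flow given in this paragraph can be written compactly. The symbols are assumptions introduced here for illustration and are not taken from the thesis: $\mathbf{v}(p)$ denotes the three-dimensional scene flow at pixel $p$, $d(p) > 0$ the depth at that pixel, and $\mathbf{s}(p)$ the structure flow.

```latex
\mathbf{s}(p) = \frac{1}{d(p)}\,\mathbf{v}(p)
```

Under this reading, a nearby point and a distant point moving with the same 3D velocity produce different structure flow, with the nearby point yielding the larger value.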
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of low-level
classical 2D convolution based image processing routines such as
Gaussian filters and numerical derivatives.
These algorithms are implemented on GPU hardware and can be
integrated into future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles.
Sparse matrix-vector multiplication on GPGPUs
The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this paper we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs that have been encountered by the various researchers, and present a list of solutions, organized in categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.
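For readers unfamiliar with the kernel under review, the following is a minimal sequential sketch of SpMV using the compressed sparse row (CSR) layout. CSR is assumed here as one common storage format; the abstract itself does not single out a format, and the function name `spmv_csr` is hypothetical.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr : offsets into col_idx/values, length (number of rows + 1)
    col_idx : column index of each stored nonzero
    values  : value of each stored nonzero
    x       : dense input vector
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# The matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
```

The outer loop over rows is independent per row, which is what makes the kernel a natural candidate for GPU parallelization; the irregular access to `x` through `col_idx` is one source of the trade-offs the review discusses.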