12,040 research outputs found
An inexact Newton-Krylov algorithm for constrained diffeomorphic image registration
We propose numerical algorithms for solving large deformation diffeomorphic
image registration problems. We formulate the nonrigid image registration
problem as a problem of optimal control. This leads to an infinite-dimensional
partial differential equation (PDE) constrained optimization problem.
The PDE constraint consists, in its simplest form, of a hyperbolic transport
equation for the evolution of the image intensity. The control variable is the
velocity field. Tikhonov regularization on the control ensures well-posedness.
We consider standard smoothness regularization based on - or
-seminorms. We augment this regularization scheme with a constraint on the
divergence of the velocity field rendering the deformation incompressible and
thus ensuring that the determinant of the deformation gradient is equal to one,
up to the numerical error.
We use a Fourier pseudospectral discretization in space and a Chebyshev
pseudospectral discretization in time. We use a preconditioned, globalized,
matrix-free, inexact Newton-Krylov method for numerical optimization. A
parameter continuation is designed to estimate an optimal regularization
parameter. Regularity is ensured by controlling the geometric properties of the
deformation field. Overall, we arrive at a black-box solver. We study spectral
properties of the Hessian, grid convergence, numerical accuracy, computational
efficiency, and deformation regularity of our scheme. We compare the designed
Newton-Krylov methods with a globalized preconditioned gradient descent. We
study the influence of a varying number of unknowns in time.
The reported results demonstrate excellent numerical accuracy, guaranteed
local deformation regularity, and computational efficiency with an optional
control on local mass conservation. The Newton-Krylov methods clearly
outperform the Picard method if high accuracy of the inversion is required.Comment: 32 pages; 10 figures; 9 table
Nodal Discontinuous Galerkin Methods on Graphics Processors
Discontinuous Galerkin (DG) methods for the numerical solution of partial
differential equations have enjoyed considerable success because they are both
flexible and robust: They allow arbitrary unstructured geometries and easy
control of accuracy without compromising simulation stability. Lately, another
property of DG has been growing in importance: The majority of a DG operator is
applied in an element-local way, with weak penalty-based element-to-element
coupling.
The resulting locality in memory access is one of the factors that enables DG
to run on off-the-shelf, massively parallel graphics processors (GPUs). In
addition, DG's high-order nature lets it require fewer data points per
represented wavelength and hence fewer memory accesses, in exchange for higher
arithmetic intensity. Both of these factors work significantly in favor of a
GPU implementation of DG.
Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for
Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60
relative to a serial computation on a current-generation CPU. In many cases,
our algorithms exhibit full use of the device's available memory bandwidth.
Example computations achieve and surpass 200 gigaflops/s of net
application-level floating point work.
In this article, we describe and derive the techniques used to reach this
level of performance. In addition, we present comprehensive data on the
accuracy and runtime behavior of the method.Comment: 33 pages, 12 figures, 4 table
Multi-GPU maximum entropy image synthesis for radio astronomy
The maximum entropy method (MEM) is a well known deconvolution technique in
radio-interferometry. This method solves a non-linear optimization problem with
an entropy regularization term. Other heuristics such as CLEAN are faster but
highly user dependent. Nevertheless, MEM has the following advantages: it is
unsupervised, it has a statistical basis, it has a better resolution and better
image quality under certain conditions. This work presents a high performance
GPU version of non-gridding MEM, which is tested using real and simulated data.
We propose a single-GPU and a multi-GPU implementation for single and
multi-spectral data, respectively. We also make use of the Peer-to-Peer and
Unified Virtual Addressing features of newer GPUs which allows to exploit
transparently and efficiently multiple GPUs. Several ALMA data sets are used to
demonstrate the effectiveness in imaging and to evaluate GPU performance. The
results show that a speedup from 1000 to 5000 times faster than a sequential
version can be achieved, depending on data and image size. This allows to
reconstruct the HD142527 CO(6-5) short baseline data set in 2.1 minutes,
instead of 2.5 days that takes a sequential version on CPU.Comment: 11 pages, 13 figure
High Performance Matrix-Fee Method for Large-Scale Finite Element Analysis on Graphics Processing Units
This thesis presents a high performance computing (HPC) algorithm on graphics processing units (GPU) for large-scale numerical simulations. In particular, the research focuses on the development of an efficient matrix-free conjugate gradient solver for the acceleration and scalability of the steady-state heat transfer finite element analysis (FEA) on a three-dimension uniform structured hexahedral mesh using a voxel-based technique. One of the greatest challenges in large-scale FEA is the availability of computer memory for solving the linear system of equations. Like in large-scale heat transfer simulations, where the size of the system matrix assembly becomes very large, the FEA solver requires huge amounts
of computational time and memory that very often exceed the actual memory limits of the available hardware resources. To overcome this problem a matrix-free conjugate gradient
(MFCG) method is designed and implemented to finite element computations which avoids the global matrix assembly. The main difference of the MFCG to the classical conjugate
gradient (CG) solver lies on the implementation of the matrix-vector product operation. Matrix-vector operation found to be the most expensive process consuming more than 80% out of the total computations for the numerical solution and thus a matrix-free matrix-vector (MFMV) approach becomes beneficial for saving memory and computational time throughout the execution of the FEA. In summary, the MFMV algorithm consists of three nested loops: (a) a loop over the mesh elements of the domain, (b) a loop on the element nodal values to perform the element matrix-vector operations and (c) the summation and transformation of the nodal values into their correct positions in the global index. A performance analysis on a serial and a parallel implementation on a GPU shows that the MFCG solver
outperforms the classical CG consuming significantly lower amounts of memory allowing for much larger size simulations. The outcome of this study suggests that the MFCG can also speed-up and scale the execution of large-scale finite element simulations
- …