Distributed-memory large deformation diffeomorphic 3D image registration
We present a parallel distributed-memory algorithm for large deformation
diffeomorphic registration of volumetric images that produces large isochoric
deformations (locally volume preserving). Image registration is a key
technology in medical image analysis. Our algorithm uses a partial differential
equation constrained optimal control formulation. Finding the optimal
deformation map requires the solution of a highly nonlinear problem that
involves pseudo-differential operators, biharmonic operators, and pure
advection operators both forward and backward in time. A key issue is the
time to solution, which poses the demand for efficient optimization methods as
well as an effective utilization of high performance computing resources. To
address this problem we use a preconditioned, inexact, Gauss-Newton-Krylov
solver. Our algorithm integrates several components: a spectral discretization
in space, a semi-Lagrangian formulation in time, analytic adjoints, different
regularization functionals (including volume-preserving ones), a spectral
preconditioner, a highly optimized distributed Fast Fourier Transform, and a
cubic interpolation scheme for the semi-Lagrangian time-stepping. We
demonstrate the scalability of our algorithm on images with resolution of up to
on the "Maverick" and "Stampede" systems at the Texas Advanced
Computing Center (TACC). The critical problem in the medical imaging
application domain is strong scaling, that is, solving registration problems of
a moderate size of ---a typical resolution for medical images. We are
able to solve the registration problem for images of this size in less than
five seconds on 64 x86 nodes of TACC's "Maverick" system.
Comment: accepted for publication at SC16 in Salt Lake City, Utah, USA;
November 201
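To illustrate the semi-Lagrangian time-stepping with cubic interpolation mentioned in the abstract above, here is a minimal one-dimensional sketch. It is not the authors' implementation: the paper works on distributed 3D volumes, whereas this toy assumes a periodic 1D grid with unit spacing, a first-order backward trace of the characteristics, and 4-point cubic Lagrange interpolation. The function names cubic_interp_periodic and semi_lagrangian_step are hypothetical.

```python
import math


def cubic_interp_periodic(f, x):
    """Evaluate a periodic grid function f (unit spacing) at real position x
    using 4-point cubic Lagrange interpolation."""
    n = len(f)
    i = math.floor(x)
    t = x - i  # fractional offset in [0, 1)
    # Samples at grid nodes i-1, i, i+1, i+2, wrapped periodically.
    fm1, f0, f1, f2 = (f[(i + k) % n] for k in (-1, 0, 1, 2))
    # Cubic Lagrange weights for nodes located at -1, 0, 1, 2.
    wm1 = -t * (t - 1) * (t - 2) / 6
    w0 = (t + 1) * (t - 1) * (t - 2) / 2
    w1 = -(t + 1) * t * (t - 2) / 2
    w2 = (t + 1) * t * (t - 1) / 6
    return wm1 * fm1 + w0 * f0 + w1 * f1 + w2 * f2


def semi_lagrangian_step(f, velocity, dt):
    """Advance df/dt + v * df/dx = 0 by one step of size dt.

    Each grid point j is traced back along its characteristic to the
    departure point j - v[j] * dt (a first-order trace, assumed here for
    brevity), and the old field is interpolated at that point.
    """
    return [
        cubic_interp_periodic(f, j - velocity[j] * dt)
        for j in range(len(f))
    ]
```

Because the departure point is interpolated rather than differenced, the step size is not bounded by a CFL condition; a shift by a whole number of cells reproduces the field exactly, and fractional shifts incur only the interpolation error.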
A domain decomposing parallel sparse linear system solver
The solution of large sparse linear systems is often the most time-consuming
part of many science and engineering applications. Computational fluid
dynamics, circuit simulation, power network analysis, and material science are
just a few examples of the application areas in which large sparse linear
systems need to be solved effectively. In this paper we introduce a new
parallel hybrid sparse linear system solver for distributed memory
architectures that contains both direct and iterative components. We show that
by using our solver one can alleviate the drawbacks of direct and iterative
solvers, achieving better scalability than with direct solvers and more
robustness than with classical preconditioned iterative solvers. Comparisons to
well-known direct and iterative solvers on a parallel architecture are
provided.
Comment: To appear in Journal of Computational and Applied Mathematic
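The abstract above does not describe the solver's internals, so the following is only a generic sketch of how direct and iterative components can be combined, not the paper's algorithm. It assumes a block-Jacobi scheme: the unknowns are split into contiguous blocks (the "domains"), each diagonal block is solved directly by Gaussian elimination, and an outer iteration corrects for the coupling between blocks. The names solve_dense and block_jacobi_solve are hypothetical, and dense lists stand in for sparse storage to keep the example short.

```python
def solve_dense(A, b):
    """Direct solve of A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def block_jacobi_solve(A, b, block_size, tol=1e-10, max_iter=500):
    """Outer iteration with direct solves on the diagonal blocks.

    Returns (x, iterations). Convergence is assumed, e.g. for strictly
    diagonally dominant A; it is not guaranteed in general.
    """
    n = len(b)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    x = [0.0] * n
    for it in range(max_iter):
        # Global residual r = b - A x.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(ri) for ri in r) <= tol:
            return x, it
        # The block solves are independent of each other, so in a
        # distributed setting each one could run on a different process.
        for blk in blocks:
            A_bb = [[A[i][j] for j in blk] for i in blk]
            dx = solve_dense(A_bb, [r[i] for i in blk])
            for i, d in zip(blk, dx):
                x[i] += d
    return x, max_iter
```

The block size is the tuning knob in this sketch: one block covering the whole matrix reduces to a purely direct solve, while blocks of size one reduce to the point-Jacobi iteration.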
Recycling BiCGSTAB with an Application to Parametric Model Order Reduction
Krylov subspace recycling is a process for accelerating the convergence of
sequences of linear systems. Based on this technique, the recycling BiCG
algorithm has been developed recently. Here, we generalize and extend this
recycling theory to BiCGSTAB. Recycling BiCG focuses on efficiently solving
sequences of dual linear systems, while the focus here is on efficiently
solving sequences of single linear systems (assuming non-symmetric matrices for
both recycling BiCG and recycling BiCGSTAB).
As compared with other methods for solving sequences of single linear systems
with non-symmetric matrices (e.g., recycling variants of GMRES), BiCG-based
recycling algorithms, like recycling BiCGSTAB, have the advantage that they
involve a short-term recurrence, and hence, do not suffer from storage issues
and are also cheaper with respect to the orthogonalizations.
We modify the BiCGSTAB algorithm to use a recycle space, which is built from
left and right approximate invariant subspaces. Using our algorithm for a
parametric model order reduction example gives good results. We show about 40%
savings in the number of matrix-vector products and about 35% savings in
runtime.
Comment: 18 pages, 5 figures, Extended version of Max Planck Institute report
(MPIMD/13-21
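For reference, the short-term recurrence that the abstract above highlights can be seen in the standard, unpreconditioned BiCGSTAB iteration sketched below. This is the textbook method without any recycle space, so it does not reproduce the paper's recycling variant; the function names and the dense list-of-lists matrix format are assumptions made for brevity.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Standard BiCGSTAB for a square, possibly non-symmetric A.

    Returns (x, iterations). Only a fixed handful of vectors is kept
    (x, r, r_hat, p, v, s, t) regardless of the iteration count, unlike
    GMRES, whose stored basis grows with every iteration.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual b - A x for the initial guess x = 0
    r_hat = list(r)    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: rho is zero")
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 <= tol * b_norm:
            # Converged after the BiCG half-step.
            return [xi + alpha * pi for xi, pi in zip(x, p)], k
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)  # local residual minimisation
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        rho = rho_new
    return x, max_iter
```

Each iteration costs two matrix-vector products, which is why the abstract reports savings in terms of matrix-vector products: a recycle space aims to reduce the number of iterations needed across a sequence of related systems.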