4 research outputs found
A matrix-free approach to parallel and memory-efficient deformable image registration
We present a novel computational approach to fast and memory-efficient
deformable image registration. In the variational registration model, the
computation of the objective function derivatives is the computationally most
expensive operation, both in terms of runtime and memory requirements. In order
to target this bottleneck, we analyze the matrix structure of gradient and
Hessian computations for the case of the normalized gradient fields distance
measure and curvature regularization. Based on this analysis, we derive
equivalent matrix-free closed-form expressions for derivative computations,
eliminating the need for storing intermediate results and the costs of sparse
matrix arithmetic. This has further benefits: (1) matrix computations can be
fully parallelized, (2) memory complexity for derivative computation is reduced
from linear to constant, and (3) overall computation times are substantially
reduced.
In comparison with an optimized matrix-based reference implementation, the
CPU implementation achieves speedup factors between 3.1 and 9.7, and we are
able to handle substantially higher resolutions. Using a GPU implementation, we
achieve an additional speedup factor of up to 9.2.
Furthermore, we evaluated the approach on real-world medical datasets. On ten
publicly available lung CT images from the DIR-Lab 4DCT dataset, we achieve the
best mean landmark error of 0.93 mm compared to other submissions on the
DIR-Lab website, with an average runtime of only 9.23 s. Complete non-rigid
registration of full-size 3D thorax-abdomen CT volumes from oncological
follow-up is achieved in 12.6 s. The experimental results show that the
proposed matrix-free algorithm enables the use of variational registration
models also in applications which were previously impractical due to memory or
runtime restrictions.
Comment: Accepted for publication in SIAM Journal on Scientific Computing (SISC)
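As a rough illustration of the matrix-free idea in the abstract above, the sketch below applies a 1D forward-difference operator and its transpose directly to arrays, and checks the result against an explicitly assembled matrix. It is a toy example under simplifying assumptions, not the paper's closed-form expressions for the NGF or curvature derivatives; all function names are hypothetical.

```python
import numpy as np

def forward_diff_matrix(n, h=1.0):
    """Assemble the (n-1) x n forward-difference matrix explicitly (dense, for comparison only)."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0 / h
    D[idx, idx + 1] = 1.0 / h
    return D

def forward_diff_matrix_free(u, h=1.0):
    """Compute D @ u without forming D: one subtraction per output entry."""
    return (u[1:] - u[:-1]) / h

def forward_diff_transpose_matrix_free(w, h=1.0):
    """Compute D.T @ w without forming D (the kind of product a gradient needs)."""
    out = np.zeros(w.size + 1)
    out[:-1] -= w / h   # contribution of the -1/h diagonal
    out[1:] += w / h    # contribution of the +1/h superdiagonal
    return out

rng = np.random.default_rng(0)
u = rng.standard_normal(50)
w = rng.standard_normal(49)
D = forward_diff_matrix(u.size)
same_forward = np.allclose(D @ u, forward_diff_matrix_free(u))
same_transpose = np.allclose(D.T @ w, forward_diff_transpose_matrix_free(w))
```

The matrix-free versions need no storage for the operator itself, and each output entry can be computed independently, which is what makes this style of computation easy to parallelize.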
Robust, fast and accurate: a 3-step method for automatic histological image registration
We present a 3-step registration pipeline for differently stained
histological serial sections that consists of 1) a robust pre-alignment, 2) a
parametric registration computed on coarse resolution images, and 3) an
accurate nonlinear registration. In all three steps the NGF distance measure is
minimized with respect to an increasingly flexible transformation. We apply the
method in the ANHIR image registration challenge and evaluate its performance
on the training data. The presented method is robust (error reduction in 99.6%
of the cases), fast (runtime 4 seconds) and accurate (median relative target
registration error 0.19%).
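The NGF (normalized gradient fields) distance mentioned above compares the directions of image gradients rather than raw intensities. A minimal sketch of one common formulation follows, assuming the form mean(1 - (<grad T, grad R> / (|grad T|_eps |grad R|_eps))^2) with an edge parameter `eps`; the exact variant, discretization, and parameter values used by the authors are not given in the abstract, so the function name and defaults here are assumptions.

```python
import numpy as np

def ngf_distance(reference, template, eps=1e-2):
    """Mean NGF distance between two 2D or 3D arrays of equal shape (assumed formulation)."""
    grad_r = np.gradient(reference.astype(float))  # one array per axis
    grad_t = np.gradient(template.astype(float))
    inner = sum(a * b for a, b in zip(grad_r, grad_t))
    # Regularized gradient norms; eps suppresses the influence of near-flat regions.
    norm_r = np.sqrt(sum(a * a for a in grad_r) + eps ** 2)
    norm_t = np.sqrt(sum(a * a for a in grad_t) + eps ** 2)
    return float(np.mean(1.0 - (inner / (norm_r * norm_t)) ** 2))

# Toy data: a Gaussian blob, a shifted copy, and an intensity-inverted copy.
yy, xx = np.mgrid[0:32, 0:32]
blob = np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / 50.0)
shifted = np.roll(blob, 5, axis=1)

d_same = ngf_distance(blob, blob)
d_shifted = ngf_distance(blob, shifted)
d_inverted = ngf_distance(blob, -blob)
```

Because the inner product is squared, the measure is unchanged when one image's contrast is inverted, which is one reason gradient-based measures of this kind are used for differently stained or multi-modal images.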
Fast GPU 3D Diffeomorphic Image Registration
3D image registration is one of the most fundamental and computationally
expensive operations in medical image analysis. Here, we present a
mixed-precision, Gauss-Newton-Krylov solver for diffeomorphic registration of
two images. Our work extends the publicly available CLAIRE library to GPU
architectures. Despite the importance of image registration, only a few
implementations of large deformation diffeomorphic registration packages
support GPUs. Our contributions are new algorithms to significantly reduce the
run time of the two main computational kernels in CLAIRE: calculation of
derivatives and scattered-data interpolation. Specifically, we (i) deploy
highly optimized, mixed-precision GPU kernels for the evaluation of
scattered-data interpolation,
(ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with
optimized 8th-order finite differences, and (iii) compare with state-of-the-art
CPU and GPU implementations. As a highlight, we demonstrate that we can
register clinical images in less than 6 seconds on a single NVIDIA
Tesla V100. This amounts to over 20× speed-up over the current version
of CLAIRE and over 30× speed-up over existing GPU implementations.
Comment: 20 pages, 6 figures, 8 tables
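To make the trade-off between FFT-based and finite-difference derivatives concrete, the sketch below differentiates a periodic 1D signal both ways. It uses the standard 8th-order central-difference weights (4/5, -1/5, 4/105, -1/280) and NumPy on the CPU; it is only an illustration under these assumptions, not the mixed-precision GPU kernels of the paper, and the function names are hypothetical.

```python
import numpy as np

# Central-difference weights for offsets 1..4 (8th-order accurate first derivative).
FD8_WEIGHTS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)

def fd8_derivative_periodic(f, h):
    """First derivative on a periodic grid using an 8th-order central stencil."""
    df = np.zeros_like(f, dtype=float)
    for k, w in enumerate(FD8_WEIGHTS, start=1):
        # np.roll(f, -k)[i] == f[i + k] and np.roll(f, k)[i] == f[i - k] (periodic wrap).
        df += w * (np.roll(f, -k) - np.roll(f, k))
    return df / h

def fft_derivative_periodic(f, h):
    """First derivative on a periodic grid via multiplication by i*k in Fourier space."""
    wavenumbers = 2.0 * np.pi * np.fft.fftfreq(f.size, d=h)
    return np.fft.ifft(1j * wavenumbers * np.fft.fft(f)).real

n = 64
h = 2.0 * np.pi / n
x = h * np.arange(n)
f = np.sin(x)
err_fd = np.max(np.abs(fd8_derivative_periodic(f, h) - np.cos(x)))
err_fft = np.max(np.abs(fft_derivative_periodic(f, h) - np.cos(x)))
```

The finite-difference stencil only touches eight neighbouring values per grid point, so it needs no global transform, while the spectral derivative is more accurate for smooth periodic data but requires forward and inverse FFTs.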
CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration
With this work, we release CLAIRE, a distributed-memory implementation of an
effective solver for constrained large deformation diffeomorphic image
registration problems in three dimensions. We consider an optimal control
formulation. We invert for a stationary velocity field that parameterizes the
deformation map. Our solver is based on a globalized, preconditioned, inexact
reduced space Gauss-Newton-Krylov scheme. We exploit state-of-the-art
techniques in scientific computing to develop an effective solver that scales
to thousands of distributed memory nodes on high-end clusters. We present the
formulation, discuss algorithmic features, describe the software package, and
introduce an improved preconditioner for the reduced space Hessian to speed up
the convergence of our solver. We test registration performance on synthetic
and real data. We demonstrate registration accuracy on several neuroimaging
datasets. We compare the performance of our scheme against different flavors of
the Demons algorithm for diffeomorphic image registration. We study convergence
of our preconditioner and our overall algorithm. We report scalability results
on state-of-the-art supercomputing platforms. We demonstrate that we can solve
registration problems for clinically relevant data sizes in two to four minutes
on a standard compute node with 20 cores, attaining excellent data fidelity.
With the present work we achieve a speedup of (on average) 5× with a
peak performance of up to 17× compared to our former work.
Comment: 37 pages
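The abstract above describes a deformation map parameterized by a stationary velocity field. As a minimal sketch of what that means, the code below integrates dy/dt = v(y), y(0) = x over the unit time interval in 1D with classical Runge-Kutta (RK4). The integrator, the 1D setting, and the name `flow_map` are illustrative assumptions; this is not the solver or discretization used in the CLAIRE package.

```python
import numpy as np

def flow_map(velocity, x0, n_steps=32):
    """Map points x0 to y(1), where dy/dt = velocity(y) and y(0) = x0 (classical RK4)."""
    y = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        k1 = velocity(y)
        k2 = velocity(y + 0.5 * dt * k1)
        k3 = velocity(y + 0.5 * dt * k2)
        k4 = velocity(y + dt * k3)
        y = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

points = np.linspace(-1.0, 1.0, 11)
# For v(y) = a*y the exact map is y(1) = x * exp(a), which gives a simple check.
mapped_linear = flow_map(lambda p: 0.3 * p, points)
# A smooth nonlinear velocity; the resulting 1D map should preserve point ordering.
mapped_sine = flow_map(np.sin, points)
```

Because the velocity does not change in time, the same field is evaluated at every step. For a smooth velocity the resulting map is invertible, which in 1D shows up as the mapped points keeping their order.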