2,451 research outputs found
A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver
In this paper, we present a GPU-accelerated direct-sum boundary integral
method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a
well-posed boundary integral formulation is used to ensure the fast convergence
of Krylov subspace-based linear algebraic solvers such as GMRES. The
molecular surfaces are discretized with flat triangles and centroid
collocation. To speed up our method, we take advantage of the parallel nature
of the boundary integral formulation and parallelize the schemes within the CUDA
shared-memory architecture on the GPU. The schemes use only
size-of-double device memory for a biomolecule with triangular surface
elements and partial charges. Numerical tests of these schemes show
well-maintained accuracy and fast convergence. The GPU implementation using one
GPU card (Nvidia Tesla M2070) achieves a 120-150X speed-up over the implementation
using one CPU (Intel L5640 2.27GHz). With our approach, solving PB equations on
well-discretized molecular surfaces with up to 300,000 boundary elements will
take less than about 10 minutes; hence our approach is particularly suitable
for fast electrostatics computations on small to medium biomolecules.
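The "direct-sum" idea in the abstract above can be illustrated with a minimal sketch: the potential at every target point is an independent sum over all sources, which is why the computation maps naturally onto one GPU thread per target. The sketch below is plain Python rather than CUDA, and it uses the screened Coulomb kernel exp(-kappa r) / (4 pi r) as a stand-in; the function name, arguments, and kernel choice are assumptions for illustration, not the paper's boundary integral kernels or implementation.

```python
import math

def screened_coulomb_direct_sum(targets, sources, charges, kappa):
    """O(N*M) direct summation of a screened Coulomb potential (illustrative only)."""
    potentials = []
    for tx, ty, tz in targets:
        # Each target's sum is independent of every other target's sum,
        # so a GPU port could assign one thread to each iteration of this loop.
        total = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:  # skip the singular self-interaction
                total += q * math.exp(-kappa * r) / (4.0 * math.pi * r)
        potentials.append(total)
    return potentials
```

The cost grows with the product of the number of targets and sources, which is the price a direct-sum scheme pays for its simplicity and low memory use.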
High performance interior point methods for three-dimensional finite element limit analysis
The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application to problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three-dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of free variables and options for handling those that remain, a comparison of multifrontal and supernodal Cholesky factorizations for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, the treatment of dense columns and fixed variables, and acceleration of the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.
Greedy low-rank algorithm for spatial connectome regression
Recovering brain connectivity from tract tracing data is an important
computational problem in the neurosciences. Mesoscopic connectome
reconstruction was previously formulated as a structured matrix regression
problem (Harris et al., 2016), but existing techniques do not scale to the
whole-brain setting. The corresponding matrix equation is challenging to solve
due to large scale, ill-conditioning, and a general form that lacks a
convergent splitting. We propose a greedy low-rank algorithm for the connectome
reconstruction problem in very high dimensions. The algorithm approximates the
solution by a sequence of rank-one updates which exploit the sparse and
positive definite problem structure. This algorithm was described previously
(Kressner and Sirković, 2015) but never implemented for this connectome
problem, leading to a number of challenges. We have had to design judicious
stopping criteria and employ efficient solvers for the three main sub-problems
of the algorithm, including an efficient GPU implementation that alleviates the
main bottleneck for large datasets. The performance of the method is evaluated
on three examples: an artificial "toy" dataset and two whole-cortex instances
using data from the Allen Mouse Brain Connectivity Atlas. We find that the
method is significantly faster than previous methods and that moderate ranks
offer good approximation. This speedup allows for the estimation of
increasingly large-scale connectomes across taxa as these data become available
from tracing experiments. The data and code are available online.
Fixing Nonconvergence of Algebraic Iterative Reconstruction with an Unmatched Backprojector
We consider algebraic iterative reconstruction methods with applications in
image reconstruction. In particular, we are concerned with methods based on an
unmatched projector/backprojector pair; i.e., the backprojector is not the
exact adjoint or transpose of the forward projector. Such situations are common
in large-scale computed tomography, and we consider the common situation where
the method does not converge due to the nonsymmetry of the iteration matrix. We
propose a modified algorithm that incorporates a small shift parameter, and we
give the conditions that guarantee convergence of this method to a fixed point
of a slightly perturbed problem. We also give perturbation bounds for this
fixed point. Moreover, we discuss how to use Krylov subspace methods to
efficiently estimate the leftmost eigenvalue of a certain matrix to select a
proper shift parameter. The modified algorithm is illustrated with test
problems from computed tomography.
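A minimal sketch of how a shift parameter can restore convergence, assuming the update x_{k+1} = (1 - alpha*omega) x_k + omega B (b - A x_k); this form, the function names, and the toy matrices are assumptions for illustration, since the abstract above does not state the update explicitly. With alpha = 0 this is the usual unmatched iteration; with alpha > 0 a fixed point satisfies (BA + alpha I) x = B b, i.e. a perturbed problem.

```python
def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def shifted_iteration(A, B, b, omega, alpha, n_iter):
    """Iterate x <- (1 - alpha*omega) x + omega B (b - A x), starting from zero."""
    x = [0.0] * len(B)
    for _ in range(n_iter):
        residual = [bi - ri for bi, ri in zip(b, matvec(A, x))]
        step = matvec(B, residual)
        x = [(1.0 - alpha * omega) * xi + omega * si for xi, si in zip(x, step)]
    return x

# Hypothetical toy problem: B is not the transpose of A, and BA has the
# eigenvalues 1 and -0.01, so the unshifted iteration cannot converge.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.3], [0.0, -0.01]]
b = [1.0, 1.0]

x_plain = shifted_iteration(A, B, b, omega=0.5, alpha=0.0, n_iter=5000)    # diverges
x_shift = shifted_iteration(A, B, b, omega=0.5, alpha=0.02, n_iter=5000)   # converges
```

In this toy case the shift alpha = 0.02 exceeds the magnitude of the leftmost eigenvalue of BA (-0.01), which moves the whole spectrum of BA + alpha I into the right half-plane; the iterate then settles at the solution of the perturbed system rather than of the original one.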
Tensor Computation: A New Framework for High-Dimensional Problems in EDA
Many critical EDA problems suffer from the curse of dimensionality, i.e. the
very fast-scaling computational burden produced by a large number of parameters
and/or unknown variables. This phenomenon may be caused by multiple spatial or
temporal factors (e.g. 3-D field solver discretizations and multi-rate circuit
simulation), nonlinearity of devices and circuits, a large number of design or
optimization parameters (e.g. full-chip routing/placement and circuit sizing),
or extensive process variations (e.g. variability/reliability analysis and
design for manufacturability). The computational challenges generated by such
high dimensional problems are generally hard to handle efficiently with
traditional EDA core algorithms that are based on matrix and vector
computation. This paper presents "tensor computation" as an alternative general
framework for the development of efficient EDA algorithms and tools. A tensor
is a high-dimensional generalization of a matrix and a vector, and is a natural
choice for both storing and efficiently solving high-dimensional EDA problems.
This paper gives a basic tutorial on tensors, demonstrates some recent examples
of EDA applications (e.g., nonlinear circuit modeling and high-dimensional
uncertainty quantification), and suggests further open EDA problems where the
use of tensor computation could be of advantage. Comment: 14 figures. Accepted by IEEE Trans. CAD of Integrated Circuits and
Systems
Schnelle Löser für partielle Differentialgleichungen (Fast Solvers for Partial Differential Equations)
[no abstract available]
High-performance image reconstruction in fluorescence tomography on desktop computers and graphics hardware
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state-of-the-art numerical algorithms and a careful hardware-optimized implementation makes it possible to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the nonlinear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by factors of about 15 through the use of the graphics hardware without compromising the accuracy of the reconstructed images.