Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations of these that adapt the restart value k, are analyzed with respect to scalability, i.e., their ability to maintain fixed efficiency as problem size and number of processors increase. A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
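To make the restart idea concrete, here is a minimal restarted GMRES(k) in NumPy. It is a sketch, not the paper's implementation: for brevity it orthogonalizes with modified Gram-Schmidt rather than the Householder reflections the paper uses, and the function name and defaults are illustrative.

```python
import numpy as np

def gmres_restarted(A, b, k=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES(k): build a k-dimensional Krylov basis, solve the
    small least-squares problem, then restart from the updated iterate.
    Uses modified Gram-Schmidt (the paper's variants use Householder)."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, k + 1))          # Krylov basis vectors
        H = np.zeros((k + 1, k))          # Hessenberg matrix
        V[:, 0] = r / beta
        m = k
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):        # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:       # happy breakdown
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # solve min ||beta*e1 - H y|| over the Krylov subspace
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y
    return x
```

The restart value k trades memory and orthogonalization cost (both grow with k) against convergence speed, which is exactly the knob the adaptive variants in the paper tune.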
High-order adaptive time stepping for vesicle suspensions with viscosity contrast
We construct a high-order adaptive time stepping scheme for vesicle
suspensions with viscosity contrast. The high-order accuracy is achieved using
a spectral deferred correction (SDC) method, and adaptivity is achieved by
estimating the local truncation error with the numerical error of physically
constant values. Numerical examples demonstrate that our method can handle
suspensions with vesicles that are tumbling, tank-treading, or both. Moreover,
we demonstrate that a user-prescribed tolerance can be automatically achieved
for simulations with long time horizons.
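The full SDC machinery is beyond a short sketch, but the adaptivity idea, accept or reject a step based on an estimate of the local truncation error and rescale the step size, can be illustrated with a generic stand-in. The sketch below uses step doubling with classical RK4 as the error estimator in place of the paper's SDC-based estimate; all names and tolerances are illustrative.

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate_adaptive(f, t0, t1, y0, tol=1e-8, h=1e-2):
    """Adaptive time stepping via step doubling: the difference between one
    full step and two half steps estimates the local truncation error,
    standing in for the paper's SDC-based estimate."""
    t, y = t0, np.asarray(y0, float)
    while t < t1:
        h = min(h, t1 - t)
        y_full = rk4_step(f, t, y, h)
        y_half = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = np.max(np.abs(y_full - y_half))
        if err <= tol:                    # accept the step
            t += h
            y = y_half
        # grow or shrink h toward the target error (RK4 is order 4)
        h *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.2))
    return y
```

The controller automatically takes large steps in quiescent phases and small ones when the dynamics are fast, which is how a user-prescribed tolerance can be met over long horizons without hand-tuning the step size.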
Adaptive quadrature by expansion for layer potential evaluation in two dimensions
When solving partial differential equations using boundary integral equation
methods, accurate evaluation of singular and nearly singular integrals in layer
potentials is crucial. A recent scheme for this is quadrature by expansion
(QBX), which solves the problem by locally approximating the potential using a
local expansion centered at some distance from the source boundary. In this
paper we introduce an extension of the QBX scheme in 2D denoted AQBX - adaptive
quadrature by expansion - which combines QBX with an algorithm for automated
selection of parameters, based on a target error tolerance. A key component in
this algorithm is the ability to accurately estimate the numerical errors in
the coefficients of the expansion. Combining previous results for flat panels
with a procedure for taking the panel shape into account, we derive such error
estimates for arbitrarily shaped boundaries in 2D that are discretized using
panel-based Gauss-Legendre quadrature. Applying our scheme to numerical
solutions of Dirichlet problems for the Laplace and Helmholtz equations, we
find that the scheme is able to satisfy a
given target tolerance to within an order of magnitude, making it useful for
practical applications. This represents a significant simplification over the
original QBX algorithm, in which choosing a good set of parameters can be hard.
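The core QBX idea can be shown in a few lines: expand the layer potential in a local (Taylor-type) expansion about a center placed off the boundary, where the integrands defining the coefficients are smooth. The sketch below does this for the Laplace single-layer potential on the unit circle with trapezoidal quadrature, instead of the panel-based Gauss-Legendre discretization and arbitrary boundaries of the paper; geometry, density, and parameters are made up for illustration.

```python
import numpy as np

n, p = 400, 20                      # quadrature nodes, expansion order
t = 2 * np.pi * np.arange(n) / n
y = np.exp(1j * t)                  # unit-circle boundary, complex form
sigma = np.cos(t)                   # a smooth layer density
w = 2 * np.pi / n                   # trapezoid weights (arclength = dt here)

z0 = 1.5 + 0.0j                     # expansion center, off the boundary
# log|z - y| = Re[ log(z0 - y)
#                  + sum_{l>=1} (-1)^{l+1}/l * ((z - z0)/(z0 - y))^l ]
c = np.empty(p + 1, dtype=complex)
c[0] = np.sum(np.log(z0 - y) * sigma) * w
for l in range(1, p + 1):
    c[l] = (-1) ** (l + 1) / l * np.sum(sigma / (z0 - y) ** l) * w

def qbx_eval(z):
    """Evaluate the single-layer potential via the local expansion."""
    dz = z - z0
    return (c * dz ** np.arange(p + 1)).sum().real

def direct_eval(z):
    """Plain quadrature; accurate only for targets far from the boundary."""
    return np.sum(np.log(np.abs(z - y)) * sigma) * w

z = 1.4 + 0.2j                      # target inside the expansion's disk
```

The error has two parts, truncation of the expansion at order p and quadrature error in the coefficients, and it is precisely estimates of these that AQBX uses to pick n, p, and the center distance automatically.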
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices.
Comparison of different nonlinear solvers for 2D time-implicit stellar hydrodynamics
Time-implicit schemes are attractive since they allow numerical time steps
that are much larger than those permitted by the Courant-Friedrichs-Lewy
criterion characterizing time-explicit methods. This advantage comes, however,
with a cost: the solution of a system of nonlinear equations is required at
each time step. In this work, the nonlinear system results from the
discretization of the hydrodynamical equations with the Crank-Nicolson scheme.
We compare the cost of different methods, based on Newton-Raphson iterations,
to solve this nonlinear system, and benchmark their performances against
time-explicit schemes. Since our general scientific objective is to model
stellar interiors, we use as test cases two realistic models for the convective
envelope of a red giant and a young Sun. Focusing on 2D simulations, we show
that the best performances are obtained with the quasi-Newton method proposed
by Broyden. Another important concern is the accuracy of implicit calculations.
Based on the study of an idealized problem, namely the advection of a single
vortex by a uniform flow, we show that there are two aspects: i) the nonlinear
solver has to be accurate enough to resolve the truncation error of the
numerical discretization, and ii) the time step has to be small enough to resolve
the advection of eddies. We show that with these two conditions fulfilled, our
implicit methods exhibit similar accuracy to time-explicit schemes, which have
lower values for the time step and higher computational costs. Finally, we
discuss in the conclusion the applicability of these methods to fully implicit
3D calculations. Comment: Accepted for publication in A&A.
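Broyden's quasi-Newton method replaces the Jacobian rebuild of Newton-Raphson with a rank-one secant update, which is why it wins on cost in the comparison above. A minimal sketch, with a hypothetical two-equation test system and finite-difference initialization that are not from the paper:

```python
import numpy as np

def fd_jacobian(F, x, h=1e-6):
    """Forward-difference Jacobian, used only to seed the iteration."""
    f0 = F(x)
    J = np.empty((x.size, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        J[:, i] = (F(x + e) - f0) / h
    return J

def broyden_solve(F, x0, J0, tol=1e-10, maxit=100):
    """Broyden's (good) method: rank-one updates keep the approximate
    Jacobian current without re-evaluating or re-factoring it each step."""
    x = np.asarray(x0, float)
    J = np.asarray(J0, float)
    Fx = F(x)
    for _ in range(maxit):
        if np.linalg.norm(Fx) < tol:
            break
        dx = np.linalg.solve(J, -Fx)
        x_new = x + dx
        Fx_new = F(x_new)
        dF = Fx_new - Fx
        # secant condition: updated J must map dx to dF
        J += np.outer(dF - J @ dx, dx) / (dx @ dx)
        x, Fx = x_new, Fx_new
    return x

# hypothetical test system: x + y = 3, x*y = 2 (a root at x=2, y=1)
def F(v):
    x, y = v
    return np.array([x + y - 3.0, x * y - 2.0])
```

In the stellar-hydrodynamics setting the savings are larger still, because each avoided Jacobian evaluation and factorization involves a large sparse system rather than a 2x2 one.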
A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver
In this paper, we present a GPU-accelerated direct-sum boundary integral
method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a
well-posed boundary integral formulation is used to ensure the fast convergence
of Krylov-subspace-based linear algebraic solvers such as GMRES. The
molecular surfaces are discretized with flat triangles and centroid
collocation. To speed up our method, we take advantage of the parallel nature
of the boundary integral formulation and parallelize the schemes within the
CUDA shared-memory architecture on the GPU. The schemes use only a modest
amount of double-precision device memory, proportional to the number of
triangular surface elements and partial charges of the biomolecule. Numerical
tests of these schemes show
well-maintained accuracy and fast convergence. The GPU implementation using one
GPU card (Nvidia Tesla M2070) achieves a 120-150X speed-up over the
implementation using one CPU (Intel L5640 2.27GHz). With our approach, solving
the PB equation on well-discretized molecular surfaces with up to 300,000
boundary elements takes less than about 10 minutes, making it particularly
suitable for fast electrostatics computations on small to medium-sized
biomolecules.
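What makes the direct sum GPU-friendly is its structure: every target accumulates kernel contributions from all sources independently, so one thread per target parallelizes it trivially. The sketch below shows that O(N*M) pattern in vectorized NumPy for a screened-Coulomb (Yukawa) kernel; the geometry, charges, and screening parameter kappa are made up, and this is not the paper's well-posed boundary integral formulation.

```python
import numpy as np

def direct_sum_potential(targets, sources, charges, kappa=0.1):
    """O(N*M) direct sum of the screened-Coulomb kernel exp(-kappa*r)/r:
    the embarrassingly parallel pattern a GPU implementation maps to one
    thread per target."""
    d = targets[:, None, :] - sources[None, :, :]        # (N, M, 3) offsets
    r = np.sqrt((d * d).sum(-1))                         # pairwise distances
    return (charges * np.exp(-kappa * r) / r).sum(-1)    # (N,) potentials

rng = np.random.default_rng(2)
src = rng.standard_normal((50, 3))          # source points
q = rng.standard_normal(50)                 # partial charges
tgt = 5.0 + rng.standard_normal((20, 3))    # targets away from the sources

phi = direct_sum_potential(tgt, src, q)
# reference: explicit loop over targets
phi_ref = np.array([sum(qi * np.exp(-0.1 * np.linalg.norm(t - s)) /
                        np.linalg.norm(t - s) for s, qi in zip(src, q))
                    for t in tgt])
```

On a GPU the inner reduction is tiled through shared memory so that each block of source data is loaded once and reused by all threads, which is the optimization the CUDA implementation described above relies on.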