Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations of these that adapt the restart value k, are analyzed with respect to scalability, i.e., their ability to maintain fixed efficiency as problem size and number of processors increase. A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
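To make the restart idea concrete, here is a minimal restarted GMRES(k) in NumPy. It is a sketch, not the paper's implementation: for brevity it orthogonalizes with modified Gram-Schmidt rather than the Householder reflections the paper uses, and the function name and defaults are illustrative.

```python
import numpy as np

def gmres_restarted(A, b, k=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES(k): build a k-dimensional Krylov basis, solve the
    small least-squares problem, then restart from the updated iterate.
    Uses modified Gram-Schmidt (the paper's variants use Householder)."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, k + 1))          # Krylov basis vectors
        H = np.zeros((k + 1, k))          # Hessenberg matrix
        V[:, 0] = r / beta
        m = k
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):        # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:       # happy breakdown
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # solve min ||beta*e1 - H y|| over the Krylov subspace
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y
    return x
```

The restart value k trades memory and orthogonalization cost (both grow with k) against convergence speed, which is exactly the knob the adaptive variants in the paper tune.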
High-order adaptive time stepping for vesicle suspensions with viscosity contrast
We construct a high-order adaptive time stepping scheme for vesicle
suspensions with viscosity contrast. The high-order accuracy is achieved using
a spectral deferred correction (SDC) method, and adaptivity is achieved by
estimating the local truncation error with the numerical error of physically
constant values. Numerical examples demonstrate that our method can handle
suspensions with vesicles that are tumbling, tank-treading, or both. Moreover,
we demonstrate that a user-prescribed tolerance can be automatically achieved
for simulations with long time horizons.
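The full SDC machinery is beyond a short sketch, but the adaptivity idea, accept or reject a step based on an estimate of the local truncation error and rescale the step size, can be illustrated with a generic stand-in. The sketch below uses step doubling with classical RK4 as the error estimator in place of the paper's SDC-based estimate; all names and tolerances are illustrative.

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate_adaptive(f, t0, t1, y0, tol=1e-8, h=1e-2):
    """Adaptive time stepping via step doubling: the difference between one
    full step and two half steps estimates the local truncation error,
    standing in for the paper's SDC-based estimate."""
    t, y = t0, np.asarray(y0, float)
    while t < t1:
        h = min(h, t1 - t)
        y_full = rk4_step(f, t, y, h)
        y_half = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = np.max(np.abs(y_full - y_half))
        if err <= tol:                    # accept the step
            t += h
            y = y_half
        # grow or shrink h toward the target error (RK4 is order 4)
        h *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.2))
    return y
```

The controller automatically takes large steps in quiescent phases and small ones when the dynamics are fast, which is how a user-prescribed tolerance can be met over long horizons without hand-tuning the step size.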
Adaptive quadrature by expansion for layer potential evaluation in two dimensions
When solving partial differential equations using boundary integral equation
methods, accurate evaluation of singular and nearly singular integrals in layer
potentials is crucial. A recent scheme for this is quadrature by expansion
(QBX), which solves the problem by locally approximating the potential using a
local expansion centered at some distance from the source boundary. In this
paper we introduce an extension of the QBX scheme in 2D denoted AQBX - adaptive
quadrature by expansion - which combines QBX with an algorithm for automated
selection of parameters, based on a target error tolerance. A key component in
this algorithm is the ability to accurately estimate the numerical errors in
the coefficients of the expansion. Combining previous results for flat panels
with a procedure for taking the panel shape into account, we derive such error
estimates for arbitrarily shaped boundaries in 2D that are discretized using
panel-based Gauss-Legendre quadrature. Applying our scheme to numerical
solutions of Dirichlet problems for the Laplace and Helmholtz equations, we
find that the scheme is able to satisfy a
given target tolerance to within an order of magnitude, making it useful for
practical applications. This represents a significant simplification over the
original QBX algorithm, in which choosing a good set of parameters can be hard.
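The core QBX idea can be shown in a few lines: expand the layer potential in a local (Taylor-type) expansion about a center placed off the boundary, where the integrands defining the coefficients are smooth. The sketch below does this for the Laplace single-layer potential on the unit circle with trapezoidal quadrature, instead of the panel-based Gauss-Legendre discretization and arbitrary boundaries of the paper; geometry, density, and parameters are made up for illustration.

```python
import numpy as np

n, p = 400, 20                      # quadrature nodes, expansion order
t = 2 * np.pi * np.arange(n) / n
y = np.exp(1j * t)                  # unit-circle boundary, complex form
sigma = np.cos(t)                   # a smooth layer density
w = 2 * np.pi / n                   # trapezoid weights (arclength = dt here)

z0 = 1.5 + 0.0j                     # expansion center, off the boundary
# log|z - y| = Re[ log(z0 - y)
#                  + sum_{l>=1} (-1)^{l+1}/l * ((z - z0)/(z0 - y))^l ]
c = np.empty(p + 1, dtype=complex)
c[0] = np.sum(np.log(z0 - y) * sigma) * w
for l in range(1, p + 1):
    c[l] = (-1) ** (l + 1) / l * np.sum(sigma / (z0 - y) ** l) * w

def qbx_eval(z):
    """Evaluate the single-layer potential via the local expansion."""
    dz = z - z0
    return (c * dz ** np.arange(p + 1)).sum().real

def direct_eval(z):
    """Plain quadrature; accurate only for targets far from the boundary."""
    return np.sum(np.log(np.abs(z - y)) * sigma) * w

z = 1.4 + 0.2j                      # target inside the expansion's disk
```

The error has two parts, truncation of the expansion at order p and quadrature error in the coefficients, and it is precisely estimates of these that AQBX uses to pick n, p, and the center distance automatically.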
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices.
Comparison of different nonlinear solvers for 2D time-implicit stellar hydrodynamics
Time-implicit schemes are attractive since they allow numerical time steps
that are much larger than those permitted by the Courant-Friedrichs-Lewy
criterion characterizing time-explicit methods. This advantage comes, however,
with a cost: the solution of a system of nonlinear equations is required at
each time step. In this work, the nonlinear system results from the
discretization of the hydrodynamical equations with the Crank-Nicolson scheme.
We compare the cost of different methods, based on Newton-Raphson iterations,
to solve this nonlinear system, and benchmark their performances against
time-explicit schemes. Since our general scientific objective is to model
stellar interiors, we use as test cases two realistic models for the convective
envelope of a red giant and a young Sun. Focusing on 2D simulations, we show
that the best performances are obtained with the quasi-Newton method proposed
by Broyden. Another important concern is the accuracy of implicit calculations.
Based on the study of an idealized problem, namely the advection of a single
vortex by a uniform flow, we show that there are two aspects: i) the nonlinear
solver has to be accurate enough to resolve the truncation error of the
numerical discretization, and ii) the time step has to be small enough to resolve
the advection of eddies. We show that with these two conditions fulfilled, our
implicit methods exhibit similar accuracy to time-explicit schemes, which have
lower values for the time step and higher computational costs. Finally, we
discuss in the conclusion the applicability of these methods to fully implicit
3D calculations. Comment: Accepted for publication in A&A.
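Broyden's quasi-Newton method replaces the Jacobian rebuild of Newton-Raphson with a rank-one secant update, which is why it wins on cost in the comparison above. A minimal sketch, with a hypothetical two-equation test system and finite-difference initialization that are not from the paper:

```python
import numpy as np

def fd_jacobian(F, x, h=1e-6):
    """Forward-difference Jacobian, used only to seed the iteration."""
    f0 = F(x)
    J = np.empty((x.size, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        J[:, i] = (F(x + e) - f0) / h
    return J

def broyden_solve(F, x0, J0, tol=1e-10, maxit=100):
    """Broyden's (good) method: rank-one updates keep the approximate
    Jacobian current without re-evaluating or re-factoring it each step."""
    x = np.asarray(x0, float)
    J = np.asarray(J0, float)
    Fx = F(x)
    for _ in range(maxit):
        if np.linalg.norm(Fx) < tol:
            break
        dx = np.linalg.solve(J, -Fx)
        x_new = x + dx
        Fx_new = F(x_new)
        dF = Fx_new - Fx
        # secant condition: updated J must map dx to dF
        J += np.outer(dF - J @ dx, dx) / (dx @ dx)
        x, Fx = x_new, Fx_new
    return x

# hypothetical test system: x + y = 3, x*y = 2 (a root at x=2, y=1)
def F(v):
    x, y = v
    return np.array([x + y - 3.0, x * y - 2.0])
```

In the stellar-hydrodynamics setting the savings are larger still, because each avoided Jacobian evaluation and factorization involves a large sparse system rather than a 2x2 one.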
A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver
In this paper, we present a GPU-accelerated direct-sum boundary integral
method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a
well-posed boundary integral formulation is used to ensure the fast convergence
of Krylov-subspace-based linear algebraic solvers such as GMRES. The
molecular surfaces are discretized with flat triangles and centroid
collocation. To speed up our method, we take advantage of the parallel nature
of the boundary integral formulation and parallelize the schemes within the
CUDA shared-memory architecture on the GPU. The schemes use only a modest
amount of double-precision device memory, proportional to the number of
triangular surface elements and partial charges of the biomolecule. Numerical
tests of these schemes show
well-maintained accuracy and fast convergence. The GPU implementation using one
GPU card (Nvidia Tesla M2070) achieves a 120-150X speed-up over the
implementation using one CPU (Intel L5640 2.27GHz). With our approach, solving
the PB equation on well-discretized molecular surfaces with up to 300,000
boundary elements takes less than about 10 minutes, making it particularly
suitable for fast electrostatics computations on small to medium-sized
biomolecules.
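What makes the direct sum GPU-friendly is its structure: every target accumulates kernel contributions from all sources independently, so one thread per target parallelizes it trivially. The sketch below shows that O(N*M) pattern in vectorized NumPy for a screened-Coulomb (Yukawa) kernel; the geometry, charges, and screening parameter kappa are made up, and this is not the paper's well-posed boundary integral formulation.

```python
import numpy as np

def direct_sum_potential(targets, sources, charges, kappa=0.1):
    """O(N*M) direct sum of the screened-Coulomb kernel exp(-kappa*r)/r:
    the embarrassingly parallel pattern a GPU implementation maps to one
    thread per target."""
    d = targets[:, None, :] - sources[None, :, :]        # (N, M, 3) offsets
    r = np.sqrt((d * d).sum(-1))                         # pairwise distances
    return (charges * np.exp(-kappa * r) / r).sum(-1)    # (N,) potentials

rng = np.random.default_rng(2)
src = rng.standard_normal((50, 3))          # source points
q = rng.standard_normal(50)                 # partial charges
tgt = 5.0 + rng.standard_normal((20, 3))    # targets away from the sources

phi = direct_sum_potential(tgt, src, q)
# reference: explicit loop over targets
phi_ref = np.array([sum(qi * np.exp(-0.1 * np.linalg.norm(t - s)) /
                        np.linalg.norm(t - s) for s, qi in zip(src, q))
                    for t in tgt])
```

On a GPU the inner reduction is tiled through shared memory so that each block of source data is loaded once and reused by all threads, which is the optimization the CUDA implementation described above relies on.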