    Scalability Analysis of Parallel GMRES Implementations

    Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations that adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency as the problem size and the number of processors increase). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.

    High-order adaptive time stepping for vesicle suspensions with viscosity contrast

    We construct a high-order adaptive time stepping scheme for vesicle suspensions with viscosity contrast. The high-order accuracy is achieved using a spectral deferred correction (SDC) method, and adaptivity is achieved by estimating the local truncation error with the numerical error of physically constant values. Numerical examples demonstrate that our method can handle suspensions with vesicles that are tumbling, tank-treading, or both. Moreover, we demonstrate that a user-prescribed tolerance can be automatically achieved for simulations with long time horizons.
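
    The idea of steering the step size by the drift of a quantity that should stay constant can be illustrated with a toy problem. The sketch below is not the SDC scheme from the abstract: it assumes a forward Euler integrator, a point rotating on the unit circle (whose radius is the "physically constant value"), and a textbook step-size controller. The names `next_step_size` and `integrate_circle`, and the safety and clamping parameters, are hypothetical.

```python
import math


def next_step_size(dt, err, tol, order, safety=0.9, min_scale=0.2, max_scale=5.0):
    """Return (accept, new_dt) for one attempted step of size dt.

    err is the estimated local error of the step; order is the order of the
    integrator, so the local error is assumed to scale like dt**(order + 1).
    """
    if err == 0.0:
        return True, dt * max_scale
    scale = safety * (tol / err) ** (1.0 / (order + 1))
    scale = min(max_scale, max(min_scale, scale))
    return err <= tol, dt * scale


def integrate_circle(t_end, dt, tol):
    """Rotate (1, 0) with forward Euler, using radius drift as the error estimate."""
    x, y, t = 1.0, 0.0, 0.0
    accepted = rejected = 0
    while t < t_end:
        dt = min(dt, t_end - t)              # do not step past t_end
        xn, yn = x - dt * y, y + dt * x      # forward Euler for x' = -y, y' = x
        # The exact flow keeps the radius constant, so any change is numerical error.
        err = abs(math.hypot(xn, yn) - math.hypot(x, y))
        ok, new_dt = next_step_size(dt, err, tol, order=1)
        if ok:
            x, y, t = xn, yn, t + dt
            accepted += 1
        else:
            rejected += 1
        dt = new_dt
    return x, y, accepted, rejected


if __name__ == "__main__":
    x, y, accepted, rejected = integrate_circle(1.0, 0.1, 1e-4)
    print(accepted, rejected, math.hypot(x, y))
```

    Because every accepted step changes the radius by at most the tolerance, the total drift at the final time is bounded by the number of accepted steps times the tolerance.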

    Adaptive quadrature by expansion for layer potential evaluation in two dimensions

    When solving partial differential equations using boundary integral equation methods, accurate evaluation of singular and nearly singular integrals in layer potentials is crucial. A recent scheme for this is quadrature by expansion (QBX), which solves the problem by locally approximating the potential using a local expansion centered at some distance from the source boundary. In this paper we introduce an extension of the QBX scheme in 2D, denoted AQBX (adaptive quadrature by expansion), which combines QBX with an algorithm for automated selection of parameters based on a target error tolerance. A key component in this algorithm is the ability to accurately estimate the numerical errors in the coefficients of the expansion. Combining previous results for flat panels with a procedure for taking the panel shape into account, we derive such error estimates for arbitrarily shaped boundaries in 2D that are discretized using panel-based Gauss-Legendre quadrature. Applying our scheme both to evaluating numerical solutions of Dirichlet problems for the Laplace and Helmholtz equations and to solving these equations, we find that the scheme is able to satisfy a given target tolerance to within an order of magnitude, making it useful for practical applications. This represents a significant simplification over the original QBX algorithm, in which choosing a good set of parameters can be hard.

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK (STRUctured Matrices PACKage), which also has a distributed-memory component for dense rank-structured matrices.

    Comparison of different nonlinear solvers for 2D time-implicit stellar hydrodynamics

    Time-implicit schemes are attractive since they allow numerical time steps that are much larger than those permitted by the Courant-Friedrichs-Lewy criterion characterizing time-explicit methods. This advantage comes, however, with a cost: the solution of a system of nonlinear equations is required at each time step. In this work, the nonlinear system results from the discretization of the hydrodynamical equations with the Crank-Nicolson scheme. We compare the cost of different methods, based on Newton-Raphson iterations, to solve this nonlinear system, and benchmark their performance against time-explicit schemes. Since our general scientific objective is to model stellar interiors, we use as test cases two realistic models for the convective envelope of a red giant and a young Sun. Focusing on 2D simulations, we show that the best performance is obtained with the quasi-Newton method proposed by Broyden. Another important concern is the accuracy of implicit calculations. Based on the study of an idealized problem, namely the advection of a single vortex by a uniform flow, we show that there are two aspects: i) the nonlinear solver has to be accurate enough to resolve the truncation error of the numerical discretization, and ii) the time step has to be small enough to resolve the advection of eddies. We show that with these two conditions fulfilled, our implicit methods exhibit similar accuracy to time-explicit schemes, which have lower values for the time step and higher computational costs. Finally, we discuss in the conclusion the applicability of these methods to fully implicit 3D calculations. Comment: Accepted for publication in A&
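
    For readers unfamiliar with Broyden's quasi-Newton method, the following minimal sketch shows the basic idea on a two-equation system: the Jacobian is estimated once by finite differences and then corrected with a rank-one update after each step, instead of being recomputed. It is an illustration only, not the solver used in the paper. The function names `broyden_2d` and `circle_and_line`, the test system, and all tolerances are hypothetical.

```python
import math


def broyden_2d(F, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Solve F(x) = 0 for a 2-vector x with Broyden's rank-one Jacobian update.

    Returns (x, iterations). F takes and returns a list of two floats.
    """
    x = list(x0)
    f = F(x)
    # Initial Jacobian approximation B by forward differences (done only once).
    B = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp = list(x)
        xp[j] += h
        fp = F(xp)
        for i in range(2):
            B[i][j] = (fp[i] - f[i]) / h
    for it in range(max_iter):
        if math.hypot(f[0], f[1]) < tol:
            return x, it
        # Solve B * dx = -f with Cramer's rule (2x2 system).
        det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
        dx = [(-f[0] * B[1][1] + B[0][1] * f[1]) / det,
              (-B[0][0] * f[1] + B[1][0] * f[0]) / det]
        x_new = [x[0] + dx[0], x[1] + dx[1]]
        f_new = F(x_new)
        ss = dx[0] * dx[0] + dx[1] * dx[1]
        if ss == 0.0:
            return x_new, it + 1
        # Rank-one update so that the new B satisfies B * dx = f_new - f.
        for i in range(2):
            Bdx_i = B[i][0] * dx[0] + B[i][1] * dx[1]
            r = (f_new[i] - f[i] - Bdx_i) / ss
            B[i][0] += r * dx[0]
            B[i][1] += r * dx[1]
        x, f = x_new, f_new
    return x, max_iter


def circle_and_line(v):
    """Hypothetical test system: circle of radius 2 intersected with the line x = y."""
    x, y = v
    return [x * x + y * y - 4.0, x - y]


if __name__ == "__main__":
    root, iters = broyden_2d(circle_and_line, [1.0, 2.0])
    print(root, iters)
```

    Starting from (1, 2), the iteration approaches the intersection at x = y = sqrt(2) while evaluating F only once per step after the initial Jacobian estimate, which is the cost advantage quasi-Newton methods trade against the faster convergence of full Newton-Raphson.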

    A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver

    In this paper, we present a GPU-accelerated direct-sum boundary integral method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a well-posed boundary integral formulation is used to ensure the fast convergence of Krylov-subspace-based linear algebraic solvers such as GMRES. The molecular surfaces are discretized with flat triangles and centroid collocation. To speed up our method, we take advantage of the parallel nature of the boundary integral formulation and parallelize the schemes within the CUDA shared-memory architecture on the GPU. The schemes use only $11N+6N_c$ size-of-double device memory for a biomolecule with $N$ triangular surface elements and $N_c$ partial charges. Numerical tests of these schemes show well-maintained accuracy and fast convergence. The GPU implementation using one GPU card (Nvidia Tesla M2070) achieves a 120-150X speed-up over the implementation using one CPU (Intel L5640 2.27GHz). With our approach, solving PB equations on well-discretized molecular surfaces with up to 300,000 boundary elements takes less than about 10 minutes; hence our approach is particularly suitable for fast electrostatics computations on small to medium biomolecules.
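
    To give a feel for the stated memory footprint of 11N + 6N_c doubles, the short sketch below converts it to bytes. It assumes 8-byte doubles; the function name `device_memory_bytes` and the charge count used in the example are hypothetical and not taken from the paper.

```python
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision


def device_memory_bytes(n_elements, n_charges):
    """Device memory implied by the 11N + 6N_c size-of-double count in the abstract."""
    return (11 * n_elements + 6 * n_charges) * BYTES_PER_DOUBLE


if __name__ == "__main__":
    # 300,000 boundary elements as in the abstract; 10,000 charges is a made-up value.
    total = device_memory_bytes(300_000, 10_000)
    print(f"{total} bytes = {total / 1e6:.2f} MB")
```

    Under these assumptions the footprint is 26,880,000 bytes (about 27 MB). Memory that grows linearly in N is what one expects from a direct-sum approach that does not store the dense N-by-N system matrix.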