1,643 research outputs found

    Domain Decomposition Based High Performance Parallel Computing

    The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers. However, parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of sparse direct solvers is done using domain decomposition techniques. A highly efficient sparse direct solver, PARDISO, is used in this study. The scalability of both Newton and modified Newton algorithms is tested.

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large; i.e., $10000 \leq N \leq 500000$. The ${\it split\ and\ parallelize}$ (${\tt SaP}$) approach seeks to partition the matrix ${\bf A}$ into diagonal sub-blocks ${\bf A}_i$, $i=1,\ldots,P$, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks ${\bf A}_i$. This approach, along with the Krylov subspace-based iterative method that it preconditions, is implemented in a solver called ${\tt SaP::GPU}$, which is compared in terms of efficiency with three commonly used sparse direct solvers: ${\tt PARDISO}$, ${\tt SuperLU}$, and ${\tt MUMPS}$. ${\tt SaP::GPU}$, which runs entirely on the GPU except several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's ${\tt MKL}$, ${\tt SaP::GPU}$ also fares well when used to solve dense banded systems that are close to being diagonally dominant. ${\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license. Comment: 38 pages
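
    The decoupled variant described in this abstract (factor the diagonal sub-blocks independently and ignore the coupling matrices) can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper: the matrix is tridiagonal and symmetric positive definite, each block is solved with the Thomas algorithm, and the Krylov method is plain preconditioned conjugate gradient. The function names are hypothetical and are not the SaP::GPU API; the code runs serially, whereas the block solves are the part that would be done in parallel.

```python
# Illustrative sketch only: block-diagonal ("decoupled") preconditioning of a
# tridiagonal SPD system inside preconditioned conjugate gradient (PCG).
# Convention: lower[i] = A[i][i-1] (lower[0] unused), upper[i] = A[i][i+1]
# (upper[n-1] unused). All names are hypothetical, not the SaP::GPU API.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def thomas_solve(lower, diag, upper, rhs):
    """Solve one tridiagonal block with the Thomas algorithm (no pivoting)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def tridiag_matvec(lower, diag, upper, x):
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i] * x[i - 1]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]
    return y

def block_precondition(lower, diag, upper, r, num_blocks):
    """Apply the inverse of the block-diagonal part; coupling entries ignored."""
    n = len(diag)
    size = (n + num_blocks - 1) // num_blocks
    z = []
    for start in range(0, n, size):  # each block solve is independent
        end = min(start + size, n)
        z.extend(thomas_solve(lower[start:end], diag[start:end],
                              upper[start:end], r[start:end]))
    return z

def pcg(lower, diag, upper, b, num_blocks, tol=1e-10, max_iter=200):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)
    z = block_precondition(lower, diag, upper, r, num_blocks)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = tridiag_matvec(lower, diag, upper, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = block_precondition(lower, diag, upper, r, num_blocks)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Example: diagonally dominant system whose exact solution is all ones.
n = 12
lower = [0.0] + [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1) + [0.0]
b = tridiag_matvec(lower, diag, upper, [1.0] * n)
solution, iterations = pcg(lower, diag, upper, b, num_blocks=3)
```

    Because only the few entries that couple neighbouring blocks are dropped, the preconditioned matrix differs from the identity by a low-rank term, which is why few iterations are expected for a diagonally dominant system like this one.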

    A domain decomposing parallel sparse linear system solver

    The solution of large sparse linear systems is often the most time-consuming part of many science and engineering applications. Computational fluid dynamics, circuit simulation, power network analysis, and material science are just a few examples of the application areas in which large sparse linear systems need to be solved effectively. In this paper we introduce a new parallel hybrid sparse linear system solver for distributed memory architectures that contains both direct and iterative components. We show that by using our solver one can alleviate the drawbacks of direct and iterative solvers, achieving better scalability than with direct solvers and more robustness than with classical preconditioned iterative solvers. Comparisons to well-known direct and iterative solvers on a parallel architecture are provided. Comment: To appear in Journal of Computational and Applied Mathematics

    Selected inversion as key to a stable Langevin evolution across the QCD phase boundary

    We present new results of full QCD at nonzero chemical potential. In PRD 92, 094516 (2015) the complex Langevin method was shown to break down when the inverse coupling decreases and enters the transition region from the deconfined to the confined phase. We found that the stochastic technique used to estimate the drift term can be very unstable for indefinite matrices. This may be avoided by using the full inverse of the Dirac operator, which is, however, too costly for four-dimensional lattices. The major breakthrough in this work was achieved by realizing that the inverse elements necessary for the drift term can be computed efficiently using the selected inversion technique provided by the parallel sparse direct solver package PARDISO. In our new study we show that no breakdown of the complex Langevin method is encountered and that simulations can be performed across the phase boundary. Comment: 8 pages, 6 figures, Proceedings of the 35th International Symposium on Lattice Field Theory, Granada, Spain

    FERM3D: A finite element R-matrix electron molecule scattering code

    FERM3D is a three-dimensional finite element program for the elastic scattering of a low energy electron from a general polyatomic molecule, which is converted to a potential scattering problem. The code is based on tricubic polynomials in spherical coordinates. The electron-molecule interaction is treated as a sum of three terms: electrostatic, exchange, and polarisation. The electrostatic term can be extracted directly from ab initio codes (GAUSSIAN 98 in the work described here), while the exchange term is approximated using a local density functional. A local polarisation potential based on density functional theory [C. Lee, W. Yang and R. G. Parr, Phys. Rev. B 37 (1988) 785] describes the long range attraction to the molecular target induced by the scattering electron. Photoionisation calculations are also possible and illustrated in the present work. The generality and simplicity of the approach are important in extending electron-scattering calculations to more complex targets than is possible with other methods. Comment: 30 pages, 4 figures, preprint, Computer Physics Communications (in press)

    A two step viscothermal acoustic FE method

    Previously, the authors presented a finite element for viscothermal acoustics. This element has the velocity vector, the temperature and the pressure as degrees of freedom. It can be used, for example, to model sound propagation in miniature acoustical transducers. Unfortunately, the large number of coupled degrees of freedom can make the models big and time consuming to solve. A method with reduced calculation time has been developed. It is possible to partially decouple the temperature degree of freedom, as a result of the differences in the characteristic length scales of acoustics and heat conduction. This leads to a method that uses two sequential steps. In the first step, a scalar field containing information about the thermal effects is calculated (not the temperature). This is a relatively small FE calculation. In the second step, the actual viscothermal acoustical equations are solved. This calculation uses the field calculated in the first step and has the velocity vector and the pressure as the degrees of freedom. The temperature is no longer a degree of freedom, but it can be easily calculated in a post-processing step. The required computational effort is reduced significantly, while the difference in the results, compared to the fully coupled method, is negligible. Along with the theoretical basis for the method, a specific FE calculation is presented to illustrate its accuracy and improvement in calculation time.

    On large-scale diagonalization techniques for the Anderson model of localization

    We propose efficient preconditioning algorithms for an eigenvalue problem arising in quantum physics, namely the computation of a few interior eigenvalues and their associated eigenvectors for large-scale sparse real and symmetric indefinite matrices of the Anderson model of localization. We compare the Lanczos algorithm in the 1987 implementation by Cullum and Willoughby with the shift-and-invert techniques in the implicitly restarted Lanczos method and in the Jacobi–Davidson method. Our preconditioning approaches for the shift-and-invert symmetric indefinite linear system are based on maximum weighted matchings and algebraic multilevel incomplete $LDL^T$ factorizations. These techniques can be seen as a complement to the alternative idea of using more complete pivoting techniques for the highly ill-conditioned symmetric indefinite Anderson matrices. We demonstrate the effectiveness and the numerical accuracy of these algorithms. Our numerical examples reveal that recent algebraic multilevel preconditioning solvers can accelerate the computation of a large-scale eigenvalue problem corresponding to the Anderson model of localization by several orders of magnitude.
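
    The shift-and-invert idea mentioned in this abstract can be sketched on a toy problem. The sketch below is a deliberately simplified stand-in: it uses plain inverse iteration with a dense Gaussian-elimination solve rather than the Lanczos or Jacobi–Davidson methods and the preconditioned sparse solves that the abstract refers to. It also assumes a one-dimensional chain with zero disorder so that the result can be checked analytically; the function names are hypothetical. Repeatedly solving with $A - \sigma I$ amplifies the eigenvector whose eigenvalue lies closest to the shift $\sigma$, which is how an interior eigenvalue becomes reachable.

```python
# Illustrative sketch only: shift-and-invert inverse iteration for one
# interior eigenvalue of a small symmetric matrix. Dense solves and plain
# inverse iteration are simplifications, not the methods of the abstract.

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def shift_invert_eigenpair(a, sigma, x0, iterations=50):
    """Return (eigenvalue, unit eigenvector) nearest to the shift sigma."""
    n = len(a)
    shifted = [[a[i][j] - (sigma if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = x0[:]
    for _ in range(iterations):
        y = solve_dense(shifted, x)           # y = (A - sigma I)^{-1} x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    ax = matvec(a, x)
    lam = sum(xi * axi for xi, axi in zip(x, ax))  # Rayleigh quotient, |x| = 1
    return lam, x

def chain_hamiltonian(n, onsite):
    """Tridiagonal 1D chain: on-site energies on the diagonal, hopping 1."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = onsite[i]
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = 1.0
    return a

# Zero-disorder chain of length 5: eigenvalues are 2*cos(k*pi/6), k = 1..5.
h = chain_hamiltonian(5, [0.0] * 5)
lam, vec = shift_invert_eigenpair(h, sigma=0.9, x0=[1.0, 0.0, 0.0, 0.0, 0.0])
```

    For this chain the eigenvalue nearest to the shift 0.9 is $2\cos(\pi/3) = 1$, an interior eigenvalue. The starting vector is chosen asymmetric on purpose, since a vector of all ones is orthogonal to the target eigenvector. At realistic sizes the dense solve is the step that would be replaced by a preconditioned sparse solver.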