    A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

    In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib so that it scales across many GPUs and can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the problem with 1.5 million unknowns, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, this is the first fully GPU-based, distributed-memory parallel, open-source hierarchical matrix library that uses the traditional H-matrix format and adaptive cross approximation with an application to BEM problems.
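The adaptive cross approximation (ACA) named in the abstract above can be illustrated with a small CPU sketch. This is a generic, partially pivoted ACA written with NumPy, not code from hmglib; the function name `aca`, the callbacks `get_row` and `get_col`, the absolute pivot threshold `1e-14`, and the stopping rule are all assumptions chosen for illustration. It builds a low-rank factorization `A ≈ U @ V` of an admissible block while reading only a few of its rows and columns.

```python
import numpy as np


def aca(get_row, get_col, m, n, tol=1e-8, max_rank=50):
    """Partially pivoted ACA sketch: returns U (m x k), V (k x n) with A ~ U @ V.

    get_row(i) and get_col(j) are hypothetical callbacks returning row i and
    column j of the m x n block; the full block is never formed here.
    """
    us, vs = [], []
    used_rows = set()
    norm2 = 0.0  # running value of ||U @ V||_F^2
    i = 0        # first pivot row (arbitrary choice)
    for _ in range(min(max_rank, m, n)):
        used_rows.add(i)
        # Row i of the residual A - U @ V.
        row = np.array(get_row(i), dtype=float)
        for u, v in zip(us, vs):
            row -= u[i] * v
        j = int(np.argmax(np.abs(row)))  # pivot column
        if abs(row[j]) > 1e-14:
            v_new = row / row[j]
            # Column j of the residual.
            u_new = np.array(get_col(j), dtype=float)
            for u, v in zip(us, vs):
                u_new -= v[j] * u
            # Update the Frobenius norm of the approximation incrementally.
            for u, v in zip(us, vs):
                norm2 += 2.0 * (u @ u_new) * (v @ v_new)
            step = (u_new @ u_new) * (v_new @ v_new)
            norm2 += step
            us.append(u_new)
            vs.append(v_new)
            # Stop once the new rank-1 term is small relative to the total.
            if step <= tol**2 * norm2:
                break
            candidates = np.abs(u_new)
        else:
            # Residual row is numerically zero: try any unused row next.
            candidates = np.ones(m)
        candidates[list(used_rows)] = -1.0
        i = int(np.argmax(candidates))
        if candidates[i] < 0:
            break  # every row has been used as a pivot
    if not us:
        return np.zeros((m, 0)), np.zeros((0, n))
    return np.column_stack(us), np.vstack(vs)
```

For a kernel block between two well-separated point clusters, the callbacks would evaluate the kernel on demand, so storage and work grow with the rank times `m + n` rather than with `m * n`.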

    An accurate, fast, mathematically robust, universal, non-iterative algorithm for computing multi-component diffusion velocities

    Using accurate multi-component diffusion treatment in numerical combustion studies remains formidable due to the computational cost associated with solving for diffusion velocities. To obtain the diffusion velocities for low-density gases, one needs to solve the Stefan-Maxwell equations along with the zero diffusion flux criterion, a computation that scales as $\mathcal{O}(N^3)$ when solved exactly. In this article, we propose an accurate, fast, direct and robust algorithm to compute multi-component diffusion velocities. To our knowledge, this is the first provably accurate algorithm (the solution can be obtained up to an arbitrary degree of precision) scaling at a computational complexity of $\mathcal{O}(N)$ in finite precision. The key idea involves leveraging the fact that the matrix of the reciprocals of the binary diffusivities, $V$, is low rank, with its rank being independent of the number of species involved. The low-rank representation of the matrix $V$ is computed in a fast manner at a computational complexity of $\mathcal{O}(N)$, and the Sherman-Morrison-Woodbury formula is used to solve for the diffusion velocities at a computational complexity of $\mathcal{O}(N)$. Rigorous proofs and numerical benchmarks illustrate the low-rank property of the matrix $V$ and the scaling of the algorithm. Comment: 16 pages, 7 figures, 1 table, 1 algorithm
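To see why the Sherman-Morrison-Woodbury formula gives linear cost, here is a minimal NumPy sketch for a generic system of the form (diag(d) + U Wᵀ) x = b with a fixed rank r. The abstract does not give the actual structure of the Stefan-Maxwell system, so the diagonal-plus-low-rank form and the names `woodbury_solve`, `d`, `U`, `W`, and `b` are assumptions made for illustration only.

```python
import numpy as np


def woodbury_solve(d, U, W, b):
    """Solve (diag(d) + U @ W.T) x = b without forming the N x N matrix.

    d: (N,) nonzero diagonal, U and W: (N, r) low-rank factors, b: (N,).
    Cost is O(N r^2 + r^3), i.e. linear in N when the rank r is fixed.
    """
    Dinv_b = b / d              # D^{-1} b, O(N)
    Dinv_U = U / d[:, None]     # D^{-1} U, O(N r)
    r = U.shape[1]
    # Small r x r "capacitance" matrix I + W^T D^{-1} U.
    capacitance = np.eye(r) + W.T @ Dinv_U
    y = np.linalg.solve(capacitance, W.T @ Dinv_b)
    # Woodbury identity: x = D^{-1} b - D^{-1} U (I + W^T D^{-1} U)^{-1} W^T D^{-1} b
    return Dinv_b - Dinv_U @ y
```

Only the r x r capacitance matrix is factorized densely, which is why the total work grows linearly with N as long as the rank stays independent of N, as the abstract states for $V$.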