26 research outputs found

    Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors

    In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors, with a model implementation on graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$ matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$ matrix CPU implementations with many-core processors, we here aim to rely entirely on that processor type. As our main contribution, we introduce the necessary parallel algorithmic patterns that allow the full $\mathcal{H}$ matrix construction and the fast matrix-vector product to be mapped to many-core hardware. Here, crucial ingredients are space-filling curves, parallel tree traversal and batching of linear algebra operations. The resulting model GPU implementation hmglib is, to the best of the authors' knowledge, the first entirely GPU-based Open Source $\mathcal{H}$ matrix library of this kind. We conclude this work with an in-depth performance analysis and a comparative performance study against a standard $\mathcal{H}$ matrix library, highlighting profound speedups of our many-core parallel approach
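
One of the ingredients named in this abstract, the space-filling curve, can be illustrated with a Z-order (Morton) curve: interleaving the bits of quantized coordinates gives a one-dimensional key, and sorting by that key groups nearby points together, which is one common way to build a spatial tree with purely data-parallel steps. The sketch below is an illustration only and is not taken from hmglib; the function names, the 16-bit quantization and the restriction to 2D points in the unit square are assumptions.

```python
def spread_bits(v: int) -> int:
    """Insert a zero bit between each of the lower 16 bits of v."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton2d(ix: int, iy: int) -> int:
    """Interleave two 16-bit integers: x uses even bits, y uses odd bits."""
    return spread_bits(ix) | (spread_bits(iy) << 1)


def morton_order(points):
    """Return point indices sorted along the Z-curve.

    Assumes every point (x, y) lies in [0, 1) x [0, 1).
    """
    scale = 1 << 16  # quantize each coordinate to 16 bits

    def key(i):
        x, y = points[i]
        return morton2d(int(x * scale), int(y * scale))

    return sorted(range(len(points)), key=key)


if __name__ == "__main__":
    pts = [(0.75, 0.75), (0.1, 0.1), (0.75, 0.1), (0.1, 0.75)]
    print(morton_order(pts))
```

On a GPU, the key computation is independent per point and the sort can be done with a parallel radix sort, which is why this construction maps well to many-core hardware.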

    A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

    In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, we here discuss the first fully GPU-based distributed-memory parallel hierarchical matrix Open Source library using the traditional H-matrix format and adaptive cross approximation with an application to BEM problems

    Kernel-based stochastic collocation for the random two-phase Navier-Stokes equations

    In this work, we apply stochastic collocation methods with radial kernel basis functions for an uncertainty quantification of the random incompressible two-phase Navier-Stokes equations. Our approach is non-intrusive and we use the existing fluid dynamics solver NaSt3DGPF to solve the incompressible two-phase Navier-Stokes equation for each given realization. We are able to empirically show that the resulting kernel-based stochastic collocation is highly competitive in this setting and even outperforms some other standard methods

    On the algebraic construction of sparse multilevel approximations of elliptic tensor product problems

    We consider the solution of elliptic problems on the tensor product of two physical domains, as present, for example, in the approximation of the solution covariance of elliptic partial differential equations with random input. Previous sparse approximation approaches used a geometrically constructed multilevel hierarchy. Instead, we construct this hierarchy for a given discretized problem by means of the algebraic multigrid method. Thereby, we are able to apply the sparse grid combination technique to problems given on complex geometries and for discretizations arising from unstructured grids, which was not feasible before. Numerical results show that our algebraic construction exhibits the same convergence behaviour as the geometric construction, while being applicable even in black-box type PDE solvers
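
The sparse grid combination technique mentioned in this abstract combines solutions computed on pairs of levels (l1, l2) with coefficients +1 and -1. A minimal sketch of the index bookkeeping for the classical two-dimensional formula follows; it assumes levels are numbered from 0 and says nothing about how the levels themselves are built (geometrically or, as in the abstract, by algebraic multigrid). The function name is hypothetical.

```python
def combination_terms(level: int):
    """Return (l1, l2, coefficient) triples of the 2D combination technique.

    Classical formula, with levels counted from 0:
        u_c = sum_{l1 + l2 = level}     u_{l1, l2}
            - sum_{l1 + l2 = level - 1} u_{l1, l2}
    """
    terms = []
    # Anisotropic level pairs on the main diagonal get coefficient +1.
    for l1 in range(level + 1):
        terms.append((l1, level - l1, 1))
    # Level pairs on the diagonal below get coefficient -1.
    for l1 in range(level):
        terms.append((l1, level - 1 - l1, -1))
    return terms


if __name__ == "__main__":
    for l1, l2, c in combination_terms(2):
        print(f"{c:+d} * u[{l1},{l2}]")
```

The coefficients always sum to 1, so a quantity that is represented exactly on every level pair is reproduced by the combination.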

    Ensemble Kalman filters for reliability estimation in perfusion inference

    We consider the solution of inverse problems in dynamic contrast-enhanced imaging by means of Ensemble Kalman filters. Our quantity of interest is blood perfusion, i.e. blood flow rates in tissue. While existing approaches to compute blood perfusion parameters for given time series of radiological measurements mainly rely on deterministic, deconvolution-based methods, we aim at recovering probabilistic solution information for given noisy measurements. To this end, we model radiological image capturing as a sequential data assimilation process and solve it with an Ensemble Kalman filter. Thereby, we recover deterministic results as the ensemble-based mean and are able to compute reliability information such as probabilities for the perfusion to be in a given range. Our target application is the inference of blood perfusion parameters in the human brain. A numerical study shows promising results for artificial measurements generated by a Digital Perfusion Phantom
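
To make the idea of ensemble-based reliability information concrete, the following toy sketch shows one Ensemble Kalman filter analysis step for a scalar state that is observed directly, followed by an empirical probability that the state lies in a given range. It is not the perfusion model from the paper: the scalar setting, the function names and the optional observation perturbations (set to zero by default so the example is deterministic) are all assumptions made for illustration.

```python
def enkf_update(ensemble, observation, obs_variance, perturbations=None):
    """One EnKF analysis step for a directly observed scalar state.

    perturbations: optional per-member observation noise samples, as used in
    the stochastic EnKF; zeros are assumed here if none are given.
    """
    n = len(ensemble)
    if perturbations is None:
        perturbations = [0.0] * n
    mean = sum(ensemble) / n
    # Unbiased sample variance of the forecast ensemble.
    variance = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    # Kalman gain for observation operator H = 1.
    gain = variance / (variance + obs_variance)
    return [x + gain * (observation + e - x)
            for x, e in zip(ensemble, perturbations)]


def probability_in_range(ensemble, low, high):
    """Fraction of ensemble members with low <= x <= high."""
    return sum(1 for x in ensemble if low <= x <= high) / len(ensemble)


if __name__ == "__main__":
    updated = enkf_update([0.0, 2.0], observation=3.0, obs_variance=2.0)
    print(updated, sum(updated) / len(updated))
    print(probability_in_range(updated, 2.0, 3.0))
```

The ensemble mean plays the role of the deterministic estimate, while the spread of the updated members provides the range probabilities.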

    Analysis and parallelization strategies for Ruge-Stüben AMG on many-core processors

    The Ruge-Stüben algebraic multigrid method (AMG) is an optimal-complexity black-box approach to solve linear systems arising in discretizations of, e.g., elliptic PDEs. Recently, there has been a growing interest in parallelizing this method on many-core hardware, especially graphics processing units (GPUs). This type of hardware delivers high performance for highly parallel algorithms. In this work, we analyse convergence properties of recent AMG developments for many-core processors and propose to use more classical choices of AMG components for higher robustness. Based on these choices, we introduce many-core parallelization strategies for a robust hybrid many-core AMG. The strategies can be understood and applied without deep knowledge of a given many-core architecture. We use them to propose a new hybrid GPU implementation. The implementation is tested in an in-depth performance analysis, which outlines its good convergence properties and high performance in the solve phase

    Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited

    Inspired by Pople diagrams popular in quantum chemistry, we introduce a hierarchical scheme, based on the multilevel combination (C) technique, to combine various levels of approximations made when calculating molecular energies within quantum chemistry. When combined with quantum machine learning (QML) models, the resulting CQML model is a generalized unified recursive kernel ridge regression which exploits correlations implicitly encoded in training data comprised of multiple levels in multiple dimensions. Here, we have investigated up to three dimensions: Chemical space, basis set, and electron correlation treatment. Numerical results have been obtained for atomization energies of a set of ∼7'000 organic molecules with up to 7 atoms (not counting hydrogens) containing CHONFClS, as well as for ∼6'000 constitutional isomers of C7H10O2. CQML learning curves for atomization energies suggest a dramatic reduction in necessary training samples calculated with the most accurate and costly method. In order to generate millisecond estimates of CCSD(T)/cc-pvdz atomization energies with prediction errors reaching chemical accuracy (∼1 kcal/mol), the CQML model requires only ∼100 training instances at CCSD(T)/cc-pvdz level, rather than thousands within conventional QML, while more training molecules are required at lower levels. Our results suggest a possibly favorable trade-off between various hierarchical approximations whose computational cost scales differently with electron number