814 research outputs found

    Effect of Mixed Precision Computing on H-Matrix Vector Multiplication in BEM Analysis

    The hierarchical matrix (H-matrix) is an approximation technique that splits a target dense matrix into multiple submatrices, a selected portion of which are replaced by low-rank approximations. The technique substantially reduces both the time and space complexity of dense matrix-vector multiplication, and hence has been applied to numerous practical problems. In this paper, we aim to accelerate H-matrix vector multiplication by introducing mixed precision computing, where we employ both binary64 (FP64) and binary32 (FP32) arithmetic operations. We propose three methods to introduce mixed precision computing to H-matrix vector multiplication, and then evaluate them in a boundary element method (BEM) analysis. The numerical tests examine the effects of mixed precision computing, particularly on the required simulation time and the rate of convergence of the iterative (BiCG-STAB) linear solver. We confirm the effectiveness of the proposed methods. Comment: Accepted manuscript to International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia2020), January 15--17, 2020, Fukuoka, Japan
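The abstract above does not spell out its three methods, so the following is only a minimal sketch of the general idea, not the authors' implementation: a single low-rank block stored as factors `U` and `V` is applied to a vector once in FP64 and once in FP32, and the results are compared. The function name `lowrank_matvec`, the block sizes, and the random test data are assumptions made for illustration.

```python
import numpy as np

def lowrank_matvec(U, V, x, dtype=np.float64):
    """Compute (U @ V.T) @ x with factors and arithmetic cast to `dtype`."""
    U_c = U.astype(dtype)
    V_c = V.astype(dtype)
    x_c = x.astype(dtype)
    # Apply V^T first: cost is O(k(m + n)) instead of O(mn) for the dense block.
    y = U_c @ (V_c.T @ x_c)
    # Return FP64 so the block result can be accumulated into an FP64 vector.
    return y.astype(np.float64)

# Hypothetical low-rank block of size m x n with rank k.
rng = np.random.default_rng(0)
m, n, k = 200, 150, 8
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))
x = rng.standard_normal(n)

y64 = lowrank_matvec(U, V, x, np.float64)
y32 = lowrank_matvec(U, V, x, np.float32)
rel_err = np.linalg.norm(y64 - y32) / np.linalg.norm(y64)
print(f"relative error of the FP32 block product: {rel_err:.2e}")
```

Whether such a per-block error is acceptable depends on how it affects the convergence of the outer iterative solver, which is the question the paper's numerical tests address.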

    Fast Isogeometric Boundary Element Method based on Independent Field Approximation

    An isogeometric boundary element method for problems in elasticity is presented, which is based on an independent approximation for the geometry, traction and displacement field. This enables a flexible choice of refinement strategies, permits an efficient evaluation of geometry-related information, a mixed collocation scheme which deals with discontinuous tractions along non-smooth boundaries and a significant reduction of the right-hand side of the system of equations for common boundary conditions. All these benefits are achieved without any loss of accuracy compared to conventional isogeometric formulations. The system matrices are approximated by means of hierarchical matrices to reduce the computational complexity for large-scale analysis. For the required geometrical bisection of the domain, a strategy for the evaluation of bounding boxes containing the supports of NURBS basis functions is presented. The versatility and accuracy of the proposed methodology are demonstrated by convergence studies showing optimal rates and real-world examples in two and three dimensions. Comment: 32 pages, 27 figures

    A fast and well-conditioned spectral method for singular integral equations

    We develop a spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems. This is accomplished by utilizing low rank approximations for sparse representations of the bivariate kernels. The resulting system can be solved in ${\cal O}(m^2 n)$ operations using an adaptive QR factorization, where $m$ is the bandwidth and $n$ is the optimal number of unknowns needed to resolve the true solution. The complexity is reduced to ${\cal O}(m n)$ operations by pre-caching the QR factorization when the same operator is used for multiple right-hand sides. Stability is proved by showing that the resulting linear operator can be diagonally preconditioned to be a compact perturbation of the identity. Applications considered include the Faraday cage, and acoustic scattering for the Helmholtz and gravity Helmholtz equations, including spectrally accurate numerical evaluation of the far- and near-field solution. The Julia software package SingularIntegralEquations.jl implements our method with a convenient, user-friendly interface.

    The fast multipole method at exascale

    This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for current and future systems. We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. We target the FMM because it is broadly applicable to a variety of scientific particle simulations used to study electromagnetic, fluid, and gravitational phenomena, among others. Importantly, the FMM has asymptotically optimal time complexity with guaranteed approximation accuracy. As such, it is among the most attractive solutions for scalable particle simulation on future extreme-scale systems. We specifically address two key challenges. The first challenge is how to engineer fast code for today’s platforms. We present the first in-depth study of multicore optimizations and tuning for FMM, along with a systematic approach for transforming a conventionally-parallelized FMM into a highly-tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance in the face of multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexity analysis of the FMM that considers both intra- and inter-node communication costs. Using these models, we present results for choosing the optimal algorithmic tuning parameter. This analysis also yields the surprising prediction that although the FMM is largely compute-bound today, and therefore highly scalable on current systems, the trajectory of processor architecture designs, if there are no significant changes, could cause it to become communication-bound as early as the year 2015. This prediction suggests the utility of our analysis approach, which directly relates algorithmic and architectural characteristics, for enabling a new kind of high-level algorithm-architecture co-design. 
To demonstrate the scientific significance of FMM, we present two applications, namely direct simulation of blood, which is a multi-scale multi-physics problem, and large-scale biomolecular electrostatics. MoBo (Moving Boundaries) is the infrastructure for the direct numerical simulation of blood. It comprises two key algorithmic components, of which FMM is one. We were able to simulate blood flow using Stokesian dynamics on 200,000 cores of Jaguar, a peta-flop system, and achieve a sustained performance of 0.7 Petaflop/s. The second application, which we propose as future work in this thesis, is biomolecular electrostatics, where we solve for the electrical potential using the boundary-integral formulation discretized with boundary element methods (BEM). The computational kernel in solving the large linear system is a dense matrix-vector multiply, which we propose can be calculated using our scalable FMM. We propose to begin with the two-dielectric problem, where the electrostatic field is calculated using two continuum dielectric media, the solvent and the molecule. This is only a first step to solving biologically challenging problems which have more than two dielectric media, ion-exclusion layers, and solvent-filled cavities. Finally, given the difficulty in producing high-performance scalable code, productivity is a key concern. Recently, numerical algorithms are being redesigned to take advantage of the architectural features of emerging multicore processors. These new classes of algorithms express fine-grained asynchronous parallelism and hence reduce the cost of synchronization. We performed the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. 
The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems. Our implementations in CnC were able to match and in some cases even exceed competing vendor-tuned and domain-specific library codes. We combine these two distinct research efforts by expressing FMM in CnC; our approach tries to marry performance with productivity, which will be critical on future systems. Looking forward, we would like to extend this to distributed memory machines, specifically to implement FMM in the new distributed CnC, distCnC, to express fine-grained parallelism which would require significant effort in alternative models. Ph.D.

    Towards exascale BEM simulations: hybrid parallelisation strategies for boundary element methods

    Many fields of engineering benefit from an accurate and reliable solver for the Laplace equation. This equation can model many different phenomena, and is at the base of several multi-physics solvers. For example, in nautical engineering, since the Navier-Stokes system has an extremely high computational cost, many reduced order models are often used to predict ship performance. Under the assumption of incompressible fluid and irrotational flow it is possible to recover a flow field by simply imposing mass conservation, which simplifies to a Laplace equation. Moreover, the deep theoretical background that surrounds this equation makes it ideal as a benchmark to test new numerical software. Over the last decades this equation has often been solved through its boundary integral formulation, leading to Boundary Element Methods. What makes such methods appealing with respect to a classical Finite Element Method is the fact that they only require discretisation of the boundary. The purpose of the present work is to develop an efficient and optimized BEM for the Laplace equation, designed around the architecture of modern CPUs.

    Neural Connectivity with Hidden Gaussian Graphical State-Model

    Noninvasive procedures for estimating neural connectivity are being questioned. Theoretical models hold that the electromagnetic field registered at external sensors is elicited by currents in neural space. Nevertheless, what we observe in sensor space is a superposition of projected fields from the whole gray matter. This is the reason for a major pitfall of noninvasive electrophysiology methods: distorted reconstruction of neural activity and its connectivity, known as leakage. It has been proven that current methods produce incorrect connectomes. Somewhat related to the incorrect connectivity modelling, they disregard either Systems Theory or Bayesian Information Theory. We introduce a new formalism that accounts for this, the Hidden Gaussian Graphical State-Model (HIGGS): a neural Gaussian Graphical Model (GGM) hidden by the observation equation of Magneto-encephalographic (MEEG) signals. HIGGS is equivalent to a frequency-domain Linear State Space Model (LSSM) but with a sparse connectivity prior. The mathematical contribution here is the theory for high-dimensional and frequency-domain HIGGS solvers. We demonstrate that HIGGS can attenuate the leakage effect in the most critical case: the distortion of the EEG signal due to head volume conduction heterogeneities. Its application in EEG is illustrated with connectivity patterns retrieved from human Steady State Visual Evoked Potentials (SSVEP). We provide for the first time confirmatory evidence for noninvasive procedures of neural connectivity: concurrent EEG and Electrocorticography (ECoG) recordings on monkey. Open source packages are freely available online, to reproduce the results presented in this paper and to analyze external MEEG databases.

    Real-time stress analysis of three-dimensional boundary element problems with continuously updating geometry

    Computational design of mechanical components is an iterative process that involves multiple stress analysis runs; this can be time consuming and expensive. Significant improvements in the efficiency of this process can be made by increasing the level of interactivity. One approach is through real-time re-analysis of models with continuously updating geometry. In this work the boundary element method is used to realise this vision. Three primary areas need to be considered to accelerate the re-solution of boundary element problems. These are re-meshing the model, updating the boundary element system of equations and re-solution of the system. Once the initial model has been constructed and solved, the user may apply geometric perturbations to parts of the model. A new re-meshing algorithm accommodates these changes in geometry whilst retaining as much of the existing mesh as possible. This allows the majority of the previous boundary element system of equations to be re-used for the new analysis. Efficiency is achieved during re-integration by applying a reusable intrinsic sample point (RISP) integration scheme with a 64-bit single precision code. Parts of the boundary element system that have not been updated are retained by the re-analysis and integrals that multiply zero boundary conditions are suppressed. For models with fewer than 10,000 degrees of freedom, the re-integration algorithm performs up to five times faster than a standard integration scheme with less than 0.15% reduction in the L_2-norm accuracy of the solution vector. The method parallelises easily and an additional six times speed-up can be achieved on eight processors over the serial implementation. The performance of a range of direct, iterative and reduction-based linear solvers has been compared for solving the boundary element system, with the iterative generalised minimal residual (GMRES) solver providing the fastest convergence rate and the most accurate result. 
Further time savings are made by preconditioning the updated system with the LU decomposition of the original system. Using these techniques, near real-time analysis can be achieved for three-dimensional simulations; for two-dimensional models such real-time performance has already been demonstrated.
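The preconditioning idea in the abstract above can be sketched with SciPy, purely as an illustration under stated assumptions and not as the thesis code: the LU factors of an original matrix are reused as a GMRES preconditioner after a few rows of the system have changed. The matrices `A_orig` and `A_new`, the size of the perturbation, and the helper `run_gmres` are hypothetical stand-ins; a real boundary element system would come from the integration step.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n = 200

# Hypothetical stand-in for the original dense system matrix.
A_orig = n * np.eye(n) + rng.standard_normal((n, n))
lu_piv = lu_factor(A_orig)  # factorised once, reused for every re-analysis

# Hypothetical geometry update: only a few rows of the system change.
A_new = A_orig.copy()
A_new[:10, :] += 0.1 * rng.standard_normal((10, n))
b = rng.standard_normal(n)

# Preconditioner: apply the inverse of the ORIGINAL matrix via its LU factors.
M = LinearOperator((n, n), matvec=lambda v: lu_solve(lu_piv, v),
                   dtype=A_orig.dtype)

def run_gmres(A, rhs, M=None):
    """Solve A x = rhs with GMRES and count the inner iterations."""
    history = []
    x, info = gmres(A, rhs, M=M, callback=history.append,
                    callback_type="pr_norm")
    return x, info, len(history)

x_plain, info_plain, it_plain = run_gmres(A_new, b)
x_prec, info_prec, it_prec = run_gmres(A_new, b, M=M)
print("iterations without / with preconditioner:", it_plain, it_prec)
```

Because the updated matrix differs from the original only in a few rows, the preconditioned operator is close to the identity, which is why the old factorisation remains useful without being recomputed.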