
    Lecture 09: Hierarchically Low Rank and Kronecker Methods

    Exploiting the structure of matrices goes beyond identifying their non-zero patterns. In many cases, dense full-rank matrices have low-rank submatrices that can be exploited to construct fast approximate algorithms. In other cases, dense matrices can be decomposed into Kronecker factors that are much smaller than the original matrix. Sparsity is a consequence of the connectivity of the underlying geometry (mesh, graph, interaction list, etc.), whereas the rank deficiency of submatrices is closely related to distance within this underlying geometry. For the high-dimensional geometry encountered in data science applications, the curse of dimensionality poses a challenge for rank-structured approaches. On the other hand, models in data science that are formulated as a composition of functions lead to a Kronecker product structure that yields a different kind of fast algorithm. In this lecture, we will look at some examples of when rank structure and Kronecker structure can be useful.
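
The two kinds of structure named above can be illustrated with a small NumPy sketch. It is not taken from the lecture: the kernel 1/|x - y|, the point clusters, the tolerance, and the helper names numerical_rank, kernel_block and kron_matvec are all hypothetical choices for illustration. The first part shows that a kernel block between well-separated clusters has a much lower numerical rank than one between adjacent clusters (rank deficiency tied to distance). The second part applies a Kronecker product to a vector through its small factors, using the identity (A kron B) vec(X) = vec(B X A^T).

```python
import numpy as np


def numerical_rank(M, tol=1e-8):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)  # returned in descending order
    return int(np.sum(s > tol * s[0]))


def kernel_block(x, y):
    """Dense block K[i, j] = 1 / |x_i - y_j| for two 1-D point sets (assumed kernel)."""
    return 1.0 / np.abs(x[:, None] - y[None, :])


def kron_matvec(A, B, v):
    """Return (A kron B) @ v without forming the Kronecker product.

    Uses (A kron B) vec(X) = vec(B X A^T), where vec stacks columns
    (column-major order) and X has shape (B.shape[1], A.shape[1]).
    For A of shape (m, n) and B of shape (p, q) this costs
    O(pqn + pnm) instead of O(mnpq) for the explicit product.
    """
    X = v.reshape(B.shape[1], A.shape[1], order="F")
    return (B @ X @ A.T).reshape(-1, order="F")


# Rank structure: compare a far-field block with a near-field block.
x = np.linspace(0.0, 1.0, 200)         # source cluster
y_far = np.linspace(10.0, 11.0, 200)   # well-separated target cluster
y_near = np.linspace(1.01, 2.01, 200)  # adjacent target cluster
print("far-field block rank:", numerical_rank(kernel_block(x, y_far)))
print("near-field block rank:", numerical_rank(kernel_block(x, y_near)))

# Kronecker structure: check the factored matvec against np.kron.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
v = rng.standard_normal(A.shape[1] * B.shape[1])
print("matches np.kron:", np.allclose(kron_matvec(A, B, v), np.kron(A, B) @ v))
```

Both blocks are 200 by 200, but only the far-field block can be replaced by a product of thin factors at small error; hierarchical methods apply that compression recursively to the off-diagonal blocks.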

    Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

    This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. Under these conditions, the calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex method calculations to date.
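
In a vortex particle method, the velocity comes from the Biot-Savart law summed over all particles. That O(N^2) pairwise sum is what an FMM approximates hierarchically. The NumPy sketch below shows only this direct-sum baseline, not the paper's FMM. The singular kernel with an optional algebraic smoothing radius sigma is an assumed choice and may differ from the paper's smoothing kernel, and the function name biot_savart_direct is hypothetical:

u(x) = -1/(4 pi) * sum_j (x - x_j) x alpha_j / (|x - x_j|^2 + sigma^2)^(3/2)

Here x_j are the particle positions and alpha_j are the vector particle strengths.

```python
import numpy as np


def biot_savart_direct(targets, positions, strengths, sigma=0.0):
    """Velocity at `targets` (M, 3) induced by vortex particles (N, 3) via O(M*N) direct sum.

    sigma is an assumed algebraic smoothing radius; with sigma == 0 the
    singular kernel is used and coincident target/source pairs are skipped.
    """
    diff = targets[:, None, :] - positions[None, :, :]       # (M, N, 3): x - x_j
    r2 = np.sum(diff**2, axis=-1) + sigma**2                  # (M, N): squared (smoothed) distance
    cross = np.cross(diff, strengths[None, :, :])             # (M, N, 3): (x - x_j) x alpha_j
    with np.errstate(divide="ignore", invalid="ignore"):
        weight = np.where(r2 > 0, r2 ** -1.5, 0.0)            # zero out self-interaction
    return -np.sum(cross * weight[..., None], axis=1) / (4.0 * np.pi)


# One particle at the origin with strength along +z, evaluated on the +x axis.
u = biot_savart_direct(np.array([[1.0, 0.0, 0.0]]),
                       np.array([[0.0, 0.0, 0.0]]),
                       np.array([[0.0, 0.0, 1.0]]))
print(u)  # velocity points along +y, consistent with the right-hand rule
```

Each target touches every particle, so the cost grows as N^2. Grouping distant particles into multipole expansions, as an FMM does, reduces this to roughly O(N).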