
    Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    Matrix multiplication is a very important computation kernel, both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication, providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced; SUMMA overcomes the shortcoming of Cannon's algorithm, as it can also be used on a non-square number of processors. Since then, the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate a reduction in communication cost of up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores.
    Comment: 9 pages
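    The SUMMA scheme that HSUMMA redesigns can be illustrated with a serial simulation of its outer-product broadcast loop. This is a toy model, not the authors' code, and function names here are invented for the sketch; HSUMMA would additionally split each broadcast into inter-group and intra-group phases.

    ```python
    def summa_sim(A, B, q, b):
        """Serially simulate SUMMA on a q x q process grid with b x b blocks.

        At step k, processes in grid column k broadcast their A blocks along
        their grid rows, and processes in grid row k broadcast their B blocks
        down their grid columns; every process then accumulates one block
        product into its local C block.
        """
        n = q * b
        C = [[0.0] * n for _ in range(n)]
        for k in range(q):              # the q broadcast steps
            for i in range(q):          # grid row of the simulated process
                for j in range(q):      # grid column of the simulated process
                    # process (i, j) owns C block (i, j) and has just
                    # received A block (i, k) and B block (k, j)
                    for r in range(b):
                        for c in range(b):
                            C[i*b + r][j*b + c] += sum(
                                A[i*b + r][k*b + t] * B[k*b + t][j*b + c]
                                for t in range(b))
        return C
    ```

    The reported gains of HSUMMA come from replacing each flat broadcast with a two-phase broadcast over groups of processors, which this serial model does not capture.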

    The physics of parallel machines

    The idea is considered that architectures for massively parallel computers must be designed to go beyond supporting a particular class of algorithms to supporting the underlying physical processes being modelled. Physical processes modelled by partial differential equations (PDEs) are discussed. Also discussed is the idea that an efficient architecture must go beyond nearest-neighbor mesh interconnections and support global and hierarchical communications.

    Symmetric Tori connected Torus Network

    A Symmetric Tori connected Torus Network (STTN) is a 2D-torus network of multiple basic modules, in which the basic modules are themselves 2D-torus networks that are hierarchically interconnected to form higher-level networks. In this paper, we present the architecture of the STTN, its node addressing and message routing, and evaluate the static network performance of the STTN, TTN, TESH, mesh, and torus networks. It is shown that the STTN possesses several attractive features, including constant degree, small diameter, low cost, small average distance, moderate bisection width, and higher fault tolerance than conventional and other hierarchical interconnection networks.
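    The two-level torus-of-tori structure can be illustrated with a simplified hop-count model. This is only an illustration of the hierarchy, not the STTN routing algorithm from the paper: it sums inter-module and intra-module torus distances and ignores how routes reach the inter-level gateway links.

    ```python
    def ring_dist(a, b, k):
        """Shortest hop count between positions a and b on a ring of k nodes."""
        d = abs(a - b) % k
        return min(d, k - d)

    def torus_dist(p, q, k):
        """Hop count between nodes p = (x, y) and q = (x, y) on a k x k 2D torus."""
        return ring_dist(p[0], q[0], k) + ring_dist(p[1], q[1], k)

    def two_level_dist(src, dst, k_hi, k_lo):
        """Toy lower bound on hops in a two-level torus-of-tori network.

        src and dst are ((module_x, module_y), (node_x, node_y)) pairs on a
        k_hi x k_hi higher-level torus of k_lo x k_lo basic modules. Real
        STTN routing must also traverse gateway links, which this ignores.
        """
        (hi_s, lo_s), (hi_d, lo_d) = src, dst
        return torus_dist(hi_s, hi_d, k_hi) + torus_dist(lo_s, lo_d, k_lo)
    ```

    Wrap-around links are what give both levels their small, symmetric diameters: on a ring, the worst-case distance is k // 2 rather than k - 1.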

    The Abacus Cosmos: A Suite of Cosmological N-body Simulations

    We present a public data release of halo catalogs from a suite of 125 cosmological $N$-body simulations from the Abacus project. The simulations span 40 $w$CDM cosmologies centered on the Planck 2015 cosmology at two mass resolutions, $4\times 10^{10}\;h^{-1}M_\odot$ and $1\times 10^{10}\;h^{-1}M_\odot$, in $1.1\;h^{-1}\mathrm{Gpc}$ and $720\;h^{-1}\mathrm{Mpc}$ boxes, respectively. The boxes are phase-matched to suppress sample variance and isolate cosmology dependence. Additional volume is available via 16 boxes of fixed cosmology and varied phase; a few boxes of single-parameter excursions from Planck 2015 are also provided. Catalogs spanning $z=1.5$ to $0.1$ are available for friends-of-friends and Rockstar halo finders and include particle subsamples. All data products are available at https://lgarrison.github.io/AbacusCosmos
    Comment: 13 pages, 9 figures, 3 tables. Additional figures added for mass resolution convergence tests, and additional redshifts added for existing tests. Matches ApJS accepted version

    Parallel algorithms for Hough transform


    Programming Model to Develop Supercomputer Combinatorial Solvers

    © 2017 IEEE. Novel architectures for massively parallel machines offer better scalability and the prospect of achieving linear speedup for sizable problems in many domains. The development of suitable programming models and accompanying software tools for these architectures remains one of the biggest challenges towards exploiting their full potential. We present a multi-layer software abstraction model to develop combinatorial solvers on massively parallel machines with regular topologies. The model enables different challenges in the design and optimization of combinatorial solvers to be tackled independently (separation of concerns) while permitting problem-specific tuning and cross-layer optimization. Specifically, the model decouples the issues of inter-node communication, node-level scheduling, problem mapping, mesh-level load balancing, and expressing problem logic. We present an implementation of the model and use it to profile a Boolean satisfiability solver on simulated massively parallel machines with different scales and topologies.
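    The separation of concerns described above can be sketched as a minimal layered skeleton, in which the problem-logic layer only talks to the lower layers through their interfaces. All class and method names here are invented for this sketch, not the paper's API.

    ```python
    class Transport:
        """Inter-node communication layer (here: an in-process loopback)."""
        def __init__(self):
            self.inbox = {}
        def send(self, node, msg):
            self.inbox.setdefault(node, []).append(msg)
        def recv(self, node):
            return self.inbox.get(node, [])

    class Scheduler:
        """Node-level scheduling layer (here: plain FIFO task execution)."""
        def run(self, tasks):
            return [task() for task in tasks]

    class Solver:
        """Problem-logic layer: expresses solver steps without knowing how
        messages travel or how tasks are scheduled on a node."""
        def __init__(self, transport, scheduler):
            self.transport, self.scheduler = transport, scheduler
        def solve(self, node, tasks):
            results = self.scheduler.run(tasks)
            self.transport.send(node, results)   # share results with a peer
            return results
    ```

    Because each layer hides behind a small interface, a simulated mesh transport or a different node-level scheduler could be swapped in without touching the solver logic, which is the kind of independent tuning the abstract describes.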