Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

Authors: Hasanov Khalid; Lastovetsky Alexey; Quintin Jean-Noel
Publication date: 18/06/2013

Matrix multiplication is an important computational kernel, both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication, providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes this shortcoming of Cannon's algorithm, as it can also be used on a non-square number of processors. Since then, the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene-P demonstrate a reduction in communication cost of up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores.

Comment: 9 pages
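
As a rough illustration of the baseline that the abstract builds on, the following is a minimal single-process simulation of plain SUMMA on a possibly non-square process grid. It is a sketch under stated assumptions, not the authors' implementation: the function name `summa`, the parameters `pr`, `pc` and `b` (grid rows, grid columns, panel width), and the use of nested Python lists are all hypothetical choices made for this example, and the two-level hierarchy of HSUMMA is not modelled.

```python
def summa(A, B, pr, pc, b):
    """Simulate plain SUMMA for n x n matrices on a pr x pc process grid.

    Assumptions (not from the source): n is divisible by pr, pc and b,
    and "broadcasts" are modelled as simple slicing in one process.
    """
    n = len(A)
    assert n % pr == 0 and n % pc == 0 and n % b == 0
    rb, cb = n // pr, n // pc  # size of the C block owned by each process

    # One local C block per simulated process (i, j).
    C_local = {(i, j): [[0] * cb for _ in range(rb)]
               for i in range(pr) for j in range(pc)}

    for k in range(0, n, b):  # one outer step per panel of width b
        for i in range(pr):
            # Piece of A's column panel that process row i would receive
            # through a broadcast along the row.
            a_panel = [A[r][k:k + b] for r in range(i * rb, (i + 1) * rb)]
            for j in range(pc):
                # Piece of B's row panel that process column j would receive
                # through a broadcast along the column.
                b_panel = [B[k + t][j * cb:(j + 1) * cb] for t in range(b)]
                C = C_local[(i, j)]
                # Local rank-b update of the block owned by process (i, j).
                for r in range(rb):
                    for c in range(cb):
                        C[r][c] += sum(a_panel[r][t] * b_panel[t][c]
                                       for t in range(b))

    # Gather the distributed blocks back into one matrix for inspection.
    return [[C_local[(r // rb, c // cb)][r % rb][c % cb] for c in range(n)]
            for r in range(n)]
```

Calling `summa(A, B, 2, 3, 2)` on 6 x 6 inputs exercises a 2 x 3 grid, which is the non-square case that Cannon's algorithm does not cover.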

arXiv.org e-Print Archive

Crossref

The physics of parallel machines

Author: Chan Tony F.

This work considers the idea that architectures for massively parallel computers must be designed to go beyond supporting a particular class of algorithms and instead support the underlying physical processes being modelled. Physical processes modelled by partial differential equations (PDEs) are discussed. Also discussed is the idea that an efficient architecture must go beyond nearest-neighbor mesh interconnections and support global and hierarchical communications.

NASA Technical Reports Server

Symmetric Tori connected Torus Network

Authors: Faisal Faiz Al; Rahman M.M. Hafizur
Publication date: 01/12/2009

A Symmetric Tori connected Torus Network (STTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected to form higher-level networks. In this paper, we present the architecture of the STTN, its node addressing and message routing, and evaluate the static network performance of the STTN, TTN, TESH, mesh, and torus networks. It is shown that the STTN possesses several attractive features, including constant degree, small diameter, low cost, small average distance, moderate bisection width, and higher fault-tolerant performance than other conventional and hierarchical interconnection networks.
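
To make the static metrics named in the abstract concrete, the sketch below computes node degree, diameter and average distance for a plain 2D torus, one of the baseline networks listed. It does not model the STTN itself, whose hierarchical interconnection is not specified in the abstract; the function name `torus_metrics` and its parameters are hypothetical.

```python
from collections import deque


def torus_metrics(rows, cols):
    """Return (degree, diameter, average distance) of a rows x cols 2D torus.

    Assumption: a plain 2D torus with wrap-around links in both dimensions.
    Every node of a torus looks the same, so one BFS from node (0, 0) suffices.
    """
    def neighbors(r, c):
        # Wrap-around neighbors; the set removes duplicates on tiny tori.
        return {((r + 1) % rows, c), ((r - 1) % rows, c),
                (r, (c + 1) % cols), (r, (c - 1) % cols)} - {(r, c)}

    # Breadth-first search gives the hop count to every other node.
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        node = queue.popleft()
        for nb in neighbors(*node):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    degree = len(neighbors(0, 0))
    diameter = max(dist.values())
    average_distance = sum(dist.values()) / (rows * cols - 1)
    return degree, diameter, average_distance
```

For example, `torus_metrics(4, 4)` reports degree 4 and diameter 4, the kind of figures that such a static comparison between networks would tabulate.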

The International Islamic University Malaysia Repository

The Abacus Cosmos: A Suite of Cosmological N-body Simulations

Authors: Eisenstein Daniel J.; Ferrer Douglas; Garrison Lehman H.; Pinto Philip A.; Tinker Jeremy L.; Weinberg David H.
Publication venue: American Astronomical Society
Publication date: 23/04/2018

We present a public data release of halo catalogs from a suite of 125 cosmological $N$-body simulations from the Abacus project. The simulations span 40 $w$CDM cosmologies centered on the Planck 2015 cosmology at two mass resolutions, $4\times 10^{10}\;h^{-1}M_\odot$ and $1\times 10^{10}\;h^{-1}M_\odot$, in $1.1\;h^{-1}\mathrm{Gpc}$ and $720\;h^{-1}\mathrm{Mpc}$ boxes, respectively. The boxes are phase-matched to suppress sample variance and isolate cosmology dependence. Additional volume is available via 16 boxes of fixed cosmology and varied phase; a few boxes of single-parameter excursions from Planck 2015 are also provided. Catalogs spanning $z=1.5$ to $0.1$ are available for friends-of-friends and Rockstar halo finders and include particle subsamples. All data products are available at https://lgarrison.github.io/AbacusCosmos

Comment: 13 pages, 9 figures, 3 tables. Additional figures added for mass resolution convergence tests, and additional redshifts added for existing tests. Matches ApJS accepted version

arXiv.org e-Print Archive

The University of Arizona

Parallel algorithms for Hough transform

Author: Ozbek Fevzi Oktay
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Programming Model to Develop Supercomputer Combinatorial Solvers

Authors: Brown A; Mokhov A; Moore SW; Naylor M; Rast A; Tarawneh G; Thomas DB; Yakovlev A
Publication venue: Proceedings of the International Conference on Parallel Processing Workshops
Publication date: 01/01/2017

© 2017 IEEE. Novel architectures for massively parallel machines offer better scalability and the prospect of achieving linear speedup for sizable problems in many domains. The development of suitable programming models and accompanying software tools for these architectures remains one of the biggest challenges in exploiting their full potential. We present a multi-layer software abstraction model for developing combinatorial solvers on massively parallel machines with regular topologies. The model enables different challenges in the design and optimization of combinatorial solvers to be tackled independently (separation of concerns) while permitting problem-specific tuning and cross-layer optimization. Specifically, the model decouples the issues of inter-node communication, node-level scheduling, problem mapping, mesh-level load balancing and expressing problem logic. We present an implementation of the model and use it to profile a Boolean satisfiability solver on simulated massively parallel machines with different scales and topologies.

Crossref

Southampton (e-Prints Soton)

Apollo (Cambridge)