106 research outputs found

On the acceleration of wavefront applications using distributed many-core architectures

Author: Hammond Simon D.
Jarvis Stephen A.
Mudalige Gihan R.
Pennycook Simon J.
Wright Steven A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2012

In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.

CiteSeerX

University of Birmingham Research Portal

Warwick Research Archives Portal Repository

White Rose Research Online

Molecular dynamics beyond the limits: massive scaling on 72 racks of a BlueGene/P and supercooled glass transition of a 1 billion particles system

Author: Andrea Fratalocchi
Angelani
Angell
Beazley
Bell
Bohmer
Cavagna
Debenedetti
Elliot
Frenkel
Giancarlo Ruocco
Griebel
Hedges
Kadau
Knopp
Marx
Matsumoto
Monacoa
Mézard
Nicholas Allsopp
Rapaport
Rosenfeld
Ruocco
Sette
Shintani
Tarjus
Publication venue: 'Elsevier BV'
Publication date: 27/05/2011

We report scaling results on the world's largest supercomputer of our recently developed Billions-Body Molecular Dynamics (BBMD) package, which was especially designed for massively parallel simulations of the atomic dynamics in structural glasses and amorphous materials. The code was able to scale up to 72 racks of an IBM BlueGene/P, with a measured 89% efficiency for a system with 100 billion particles. The code speed, with less than 0.14 seconds per iteration in the case of 1 billion particles, paves the way to the study of billion-body structural glasses with a resolution increase of two orders of magnitude with respect to the largest simulation ever reported. We demonstrate the effectiveness of our code by studying the liquid-glass transition of an exceptionally large system made by a binary mixture of 1 billion particles. Comment: 14 pages, 8 figures, submitted to Journal of Computational Physics

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Computing for Perturbative QCD - A Snowmass White Paper

Author: /Argonne
/Fermilab
/LBNL Berkeley
/SLAC
/UCLA
Bauer Christian
Bern Zvi
Boughezal Radja
Campbell John
Christensen Neil
Dixon Lance
Gehrmann Thomas
Hoeche Stefan
Kanzaki Junichi
Mitov Alexander
Nadolsky Pavel
Olness Fredrick
Peskin Michael
Petriello Frank
Pittsburgh /U.
Pozzorini Stefano
Reina Laura
Siegert Frank
Wackeroth Doreen
Walsh Jonathan
Williams Ciaran
Wobisch Markus
Zurich /U.
Publication date: 13/09/2013

We present a study on high-performance computing and large-scale distributed computing for perturbative QCD calculations. Comment: 21 pages, 5 tables

arXiv.org e-Print Archive

UNT Digital Library

CERN Document Server

Towards Loosely-Coupled Programming on Petascale Systems

Author: Beckman Pete
Clifford Ben
Foster Ian
Iskra Kamil
Raicu Ioan
Wilde Mike
Zhang Zhao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/08/2008

We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new, and potentially far larger, class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second. Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200

arXiv.org e-Print Archive

Crossref

2006 Computation Directorate Annual Report

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

On the Potential of NoC Virtualization for Multicore Chips

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Crossref

Progress Towards Petascale Applications in Biology: Status in 2006

Author: Lingwall Malinda
Mueller Matthias
Stewart Craig A.
Publication venue: Springer Verlag
Publication date: 01/01/2007

Petascale computing is currently a common topic of discussion in the high performance computing community. Biological applications, particularly protein folding, are often given as examples of the need for petascale computing. There are at present biological applications that scale to execution rates of approximately 55 teraflops on a special-purpose supercomputer and 2.2 teraflops on a general-purpose supercomputer. In comparison, Qbox, a molecular dynamics code used to model metals, has an achieved performance of 207.3 teraflops. It may be useful to increase the extent to which operation rates and total calculations are reported in discussion of biological applications, and use total operations (integer and floating point combined) rather than (or in addition to) floating point operations as the unit of measure. Increased reporting of such metrics will enable better tracking of progress as the research community strives for the insights that will be enabled by petascale computing. This research was supported in part by the Indiana Genomics Initiative and the Indiana Metabolomics and Cytomics Initiative. The Indiana Genomics Initiative of Indiana University and the Indiana Metabolomics and Cytomics Initiative of Indiana University are supported in part by Lilly Endowment, Inc. The authors also wish to thank IBM, Inc. for support via Shared University Research Grants and partnerships via IU’s relationship as an IBM Life Sciences Institute of Innovation. Indiana University also thanks the TeraGrid partners; IU’s participation in the TeraGrid is funded by National Science Foundation grant numbers 0338618, 0504075, and 0451237. The early development of this paper was supported by a Fulbright Senior Scholars award from the Council for International Exchange of Scholars (CIES) and the United States Department of State to Dr. Craig A. Stewart; Matthias Mueller and the Technische Universität Dresden were hosts.
Many reviewers contributed to the improvement of the ideas expressed in this paper and are gratefully appreciated; Thom Dunning, Robert Germain, Chris Mueller, Jim Phillips, Richard Repasky, Ralph Roskies, and Allan Snavely are thanked particularly for their insights.

IUScholarWorks (University of Indiana)

PROCEEDINGS OF RIKEN BNL RESEARCH CENTER WORKSHOP: HIGH PERFORMANCE COMPUTING WITH QCDOC AND BLUEGENE.

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Lessons learned at 208K: Towards debugging millions of cores

Author: Barton P. Miller
Ben Liblit
Bronis R. De Supinski
Dong H. Ahn
Dorian C. Arnold
Gregory L. Lee
Martin Schulz
Matthew Legendre
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application – already, debugging the full BlueGene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To scale to such counts and beyond, tools must employ a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become a tool bottleneck. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present solutions to these challenges that have been implemented and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.

CiteSeerX

Crossref

ISCR Annual Report: Fiscal Year 2004

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref