
58 research outputs found

HPCC Update and Analysis

Author: Jeffery A Kuehn
Nathan L Wichmann
Publication date: 11/04/2020

Abstract: The last year has seen significant updates to the programming environment and operating systems on the Cray X1E and Cray XT3, as well as the much-anticipated release of version 1.0 of the HPCC Benchmark. This paper provides an update and analysis of the HPCC Benchmark results for the Cray XT3 and X1E, as well as a comparison against historical results.

CiteSeerX

Results and Frontiers in Lattice Baryon Spectroscopy

Author: Adam C. Lichtl
Chaden Djalali
Colin Morningstar
David Richards
Fernando Umeres
George Fleming
John Bulava
K. Jimmy Juge
Nilmani Mathur
Philip L. Cole
Ricardo Alarcon
Robert Edwards
Stephen J. Wallace
Publication venue: 'AIP Publishing'
Publication date: 01/01/2007

The Lattice Hadron Physics Collaboration (LHPC) baryon spectroscopy effort is reviewed. To date the LHPC has performed exploratory Lattice QCD calculations of the low-lying spectrum of Nucleon and Delta baryons. These calculations demonstrate the effectiveness of our method by obtaining the masses of an unprecedented number of excited states with definite quantum numbers. Future work of the project is outlined. Comment: To appear in the proceedings for the VII Latin American Symposium of Nuclear Physics and Applications.

arXiv.org e-Print Archive

Crossref


Performance Engineering in the Community Atmosphere Model

Author: Drake J.
Mirin A.
Sawyer W.
Worley P.
Publication venue: Lawrence Livermore National Laboratory
Publication date: 30/05/2006

The Community Atmosphere Model (CAM) is the atmospheric component of the Community Climate System Model (CCSM) and is the primary consumer of computer resources in typical CCSM simulations. Performance engineering has been an important aspect of CAM development throughout its existence. This paper briefly summarizes these efforts and their impacts over the past five years.

UNT Digital Library

Performance Measurement and Analysis of Large-Scale Parallel Applications on Leadership Computing Systems

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2008

Crossref

Trilinos I/O Support (Trios)

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012

Crossref

Predictive analysis and optimisation of pipelined wavefront applications using reusable analytic models

Author: Mudalige Gihan R.

Pipelined wavefront computations are a ubiquitous class of high performance parallel algorithms used for the solution of many scientific and engineering applications. In order to aid the design and optimisation of these applications, and to ensure that during procurement platforms are chosen best suited to these codes, there has been considerable research in analysing and evaluating their operational performance. Wavefront codes exhibit complex computation, communication and synchronisation patterns, and as a result there exists a large variety of such codes and possible optimisations. The problem is compounded by each new generation of high performance computing system, which has often introduced a previously unexplored architectural trait, requiring previous performance models to be rewritten and reevaluated. In this thesis, we address the performance modelling and optimisation of this class of application as a whole. This differs from previous studies in which bespoke models are applied to specific applications. The analytic performance models are generalised and reusable, and we demonstrate their application to the predictive analysis and optimisation of pipelined wavefront computations running on modern high performance computing systems. The performance model is based on the LogGP parameterisation, and uses a small number of input parameters to specify the particular behaviour of most wavefront codes. The new parameters and model equations capture the key structural and behavioural differences among different wavefront application codes, providing a succinct summary of the operations for each application and insights into alternative wavefront application design. The models are applied to three industry-strength wavefront codes and are validated on several systems including a Cray XT3/XT4 and an InfiniBand commodity cluster.
Model predictions show high quantitative accuracy (less than 20% error) for all high performance configurations and excellent qualitative accuracy. The thesis presents applications, projections and insights for optimisations using the model, which show the utility of reusable analytic models for performance engineering of high performance computing codes. In particular, we demonstrate the use of the model for: (1) evaluating application configuration and resulting performance; (2) evaluating hardware platform issues including platform sizing and configuration; (3) exploring hardware platform design alternatives and system procurement; and (4) considering possible code and algorithmic optimisations.

Warwick Research Archives Portal Repository

Coping at the User-Level with Resource Limitations in the Cray Message Passing Toolkit MPI at Scale: How Not to Spend Your Summer Vacation

Author: Art Mirin
Barry F Smith
Forrest M Hoffman
Glenn E Hammond
Kalyan S Perumalla
Patrick H Worley
Richard T Mills
Publication date: 11/04/2020

ABSTRACT: As the number of processor cores available in Cray XT series computers has rapidly grown, users have increasingly encountered instances where an MPI code that has worked for years unexpectedly fails at high core counts ("at scale") because resource limitations are exceeded within the MPI implementation. Here, we examine several examples drawn from user experiences and discuss strategies for working around these difficulties at the user level.

CiteSeerX

Automating Topology Aware Mapping for Supercomputers

Author: Bhatele Abhinav
Publication date: 01/01/2010

Petascale machines with hundreds of thousands of cores are being built. These machines have varying interconnect topologies and large network diameters. Computation is cheap and communication on the network is becoming the bottleneck for scaling of parallel applications. Network contention, specifically, is becoming an increasingly important factor affecting overall performance. The broad goal of this dissertation is performance optimization of parallel applications through reduction of network contention. Most parallel applications have a certain communication topology. Mapping the tasks of a parallel application, based on their communication graph, to the physical processors of a machine can potentially lead to performance improvements. Mapping the communication graph of an application onto the interconnect topology of a machine while trying to localize communication is the research problem under consideration. The farther different messages travel on the network, the greater the chance of resource sharing between messages. This can create contention on the network for networks commonly used today. Evaluative studies in this dissertation show that on IBM Blue Gene and Cray XT machines, message latencies can be severely affected under contention. Realizing this fact, application developers have started paying attention to the mapping of tasks to physical processors to minimize contention. Placement of communicating tasks on nearby physical processors can minimize the distance traveled by messages and reduce the chances of contention. Performance improvements through topology aware placement for applications such as NAMD and OpenAtom are used to motivate this work. Building on these ideas, the dissertation proposes algorithms and techniques for automatic mapping of parallel applications to relieve the application developers of this burden. The effect of contention on message latencies is studied in depth to guide the design of mapping algorithms.
The hop-bytes metric is proposed for the evaluation of mapping algorithms as a better metric than the previously used maximum dilation metric. The main focus of this dissertation is on developing topology aware mapping algorithms for parallel applications with regular and irregular communication patterns. The automatic mapping framework is a suite of such algorithms with capabilities to choose the best mapping for a problem with a given communication graph. The dissertation also briefly discusses completely distributed mapping techniques which will be imperative for machines of the future. Published or submitted for publication; not peer reviewed.

CiteSeerX

Illinois Digital Environment for Access to Learning and Scholarship Repository

Interconnect Performance Evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron Cluster, and Dell PowerEdge

Author: Ciotti Robert
Fatoohi Rod
Saini Subhash

We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different numbers of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray X1 shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our results show the impact of the network bandwidth and topology on the overall performance of each interconnect.

NASA Technical Reports Server