213 research outputs found

Soft Error Vulnerability of Iterative Linear Algebra Methods

Author: Bronevetsky G
de Supinski B
Publication venue: Lawrence Livermore National Laboratory
Publication date: 15/12/2007

Devices become increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft errors primarily caused problems for space and high-atmospheric computing applications. Modern architectures now use features so small at sufficiently low voltages that soft errors are becoming significant even at terrestrial altitudes. The soft error vulnerability of iterative linear algebra methods, which many scientific applications use, is a critical aspect of the overall application vulnerability. These methods are often considered invulnerable to many soft errors because they converge from an imprecise solution to a precise one. However, we show that iterative methods can be vulnerable to soft errors, with a high rate of silent data corruptions. We quantify this vulnerability: algorithms generate up to 8.5% erroneous results when subjected to a single bit flip. Further, we show that detecting soft errors in an iterative method depends on its detailed convergence properties and requires more complex mechanisms than simply checking the residual. Finally, we explore inexpensive techniques to tolerate soft errors in these methods.

CiteSeerX

Crossref

UNT Digital Library
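
The abstract's central experiment, a single bit flip injected into an iterative solver, can be sketched in a few lines. This is a minimal illustration, not the authors' fault-injection framework: it assumes a plain Jacobi iteration on a small, diagonally dominant test system, and the helpers `flip_bit`, `jacobi`, and `residual_norm` are hypothetical. It shows both sides of the abstract's claim: a low-order flip early in the run is absorbed by further iterations, while a flip in the final iterate survives, and a moderate one can pass a loose residual threshold.

```python
import struct


def flip_bit(value, bit):
    """Return `value` with one bit of its IEEE-754 double encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the exponent,
    and bit 63 is the sign bit.
    """
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def jacobi(A, b, iters=200, fault_iter=None, fault_index=0, fault_bit=0):
    """Jacobi iteration x_i <- (b_i - sum_{j != i} A_ij x_j) / A_ii.

    If `fault_iter` is given, one bit of x[fault_index] is flipped right after
    that iteration, modelling a single soft error in the iterate.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if k == fault_iter:
            x[fault_index] = flip_bit(x[fault_index], fault_bit)
    return x


def residual_norm(A, b, x):
    """Max-norm of b - A x, the quantity a simple residual check would test."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


# Hypothetical test system; its exact solution is x = [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

clean = jacobi(A, b)
early_low = jacobi(A, b, fault_iter=5, fault_bit=0)     # tiny flip, then 194 more sweeps
late_mid = jacobi(A, b, fault_iter=199, fault_bit=40)   # mantissa flip in the final iterate
late_high = jacobi(A, b, fault_iter=199, fault_bit=62)  # exponent flip in the final iterate

for name, x in [("clean", clean), ("early_low", early_low),
                ("late_mid", late_mid), ("late_high", late_high)]:
    print(f"{name:10s} x0 error={abs(x[0] - 1 / 11):.2e} residual={residual_norm(A, b, x):.2e}")
```

In this toy setting the bit-40 case leaves an absolute error of about 1.5e-5 in x[0] with a residual of only about 6e-5, so whether a residual check flags it depends entirely on the threshold chosen.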

Soft Error Vulnerability of Iterative Linear Algebra Methods

Author: Bronevetsky G
de Supinski B
Publication venue: Lawrence Livermore National Laboratory
Publication date: 01/01/2008

Devices are increasingly vulnerable to soft errors as their feature sizes shrink. Previously, soft error rates were significant primarily in space and high-atmospheric computing. Modern architectures now use features so small at sufficiently low voltages that soft errors are becoming important even at terrestrial altitudes. Because of their large number of components, supercomputers are particularly susceptible to soft errors. Since many large-scale parallel scientific applications use iterative linear algebra methods, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. Many users consider these methods invulnerable to most soft errors since they converge from an imprecise solution to a precise one. However, we show in this paper that iterative methods are vulnerable to soft errors, exhibiting both silent data corruptions and a poor ability to detect errors. Further, we evaluate a variety of soft error detection and tolerance techniques, including checkpointing, linear matrix encodings, and residual tracking techniques.

Crossref

UNT Digital Library
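
The second abstract lists linear matrix encodings among the detection techniques it evaluates. One classic encoding of this kind is a column checksum in the style of Huang and Abraham's algorithm-based fault tolerance; the sketch below assumes that scheme, which may differ from the encodings the paper actually evaluates, and its function names and tolerance are hypothetical. The checksum c = 1^T A is computed once from a trusted copy of A, after which every product y = A x can be checked by comparing sum(y) with c . x.

```python
def column_checksums(A):
    """Return c with c[j] = sum_i A[i][j], i.e. c = 1^T A (computed once, from a trusted A)."""
    return [sum(row[j] for row in A) for j in range(len(A[0]))]


def matvec(A, x):
    """Plain dense matrix-vector product y = A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def checked_matvec(A, x, c, rel_tol=1e-9):
    """Compute y = A x and verify the invariant sum(y) == c . x.

    Returns (y, ok); ok is False when the two sides disagree by more than
    rel_tol relative to the expected value, which signals that A or the
    computed y was corrupted after c was formed.
    """
    y = matvec(A, x)
    expected = sum(c_j * x_j for c_j, x_j in zip(c, x))
    ok = abs(sum(y) - expected) <= rel_tol * max(1.0, abs(expected))
    return y, ok


A = [[4.0, 1.0], [1.0, 3.0]]
c = column_checksums(A)            # [5.0, 4.0]
x = [1.0, 2.0]

y, ok = checked_matvec(A, x, c)    # y = [6.0, 7.0], checksum matches
A[1][1] = 30.0                     # simulate a soft error that corrupts one matrix entry
y_bad, ok_bad = checked_matvec(A, x, c)
print(y, ok, y_bad, ok_bad)
```

Two limitations follow directly from the algebra: an error in x itself passes unnoticed, because both sides of the check are computed from the same x, and two errors whose contributions to sum(y) cancel are also missed.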

Formal Specification of the OpenMP Memory Model

Author: Bronevetsky G
de Supinski B R
Publication venue: Lawrence Livermore National Laboratory
Publication date: 17/05/2006

OpenMP [1] is an important API for shared memory programming, combining shared memory's potential for performance with a simple programming interface. Unfortunately, OpenMP lacks a critical tool for demonstrating whether programs are correct: a formal memory model. Instead, the current official definition of the OpenMP memory model (the OpenMP 2.5 specification [1]) is in terms of informal prose. As a result, it is impossible to verify OpenMP applications formally, since the prose does not provide a formal consistency model that precisely describes how reads and writes on different threads interact. This paper focuses on the formal verification of OpenMP programs through a proposed formal memory model that is derived from the existing prose model [1]. Our formalization provides a two-step process to verify whether an observed OpenMP execution is conformant. In addition to this formalization, our contributions include a discussion of ambiguities in the current prose-based memory model description. Although our formal model may not capture the current informal memory model perfectly, in part due to these ambiguities, our model reflects our understanding of the informal model's intent. We conclude with several examples that may indicate areas of the OpenMP memory model that need further refinement, however it is specified. Our goal is to motivate the OpenMP community to adopt those refinements eventually, ideally through a formal model, in later OpenMP specifications.

UNT Digital Library

CLOMP: Accurately Characterizing OpenMP Application Overheads

Author: Bronevetsky G
de Supinski B
Gyllenhaal J
Publication venue: Lawrence Livermore National Laboratory
Publication date: 01/01/2008

Despite its ease of use, OpenMP has failed to gain widespread use on large-scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy-to-understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. Further, we show that CLOMP also captures limitations for OpenMP parallelization on NUMA systems.

CiteSeerX

Crossref

Springer - Publisher Connector

eScholarship - University of California

UNT Digital Library
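
The CLOMP abstract frames region-entry overhead in terms of how much work is needed to compensate for it. A deliberately simplified cost model makes that trade-off concrete; it is an assumption for illustration, not CLOMP's measurement methodology. Suppose each parallel region pays a fixed overhead o and its work W splits perfectly across p threads, so the region takes o + W/p against W serially. Requiring a speedup of at least s gives W >= s*o / (1 - s/p), which is finite only when s < p. The function names and the 10-microsecond overhead below are hypothetical.

```python
def region_speedup(work_us, overhead_us, threads):
    """Speedup of one parallel region under the toy model: W / (o + W / p)."""
    return work_us / (overhead_us + work_us / threads)


def min_work_for_speedup(target, overhead_us, threads):
    """Smallest per-region work W (microseconds) with region_speedup >= target.

    Solves W / (o + W/p) = s for W, giving W = s*o / (1 - s/p); the target must
    be strictly below the thread count or no amount of work suffices.
    """
    if not 0 < target < threads:
        raise ValueError("target speedup must lie strictly between 0 and the thread count")
    return target * overhead_us / (1 - target / threads)


# Hypothetical numbers: 10 us per-region overhead on 4 threads.
w = min_work_for_speedup(2.0, 10.0, 4)
print(w, region_speedup(w, 10.0, 4))   # 40.0 2.0
```

Raising the target from 2x to 3x on the same 4 threads triples the required work per region to 120 microseconds, which illustrates why high per-region overhead pushes applications toward coarse-grained parallel regions.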

Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System

Author: Bronevetsky G
de Supinski B R
Mohror K M
Moody A T
Publication venue: Lawrence Livermore National Laboratory
Publication date: 09/04/2010

High-performance computing (HPC) systems are growing more powerful by utilizing more hardware components. As the system mean-time-before-failure correspondingly drops, applications must checkpoint more frequently to make progress. However, as the system memory sizes grow faster than the bandwidth to the parallel file system, the cost of checkpointing begins to dominate application run times. A potential solution to this problem is to use multi-level checkpointing, which employs multiple types of checkpoints with different costs and different levels of resiliency in a single run. The goal is to design lightweight checkpoints to handle the most common failure modes and rely on more expensive checkpoints for less common, but more severe, failures. While this approach is theoretically promising, it has not been fully evaluated in a large-scale, production system context. To this end, we have designed a system, called the Scalable Checkpoint/Restart (SCR) library, that writes checkpoints to storage on the compute nodes utilizing RAM, Flash, or disk, in addition to the parallel file system. We present the performance and reliability properties of SCR as well as a probabilistic Markov model that predicts its performance on current and future systems. We show that multi-level checkpointing improves efficiency on existing large-scale systems and that this benefit increases as the system size grows. In particular, we developed low-cost checkpoint schemes that are 100x-1000x faster than the parallel file system and effective against 85% of our system failures. This leads to a gain in machine efficiency of up to 35%, and it reduces the load on the parallel file system by a factor of two on current and future systems.

Crossref

UNT Digital Library
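
The SCR abstract's motivation, that cheaper checkpoints can be taken more often, can be quantified with Young's classic first-order approximation for the checkpoint interval, tau = sqrt(2*C*M) for checkpoint cost C and mean time between failures M. This is a single-level model swapped in for illustration; it is not the probabilistic Markov model the paper uses, and the costs below are hypothetical. Under the same approximation the fraction of time lost to checkpointing plus rework is C/tau + tau/(2M), which at the optimum equals sqrt(2C/M), so a checkpoint that is 100x cheaper cuts this overhead by about 10x.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(checkpoint_cost_s, interval_s, mtbf_s):
    """First-order fraction of run time lost: checkpoint writes plus expected rework.

    C / tau is time spent writing checkpoints; tau / (2M) is the expected
    recomputation, since a failure loses on average half an interval.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


mtbf = 5000.0  # hypothetical seconds between failures
for cost in (10.0, 0.1):  # hypothetical parallel-file-system vs node-local checkpoint cost
    tau = young_interval(cost, mtbf)
    print(f"C={cost:5.1f}s  tau={tau:7.2f}s  overhead={overhead_fraction(cost, tau, mtbf):.4f}")
```

This single-level model assumes every failure is recoverable from the one checkpoint type. Since the abstract reports that node-local checkpoints cover only 85% of the observed failures, a real system still needs occasional expensive checkpoints, which is what a multi-level model captures.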

Toward Enhancing OpenMP's Work-Sharing Directives

Author: Chapman B M
de Supinski B R
Huang L
Jin H
Jost G
Publication venue: Lawrence Livermore National Laboratory
Publication date: 01/01/2006

OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

Crossref

UNT Digital Library

Quantifying the Effectiveness of Load Balance Algorithms

Author: Amato N. M.
de Supinski B. R.
Gamblin G. T.
Pearce O.
Schulz M.
Publication venue: Lawrence Livermore National Laboratory
Publication date: 17/01/2012

UNT Digital Library

Scalable Dynamic Instrumentation for BlueGene/L

Author: Ahn D
Bernat A
de Supinski B R
Ko S Y
Lee G
Rountree B
Schulz M
Publication venue: Lawrence Livermore National Laboratory
Publication date: 08/09/2005

Dynamic binary instrumentation for performance analysis on new, large-scale architectures such as the IBM Blue Gene/L system (BG/L) poses new challenges. Their scale, with potentially hundreds of thousands of compute nodes, requires new, more scalable mechanisms to deploy and to organize binary instrumentation and to collect the resulting data gathered by the inserted probes. Further, many of these new machines do not support full operating systems on the compute nodes; rather, they rely on lightweight custom compute kernels that do not support daemon-based implementations. We describe the design and current status of a new implementation of the DPCL (Dynamic Probe Class Library) API for BG/L. DPCL provides an easy-to-use layer for dynamic instrumentation of parallel MPI applications based on the DynInst dynamic instrumentation mechanism for sequential platforms. Our work includes modifying DynInst to control instrumentation from remote I/O nodes and porting DPCL's communication to use MRNet, a scalable data reduction network for collecting performance data. We describe extensions to the DPCL API that support instrumentation of task subsets and aggregation of collected performance data. Overall, our implementation provides a scalable infrastructure for efficient binary instrumentation on BG/L.

UNT Digital Library
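
The DPCL port relies on MRNet to aggregate performance data through a tree of intermediate processes rather than at a single front end. The sketch below shows only the underlying idea of a k-ary tree reduction, with a hypothetical `tree_reduce` helper; it does not use or model MRNet's actual API. With fan-out k, data from n back-ends is combined in ceil(log_k(n)) levels, and no node ever merges more than k inputs at once.

```python
from functools import reduce


def tree_reduce(values, fanout, combine):
    """Combine `values` bottom-up through a tree in which each node merges at most
    `fanout` children. Returns (result, number_of_levels)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    levels = 0
    while len(level) > 1:
        # Each group of up to `fanout` siblings is merged by one parent node.
        level = [reduce(combine, level[i:i + fanout]) for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels


# Hypothetical per-process samples from 1024 back-ends, merged with fan-out 32.
samples = list(range(1, 1025))
total, depth = tree_reduce(samples, 32, lambda a, b: a + b)
peak, _ = tree_reduce(samples, 32, max)
print(total, peak, depth)
```

The `combine` function must be associative, because the tree regroups the inputs differently from a left-to-right sequential pass.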

Bench-to-bedside review : targeting antioxidants to mitochondria in sepsis

Author: A Dhanasekaran
A Dyson
A Filipovska
AL Hill
AM James
B Bedogni
BE Minter
CA Macias
CC Chuang
CE Cross
D Brealey
D Graham
DA Lowes
DP Jones
E Borrelli
ED Crouser
F Arnalich
FN Gellerich
G Escames
GF Kelso
GS Supinski
H Bayir
H Bohrer
H Zhang
HC Cowley
Helen F Galley
HF Galley
HF Goode
I Spasojević
I Vanhorebeek
J Ripcke
JC Marshall
JF Turrens
JJ Pandit
JV Esplugues
K Yassen
K Zhao
KS Roser
L Packer
M Qunitero
MD Wheeler
MP Fink
MP Murphy
MW Fariss
P Ghafourifar
P Wipf
RAJ Smith
RL Paterson
RO Poyton
S González-Rubio
S Rinaldi
SS Sheu
V Mishra
V Vanasco
VJ Adlam
VP Skulachev
W Droge
Z Lacza
Publication venue: Springer Science and Business Media LLC
Publication date: 20/08/2010

Peer reviewed

Publisher PDF

Aberdeen University Research

Crossref

PubMed Central

Lessons learned at 208K: Towards Debugging Millions of Cores

Author: Ahn D H
Arnold D C
de Supinski B R
Lee G L
Legendre M
Liblit B
Miller B P
Schulz M J
Publication venue: Lawrence Livermore National Laboratory
Publication date: 14/04/2008

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application; already, debugging the full Blue Gene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.

UNT Digital Library
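
STAT's core idea, gathering stack traces from every process and merging them to expose process equivalence classes, can be sketched with ordinary dictionaries. This is an illustrative sketch only: the traces and function names are hypothetical, and it ignores the scalable communication infrastructure that the real tool relies on. It groups ranks whose full call paths match and also builds a call-prefix tree in which each node records the ranks that pass through it, so a rank stuck in an unusual frame stands out as a small set.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks by identical call path.

    `traces` maps rank -> sequence of frame names, outermost frame first.
    Returns a dict mapping each distinct call path to the sorted ranks sharing it.
    """
    classes = defaultdict(list)
    for rank, trace in traces.items():
        classes[tuple(trace)].append(rank)
    return {path: sorted(ranks) for path, ranks in classes.items()}


def prefix_tree(traces):
    """Merge call paths into a prefix tree whose nodes record the ranks passing through."""
    root = {"ranks": set(), "children": {}}
    for rank, trace in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in trace:
            node = node["children"].setdefault(frame, {"ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return root


def show(node, name="<root>", depth=0):
    """Print the tree with each frame's rank count, indented by call depth."""
    print(f"{'  ' * depth}{name} [{len(node['ranks'])} ranks]")
    for frame, child in sorted(node["children"].items()):
        show(child, frame, depth + 1)


# Hypothetical snapshot: three ranks wait in MPI, one is stuck elsewhere.
traces = {
    0: ("main", "solve", "MPI_Waitall"),
    1: ("main", "solve", "MPI_Waitall"),
    2: ("main", "solve", "MPI_Waitall"),
    3: ("main", "solve", "compute_flux"),
}
print(equivalence_classes(traces))
show(prefix_tree(traces))
```

Storing rank sets at every node keeps the merged view compact: its size grows with the number of distinct call paths rather than with the number of processes, which is the property that matters at hundreds of thousands of tasks.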