Search CORE

102 research outputs found

Understanding Soft Errors in Uncore Components

Author: A.
D.
DeHon A.
H.
J.
L.
Lilja K.
Loveless T. D.
Mitra S.
N.
P.
P.
S.
Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/05/2015
Field of study

The effects of soft errors in processor cores have been widely studied. However, little has been published about soft errors in uncore components, such as memory subsystem and I/O controllers, of a System-on-a-Chip (SoC). In this work, we study how soft errors in uncore components affect system-level behaviors. We have created a new mixed-mode simulation platform that combines simulators at two different levels of abstraction, and achieves 20,000x speedup over RTL-only simulation. Using this platform, we present the first study of the system-level impact of soft errors inside various uncore components of a large-scale, multi-core SoC using the industrial-grade, open-source OpenSPARC T2 SoC design. Our results show that soft errors in uncore components can significantly impact system-level reliability. We also demonstrate that uncore soft errors can create major challenges for traditional system-level checkpoint recovery techniques. To overcome such recovery challenges, we present a new replay recovery technique for uncore components belonging to the memory subsystem. For the L2 cache controller and the DRAM controller components of OpenSPARC T2, our new technique reduces the probability that an application run fails to produce correct results due to soft errors by more than 100x with 3.32% and 6.09% chip-level area and power impact, respectively.Comment: to be published in Proceedings of the 52nd Annual Design Automation Conferenc

arXiv.org e-Print Archive

Crossref

E-QED: Electrical Bug Localization During Post-Silicon Validation Enabled by Quick Error Detection and Formal Methods

Author: B Vermeulen
D Lin
E Clarke
FM Paula De
HF Ko
M Abramovici
M Dusanapudi
NR Saxena
PH Bardell
PN Sanda
RB Jones
S-B Park
T Larrabee
Publication venue
Publication date: 23/07/2017
Field of study

During post-silicon validation, manufactured integrated circuits are extensively tested in actual system environments to detect design bugs. Bug localization involves identification of a bug trace (a sequence of inputs that activates and detects the bug) and a hardware design block where the bug is located. Existing bug localization practices during post-silicon validation are mostly manual and ad hoc, and, hence, extremely expensive and time consuming. This is particularly true for subtle electrical bugs caused by unexpected interactions between a design and its electrical state. We present E-QED, a new approach that automatically localizes electrical bugs during post-silicon validation. Our results on the OpenSPARC T2, an open-source 500-million-transistor multicore chip design, demonstrate the effectiveness and practicality of E-QED: starting with a failed post-silicon test, in a few hours (9 hours on average) we can automatically narrow the location of the bug to (the fan-in logic cone of) a handful of candidate flip-flops (18 flip-flops on average for a design with ~ 1 Million flip-flops) and also obtain the corresponding bug trace. The area impact of E-QED is ~2.5%. In contrast, deter-mining this same information might take weeks (or even months) of mostly manual work using traditional approaches

arXiv.org e-Print Archive

Crossref

Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks

Author: Bertrán Ramon
Bose Pradip
Buyuktosunoglu Alper
González Tallada Marc
Gupta Meeta S.
Publication venue
Publication date: 01/01/2012
Field of study

Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon modeling and/or diagnostic postsilicon measurement based analysis are increasingly cumbersome and error prone. This is especially true of energy-related characterization studies. In this paper, we take the position that automated micro-benchmarks generated with particular objectives in mind hold the key to obtaining accurate energy-related characterization. As such, we first present a flexible micro-benchmark generation framework (MicroProbe) that is used to probe complex multi-core/multi-threaded systems with a variety and range of energy-related queries in mind. We then present experimental results centered around an IBM POWER7 CMP/SMT system to demonstrate how the systematically generated micro-benchmarks can be used to answer three specific queries: (a) How to project application-specific (and if needed, phase-specific) power consumption with component-wise breakdowns? (b) How to measure energy-per-instruction (EPI) values for the target machine? (c) How to bound the worst-case (maximum) power consumption in order to determine safe, but practical (i.e. affordable) packaging or cooling solutions? The solution approaches to the above problems are all new. Hardware measurement based analysis shows superior power projection accuracy (with error margins of less than 2.3% across SPEC CPU2006) as well as max-power stressing capability (with 10.7% increase in processor power over the very worst-case power seen during the execution of SPEC CPU2006 applications).Peer ReviewedPostprint (author’s final draft

CiteSeerX

Crossref

UPCommons. Portal del coneixement obert de la UPC

Energy Efficient Servers

Author: Gough Corey
Saunders Winston
Steiner Ian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2020
Field of study

Computer scienc

Directory of Open Access Books (DOAB)

Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft

Author: Eitzinger Jan
Hager Georg
Hammer Julian
Wellein Gerhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

Analytic performance models are essential for understanding the performance characteristics of loop kernels, which consume a major part of CPU cycles in computational science. Starting from a validated performance model one can infer the relevant hardware bottlenecks and promising optimization opportunities. Unfortunately, analytic performance modeling is often tedious even for experienced developers since it requires in-depth knowledge about the hardware and how it interacts with the software. We present the "Kerncraft" tool, which eases the construction of analytic performance models for streaming kernels and stencil loop nests. Starting from the loop source code, the problem size, and a description of the underlying hardware, Kerncraft can ideally predict the single-core performance and scaling behavior of loops on multicore processors using the Roofline or the Execution-Cache-Memory (ECM) model. We describe the operating principles of Kerncraft with its capabilities and limitations, and we show how it may be used to quickly gain insights by accelerated analytic modeling.Comment: 11 pages, 4 figures, 8 listing

arXiv.org e-Print Archive

Crossref

Energy Efficient Servers

Author: Gough Corey
Saunders Winston
Steiner Ian
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Computer scienc

OAPEN Library