Understanding Soft Errors in Uncore Components
The effects of soft errors in processor cores have been widely studied.
However, little has been published about soft errors in uncore components, such
as the memory subsystem and I/O controllers, of a System-on-a-Chip (SoC). In this
work, we study how soft errors in uncore components affect system-level
behaviors. We have created a new mixed-mode simulation platform that combines
simulators at two different levels of abstraction, and achieves a 20,000x speedup
over RTL-only simulation. Using this platform, we present the first study of
the system-level impact of soft errors inside various uncore components of a
large-scale, multi-core SoC using the industrial-grade, open-source OpenSPARC
T2 SoC design. Our results show that soft errors in uncore components can
significantly impact system-level reliability. We also demonstrate that uncore
soft errors can create major challenges for traditional system-level checkpoint
recovery techniques. To overcome such recovery challenges, we present a new
replay recovery technique for uncore components belonging to the memory
subsystem. For the L2 cache controller and the DRAM controller components of
OpenSPARC T2, our new technique reduces the probability that an application run
fails to produce correct results due to soft errors by more than 100x with
3.32% and 6.09% chip-level area and power impact, respectively.
Comment: to be published in Proceedings of the 52nd Annual Design Automation Conference
Master of Science thesis
To minimize resource consumption and maximize performance, computer architecture research has been investigating approaches that may compute inaccurate solutions. Such hardware inaccuracies may induce a wide variety of program behaviors which are not obs
Fault- and Yield-Aware On-Chip Memory Design and Management
Ever-decreasing device sizes cause more frequent hard faults, which become a serious burden on processor design and yield management. This problem is particularly pronounced in on-chip memory, which consumes up to 70% of a processor's total chip area. Traditional circuit-level techniques, such as redundancy and error-correcting codes, become less effective in error-prevalent environments because of their large area overhead. In this work, we suggest an architectural solution for building reliable on-chip memory in future processor environments. Our approach has two parts: a design framework and architectural techniques for on-chip memory structures. The design framework provides important architectural evaluation metrics such as yield, area, and performance based on low-level defect and process-variation parameters, so processor architects can quickly evaluate their designs in those terms. With the framework, we develop architectural yield-enhancement solutions for on-chip memory structures including the L1 cache, L2 cache, and directory memory. Our proposed solutions greatly improve yield with negligible area and performance overhead. Furthermore, we develop a decoupled yield model of compute cores and L2 caches in CMPs, which shows that there will be many more working L2 caches than compute cores in a chip. We propose efficient utilization techniques for these excess caches. Evaluation results show that excess caches significantly improve the overall performance of CMPs.
E-QED: Electrical Bug Localization During Post-Silicon Validation Enabled by Quick Error Detection and Formal Methods
During post-silicon validation, manufactured integrated circuits are
extensively tested in actual system environments to detect design bugs. Bug
localization involves identification of a bug trace (a sequence of inputs that
activates and detects the bug) and a hardware design block where the bug is
located. Existing bug localization practices during post-silicon validation are
mostly manual and ad hoc, and, hence, extremely expensive and time consuming.
This is particularly true for subtle electrical bugs caused by unexpected
interactions between a design and its electrical state. We present E-QED, a new
approach that automatically localizes electrical bugs during post-silicon
validation. Our results on the OpenSPARC T2, an open-source
500-million-transistor multicore chip design, demonstrate the effectiveness and
practicality of E-QED: starting with a failed post-silicon test, in a few hours
(9 hours on average) we can automatically narrow the location of the bug to
(the fan-in logic cone of) a handful of candidate flip-flops (18 flip-flops on
average for a design with ~1 million flip-flops) and also obtain the
corresponding bug trace. The area impact of E-QED is ~2.5%. In contrast,
determining this same information might take weeks (or even months) of mostly
manual work using traditional approaches.
Enhancing Processor Design Obfuscation Through Security-Aware On-Chip Memory and Data Path Design
A sizable body of work has identified the importance of architecture- and application-level security when using logic locking, a family of module-level supply chain security techniques, to secure processor ICs. However, prior logic locking research proposes configuring logic locking using only module-level considerations. To begin our work, we perform a systematic design space exploration of logic locking in modules throughout a processor IC. This exploration shows that locking with only module-level considerations cannot guarantee architecture- and application-level security, regardless of the locking technique used. To remedy this, we propose a tool-driven, security-aware approach to enhance the two most effective candidate locking locations: on-chip memory and the data path. We show that through minor design modifications of the on-chip memory and data path architecture, one can exponentially improve the architecture- and application-level security of prior locking art with only a modest design overhead. Underlying our design space exploration and security-aware design approach is ObfusGEM, an open-source logic locking simulation framework released with this work to quantitatively evaluate the architectural effectiveness of logic locking in custom processor architecture configurations.
Effetti dei Soft error nei circuiti integrati (Effects of Soft Errors in Integrated Circuits)
An introduction to soft errors, exploring their origins, their effects, and their mitigation in the modern electronics industry. The thesis concludes with an analysis of a series of tests performed on the IBM POWER6 architecture (microprocessor and I/O hub).