1,281 research outputs found
Radiation Risks and Mitigation in Electronic Systems
Electrical and electronic systems can be disturbed by radiation-induced
effects. In some cases, radiation-induced effects are of a low probability and
can be ignored; however, radiation effects must be considered when designing
systems that have a high mean time to failure requirement, an impact on
protection, and/or higher exposure to radiation. High-energy physics power
systems suffer from a combination of these effects: a high mean time to failure
is required, failure can impact on protection, and the proximity of systems to
accelerators increases the likelihood of radiation-induced events. This paper
presents the principal radiation-induced effects, and radiation environments
typical to high-energy physics. It outlines a procedure for designing and
validating radiation-tolerant systems using commercial off-the-shelf
components. The paper ends with a worked example of radiation-tolerant power
converter controls that are being developed for the Large Hadron Collider and
High Luminosity-Large Hadron Collider at CERN.Comment: 19 pages, contribution to the 2014 CAS - CERN Accelerator School:
Power Converters, Baden, Switzerland, 7-14 May 201
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
Machine Learning (ML) is making a strong resurgence in tune with the massive
generation of unstructured data which in turn requires massive computational
resources. Due to the inherently compute- and power-intensive structure of
Neural Networks (NNs), hardware accelerators emerge as a promising solution.
However, with technology node scaling below 10nm, hardware accelerators become
more susceptible to faults, which in turn can impact the NN accuracy. In this
paper, we study the resilience aspects of Register-Transfer Level (RTL) model
of NN accelerators, in particular, fault characterization and mitigation. By
following a High-Level Synthesis (HLS) approach, first, we characterize the
vulnerability of various components of RTL NN. We observed that the severity of
faults depends on both i) application-level specifications, i.e., NN data
(inputs, weights, or intermediate), NN layers, and NN activation functions, and
ii) architectural-level specifications, i.e., data representation model and the
parallelism degree of the underlying accelerator. Second, motivated by
characterization results, we present a low-overhead fault mitigation technique
that can efficiently correct bit flips, by 47.3% better than state-of-the-art
methods.Comment: 8 pages, 6 figure
- …