Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits
The Functional Failure Rate analysis of today's complex circuits is a
difficult task and requires a significant investment in terms of human
effort, processing resources and tool licenses. De-rating or vulnerability
factors are therefore a major instrument of failure analysis efforts.
Computationally intensive fault-injection simulation campaigns are usually
required to obtain fine-grained reliability metrics at the functional level.
This paper therefore investigates the use of machine learning algorithms to
assist this procedure and thus optimise and enhance fault-injection efforts.
Specifically, machine learning models are used to predict accurate
per-instance Functional De-Rating data for the full list of circuit
instances, an objective that is difficult to reach using classical methods.
The described methodology uses a set of per-instance features, extracted
through an analysis approach combining static elements (cell properties,
circuit structure, synthesis attributes) and dynamic elements (signal
activity). Reference data are obtained through first-principles fault
simulation approaches. One part of this reference dataset is used to train
the machine learning model and the remainder is used to validate and
benchmark the accuracy of the trained tool. The presented methodology is
applied to a practical example and various machine learning models are
evaluated and compared.
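The train/validate workflow described above can be sketched minimally in pure Python, with a k-nearest-neighbour regressor standing in for the evaluated models; the feature names (toggle rate, fanout, logic depth) and all values are hypothetical, not taken from the paper:

```python
import math

# Hypothetical per-instance features (toggle_rate, fanout, logic_depth)
# paired with a Functional De-Rating value obtained from fault simulation.
reference = [
    ((0.10, 4, 3), 0.02),
    ((0.80, 12, 7), 0.35),
    ((0.50, 8, 5), 0.18),
    ((0.20, 5, 4), 0.05),
    ((0.90, 14, 8), 0.40),
    ((0.40, 7, 5), 0.15),
]

def knn_predict(features, train, k=2):
    """Predict de-rating as the mean over the k nearest training instances."""
    dists = sorted((math.dist(features, f), y) for f, y in train)
    return sum(y for _, y in dists[:k]) / k

# One part of the reference data trains the model, the rest benchmarks it.
train, test = reference[:4], reference[4:]
errors = [abs(knn_predict(f, train) - y) for f, y in test]
mae = sum(errors) / len(errors)  # mean absolute prediction error
```

In practice the reference set would come from the first-principles fault-simulation campaign and the regressor would be one of several candidate models compared on the held-out part.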
Machine Learning Clustering Techniques for Selective Mitigation of Critical Design Features
Selective mitigation or selective hardening is an effective technique to
obtain a good trade-off between the improvements in the overall reliability of
a circuit and the hardware overhead induced by the hardening techniques.
Selective mitigation relies on preferentially protecting circuit instances
according to their susceptibility and criticality. However, ranking circuit
parts in terms of vulnerability usually requires computationally intensive
fault-injection simulation campaigns. This paper presents a new methodology
which uses machine learning clustering techniques to group flip-flops with
similar expected contributions to the overall functional failure rate, based on
the analysis of a compact set of features combining attributes from static
elements and dynamic elements. Fault simulation campaigns can then be executed
on a per-group basis, significantly reducing the time and cost of the
evaluation. The effectiveness of grouping similarly sensitive flip-flops by
machine learning clustering algorithms is evaluated on a practical example.
Different clustering algorithms are applied and the results are compared to
an ideal selective mitigation obtained by exhaustive fault-injection
simulation.
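A toy sketch of the grouping step, using a pure-Python k-means over a compact feature set; the flip-flop names, features and cluster count are hypothetical, for illustration only:

```python
import math

# Hypothetical per-flip-flop features (signal activity, fan-out), normalised.
flip_flops = {
    "ff_a": (0.05, 0.10), "ff_b": (0.08, 0.12),
    "ff_c": (0.55, 0.60), "ff_d": (0.60, 0.58),
    "ff_e": (0.90, 0.95), "ff_f": (0.88, 0.92),
}

def kmeans(points, k, iters=20):
    # Deterministic, evenly spaced initialisation for this sketch;
    # k-means++ would be preferable in practice.
    centers = points[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

centers = kmeans(list(flip_flops.values()), k=3)

# Flip-flops in the same group are expected to contribute similarly to the
# failure rate, so fault simulation can run once per group, not per instance.
assignment = {
    name: min(range(3), key=lambda j: math.dist(f, centers[j]))
    for name, f in flip_flops.items()
}
```

With six flip-flops and three groups, the campaign shrinks to three per-group simulations instead of six per-instance ones, which is the cost reduction the paper targets at scale.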
Understanding multidimensional verification: Where functional meets non-functional
Advancements in electronic systems' design have a notable impact on design verification technologies. The recent paradigms of the Internet of Things (IoT) and Cyber-Physical Systems (CPS) assume devices immersed in physical environments, significantly constrained in resources and expected to provide security, privacy, reliability, performance and low-power features. In recent years, numerous extra-functional aspects of electronic systems have been brought to the fore, requiring verification of hardware design models in a multidimensional space alongside the functional concerns of the target system. However, unlike in the software domain, such a holistic approach remains underdeveloped. The contributions of this paper are a taxonomy for multidimensional hardware verification aspects and a state-of-the-art survey of related research works and trends enabling the multidimensional verification concept. Further, an initial approach to performing multidimensional verification based on machine learning techniques is evaluated. The importance and challenge of performing multidimensional verification are illustrated by an example case study.
Dependable Computing on Inexact Hardware through Anomaly Detection.
Reliability of transistors is on the decline as transistors continue to shrink in size. Aggressive voltage scaling is making the problem even worse. Scaled-down transistors are more susceptible to transient faults as well as permanent in-field hardware failures. In order to continue to reap the benefits of technology scaling, it has become imperative to tackle the challenges arising from the decreasing reliability of devices for the mainstream commodity market. Along with the worsening reliability, achieving energy efficiency and performance improvements through scaling is providing increasingly diminishing marginal returns. More than at any other time in history, the semiconductor industry faces the crossroads of unreliability and the need for improved energy efficiency.
These challenges of technology scaling can be tackled by dividing the target applications into two categories: traditional applications, which have relatively strict correctness requirements on their outputs, and an emerging class of soft applications, from domains such as multimedia, machine learning, and computer vision, which are inherently tolerant to a certain degree of inaccuracy. Traditional applications can be protected against hardware failures by low-cost detection and protection methods, while soft applications can trade off output quality to achieve better performance or energy efficiency.
For traditional applications, I propose an efficient, software-only application analysis and transformation solution to detect data-flow and control-flow transient faults. The intelligence of the data-flow solution lies in the use of dynamic application information such as control flow, memory and value profiling. The control-flow protection technique achieves its efficiency by simplifying signature calculations in each basic block and by performing checking at a coarse-grain level. For soft applications, I develop a quality control technique that employs continuous, lightweight checkers to ensure that the approximation is controlled and the application output is acceptable. Overall, I show that the use of low-cost checkers to produce dependable results on commodity systems constructed from inexact hardware components is efficient and practical.
PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113341/1/dskhudia_1.pd
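Signature-based control-flow checking of the kind described can be illustrated with a toy sketch; the signatures and control-flow graph below are hypothetical and this is a generic CFCSS-style scheme, not the thesis's simplified calculation:

```python
# Hypothetical compile-time-assigned basic-block signatures and CFG:
# entry -> loop -> (loop | exit)
signatures = {"entry": 0b001, "loop": 0b010, "exit": 0b100}
cfg = {"entry": {"loop"}, "loop": {"loop", "exit"}, "exit": set()}

def check_trace(trace):
    """Return True if the executed block trace is legal: each transition
    must exist in the static CFG, and the runtime signature, updated by
    XORing the (predecessor ^ successor) difference, must match the
    signature of the block just entered."""
    runtime_sig = signatures[trace[0]]
    for prev, cur in zip(trace, trace[1:]):
        if cur not in cfg[prev]:
            return False  # illegal jump: control-flow fault detected
        runtime_sig ^= signatures[prev] ^ signatures[cur]
        if runtime_sig != signatures[cur]:
            return False  # signature mismatch: corrupted control flow
    return True
```

A fault that diverts execution from `entry` straight to `exit` is caught because that edge is absent from the static graph; checking can be batched at coarser granularity to cut overhead, as the thesis describes.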
X-Rel: Energy-Efficient and Low-Overhead Approximate Reliability Framework for Error-Tolerant Applications Deployed in Critical Systems
Triple Modular Redundancy (TMR) is one of the most common techniques in
fault-tolerant systems, in which the output is determined by a majority voter.
However, the design diversity of replicated modules and/or soft errors that
are more likely to occur in the nanoscale era may defeat the majority-voting
scheme. Moreover, the significant overheads of the TMR scheme may limit its
use in energy- and area-constrained critical systems. For most inherently
error-resilient applications, such as image processing and vision deployed
in critical systems (like autonomous vehicles and robotics), achieving a
given level of reliability has higher priority than obtaining precise
results. Such applications can therefore benefit from the approximate
computing paradigm to achieve higher energy efficiency and lower area. This
paper proposes an energy-efficient approximate reliability (X-Rel)
framework to overcome the aforementioned challenges of TMR systems and
exploit the full potential of approximate computing without sacrificing the
desired reliability constraint and output quality. The X-Rel framework
relies on relaxing the precision of the voter based on a systematic
error-bounding method that leverages user-defined quality and reliability
constraints. Afterward, the size of the resulting voter is used to
approximate the TMR modules such that the overall area and energy
consumption are minimized. The effectiveness of employing the proposed
X-Rel technique in a TMR structure, for different quality constraints as
well as various reliability bounds, is evaluated in a 15-nm FinFET
technology. The results show that the X-Rel voter reduces delay, area, and
energy consumption by up to 86%, 87%, and 98%, respectively, compared to
state-of-the-art approximate TMR voters.
This paper has been published in IEEE Transactions on Very Large Scale
Integration (VLSI) Systems.
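The core idea of relaxing the voter's precision can be sketched as follows: an exact bit-wise majority voter next to one that votes only on the upper bits, tolerating low-order disagreements between replicas. The bit width is a hypothetical illustration parameter, not X-Rel's systematic error-bounding method:

```python
def exact_tmr_vote(a, b, c):
    """Bit-exact majority: return the value at least two replicas agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no majority: all three replicas disagree

def approx_tmr_vote(a, b, c, precision_bits=4):
    """Relaxed voter: compare only the bits above `precision_bits`, so
    replicas that differ only in low-order bits still form a majority
    (parameter choice is hypothetical)."""
    mask = ~((1 << precision_bits) - 1)
    ta, tb, tc = a & mask, b & mask, c & mask
    if ta == tb or ta == tc:
        return a
    if tb == tc:
        return b
    return None
```

Three replicas producing 0x1F3, 0x1F5 and 0x1F0 have no bit-exact majority, yet the relaxed voter accepts them because their upper bits agree; comparing fewer bits is what lets the voter, and in turn the replicated modules, be made smaller and cheaper.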
Prediction of Solar Particle Events with SRAM-Based Soft Error Rate Monitor and Supervised Machine Learning
This work introduces an embedded approach for the prediction of Solar Particle Events (SPEs) in space applications, combining real-time Soft Error Rate (SER) measurement by an SRAM-based detector with an offline-trained machine learning model. The proposed approach is intended for the self-adaptive fault-tolerant multiprocessing systems employed in space applications. With respect to the state of the art, our solution allows the SER to be predicted 1 h in advance and enables fine-grained hourly tracking of SER variations during SPEs as well as under normal conditions. The target system can therefore activate the appropriate radiation-hardening mechanisms before the onset of high radiation levels. Based on a comparison of five machine learning algorithms trained on the public space flux database, the preliminary results indicate that the best prediction accuracy is achieved by a recurrent neural network (RNN) with long short-term memory (LSTM).
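The prediction loop can be sketched as a sliding window over the hourly SER log; a linear-trend extrapolation stands in here for the trained model (the paper's best-performing model is an LSTM), and the readings and alert threshold are hypothetical:

```python
# Hypothetical hourly SER readings (events/hour) from an SRAM-based monitor.
ser_log = [3.0, 3.1, 3.0, 3.2, 6.5, 12.0, 18.4, 25.0]

def windows(series, size):
    """Slide a fixed-size history window over the log: each window is the
    model input, the following sample is the 1-hour-ahead target."""
    return [(series[i:i + size], series[i + size])
            for i in range(len(series) - size)]

def predict_next(history):
    """Linear-trend stand-in for the trained model: extrapolate one hour
    ahead from the last two samples."""
    return history[-1] + (history[-1] - history[-2])

THRESHOLD = 10.0  # hypothetical level at which hardening is activated
samples = windows(ser_log, size=3)
predictions = [predict_next(hist) for hist, _ in samples]
# Raise an alert one hour before predicted high radiation, so the system
# can enable its fault-tolerance mechanisms in advance.
alerts = [p >= THRESHOLD for p in predictions]
```

On this toy log the trend predictor flags the last two hours in advance, mirroring how the deployed model would trigger radiation hardening before the SPE peak; the real system would retrain and evaluate the LSTM on the space flux database instead.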