30,471 research outputs found
Single-Event Upset Analysis and Protection in High Speed Circuits
The effect of single-event transients (SETs) (at a combinational node of a design) on the system reliability is becoming a big concern for ICs manufactured using advanced technologies. An SET at a node of combinational part may cause a transient pulse at the input of a flip-flop and consequently is latched in the flip-flop and generates a soft-error. When an SET conjoined with a transition at a node along a critical path of the combinational part of a design, a transient delay fault may occur at the input of a flip-flop. On the other hand, increasing pipeline depth and using low power techniques such as multi-level power supply, and multi-threshold transistor convert almost all paths in a circuit to critical ones. Thus, studying the behavior of the SET in these kinds of circuits needs special attention. This paper studies the dynamic behavior of a circuit with massive critical paths in the presence of an SET. We also propose a novel flip-flop architecture to mitigate the effects of such SETs in combinational circuits. Furthermore, the proposed architecture can tolerant a single event upset (SEU) caused by particle strike on the internal nodes of a flip-flo
Experimental analysis of computer system dependability
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance
Integrated analysis of error detection and recovery
An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagations which seriously deteriorate the fault-tolerant capability. Several detection models that enable analysis of the effect of detection mechanisms on the subsequent error handling operations and the overall system reliability were developed. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The strategies of error recovery employed depend on the detection mechanisms and the available redundancy. Several recovery methods including the rollback recovery are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms
Metastability-Containing Circuits
In digital circuits, metastability can cause deteriorated signals that
neither are logical 0 or logical 1, breaking the abstraction of Boolean logic.
Unfortunately, any way of reading a signal from an unsynchronized clock domain
or performing an analog-to-digital conversion incurs the risk of a metastable
upset; no digital circuit can deterministically avoid, resolve, or detect
metastability (Marino, 1981). Synchronizers, the only traditional
countermeasure, exponentially decrease the odds of maintained metastability
over time. Trading synchronization delay for an increased probability to
resolve metastability to logical 0 or 1, they do not guarantee success.
We propose a fundamentally different approach: It is possible to contain
metastability by fine-grained logical masking so that it cannot infect the
entire circuit. This technique guarantees a limited degree of metastability
in---and uncertainty about---the output.
At the heart of our approach lies a time- and value-discrete model for
metastability in synchronous clocked digital circuits. Metastability is
propagated in a worst-case fashion, allowing to derive deterministic
guarantees, without and unlike synchronizers. The proposed model permits
positive results and passes the test of reproducing Marino's impossibility
results. We fully classify which functions can be computed by circuits with
standard registers. Regarding masking registers, we show that they become
computationally strictly more powerful with each clock cycle, resulting in a
non-trivial hierarchy of computable functions
Cause Clue Clauses: Error Localization using Maximum Satisfiability
Much effort is spent everyday by programmers in trying to reduce long,
failing execution traces to the cause of the error. We present a new algorithm
for error cause localization based on a reduction to the maximal satisfiability
problem (MAX-SAT), which asks what is the maximum number of clauses of a
Boolean formula that can be simultaneously satisfied by an assignment. At an
intuitive level, our algorithm takes as input a program and a failing test, and
comprises the following three steps. First, using symbolic execution, we encode
a trace of a program as a Boolean trace formula which is satisfiable iff the
trace is feasible. Second, for a failing program execution (e.g., one that
violates an assertion or a post-condition), we construct an unsatisfiable
formula by taking the trace formula and additionally asserting that the input
is the failing test and that the assertion condition does hold at the end.
Third, using MAX-SAT, we find a maximal set of clauses in this formula that can
be satisfied together, and output the complement set as a potential cause of
the error. We have implemented our algorithm in a tool called bug-assist for C
programs. We demonstrate the surprising effectiveness of the tool on a set of
benchmark examples with injected faults, and show that in most cases,
bug-assist can quickly and precisely isolate the exact few lines of code whose
change eliminates the error. We also demonstrate how our algorithm can be
modified to automatically suggest fixes for common classes of errors such as
off-by-one.Comment: The pre-alpha version of the tool can be downloaded from
http://bugassist.mpi-sws.or
- …