4,830 research outputs found
Fault-tolerant computer study
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed
Redundant Logic Insertion and Fault Tolerance Improvement in Combinational Circuits
This paper presents a novel method to identify and insert redundant logic
into a combinational circuit to improve its fault tolerance without having to
replicate the entire circuit as is the case with conventional redundancy
techniques. In this context, it is discussed how to estimate the fault masking
capability of a combinational circuit using the truth-cum-fault enumeration
table, and then it is shown how to identify the logic that can introduced to
add redundancy into the original circuit without affecting its native
functionality and with the aim of improving its fault tolerance though this
would involve some trade-off in the design metrics. However, care should be
taken while introducing redundant logic since redundant logic insertion may
give rise to new internal nodes and faults on those may impact the fault
tolerance of the resulting circuit. The combinational circuit that is
considered and its redundant counterparts are all implemented in semi-custom
design style using a 32/28nm CMOS digital cell library and their respective
design metrics and fault tolerances are compared
Fault-tolerance techniques for hybrid CMOS/nanoarchitecture
The authors propose two fault-tolerance techniques for hybrid CMOS/nanoarchitecture implementing logic functions as look-up tables. The authors compare the efficiency of the proposed techniques with recently reported methods that use single coding schemes in tolerating high fault rates in nanoscale fabrics. Both proposed techniques are based on error correcting codes to tackle different fault rates. In the first technique, the authors implement a combined two-dimensional coding scheme using Hamming and Bose-Chaudhuri-Hocquenghem (BCH) codes to address fault rates greater than 5. In the second technique, Hamming coding is complemented with bad line exclusion technique to tolerate fault rates higher than the first proposed technique (up to 20). The authors have also estimated the improvement that can be achieved in the circuit reliability in the presence of Don-t Care Conditions. The area, latency and energy costs of the proposed techniques were also estimated in the CMOS domain
Modeling and measurement of fault-tolerant multiprocessors
The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studing these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented
Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults
In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles’ heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in
implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay
Statistical Reliability Estimation of Microprocessor-Based Systems
What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers to choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target softwar
Recommended from our members
IC design for reliability
textAs the feature size of integrated circuits goes down to the nanometer scale,
transient and permanent reliability issues are becoming a significant concern for circuit
designers. Traditionally, the reliability issues were mostly handled at the device level as a
device engineering problem. However, the increasing severity of reliability challenges
and higher error rates due to transient upsets favor higher-level design for reliability
(DFR). In this work, we develop several methods for DFR at the circuit level.
A major source of transient errors is the single event upset (SEU). SEUs are
caused by high-energy particles present in the cosmic rays or emitted by radioactive
contaminants in the chip packaging materials. When these particles hit a N+/P+ depletion
region of an MOS transistor, they may generate a temporary logic fault. Depending on
where the MOS transistor is located and what state the circuit is at, an SEU may result in
a circuit-level error. We analyze SEUs both in combinational logic and memories
(SRAM). For combinational logic circuit, we propose FASER, a Fast Analysis tool of
Soft ERror susceptibility for cell-based designs. The efficiency of FASER is achieved
through its static and vector-less nature. In order to evaluate the impact of SEU on SRAM, a theory for estimating dynamic noise margins is developed analytically. The
results allow predicting the transient error susceptibility of an SRAM cell using a closedform
expression.
Among the many permanent failure mechanisms that include time-dependent
oxide breakdown (TDDB), electro-migration (EM), hot carrier effect (HCE), and
negative bias temperature instability (NBTI), NBTI has recently become important.
Therefore, the main focus of our work is NBTI. NBTI occurs when the gate of PMOS is
negatively biased. The voltage stress across the gate generates interface traps, which
degrade the threshold voltage of PMOS. The degraded PMOS may eventually fail to meet
timing requirement and cause functional errors. NBTI becomes severe at elevated
temperatures. In this dissertation, we propose a NBTI degradation model that takes into
account the temperature variation on the chip and gives the accurate estimation of the
degraded threshold voltage.
In order to account for the degradation of devices, traditional design methods add
guard-bands to ensure that the circuit will function properly during its lifetime. However,
the worst-case based guard-bands lead to significant penalty in performance. In this
dissertation, we propose an effective macromodel-based reliability tracking and
management framework, based on a hybrid network of on-chip sensors, consisting of
temperature sensors and ring oscillators. The model is concerned specifically with NBTIinduced
transistor aging. The key feature of our work, in contrast to the traditional
tracking techniques that rely solely on direct measurement of the increase of threshold
voltage or circuit delay, is an explicit macromodel which maps operating temperature to
circuit degradation (the increase of circuit delay). The macromodel allows for costeffective
tracking of reliability using temperature sensors and is also essential for
enabling the control loop of the reliability management system. The developed methods improve the over-conservatism of the device-level, worstcase
reliability estimation techniques. As the severity of reliability challenges continue to
grow with technology scaling, it will become more important for circuit designers/CAD
tools to be equipped with the developed methods.Electrical and Computer Engineerin
Autonomous spacecraft maintenance study group
A plan to incorporate autonomous spacecraft maintenance (ASM) capabilities into Air Force spacecraft by 1989 is outlined. It includes the successful operation of the spacecraft without ground operator intervention for extended periods of time. Mechanisms, along with a fault tolerant data processing system (including a nonvolatile backup memory) and an autonomous navigation capability, are needed to replace the routine servicing that is presently performed by the ground system. The state of the art fault handling capabilities of various spacecraft and computers are described, and a set conceptual design requirements needed to achieve ASM is established. Implementations for near term technology development needed for an ASM proof of concept demonstration by 1985, and a research agenda addressing long range academic research for an advanced ASM system for 1990s are established
- …