1,403 research outputs found
A combinatorial method for the evaluation of yield of fault-tolerant systems-on-chip
In this paper we develop a combinatorial method for the evaluation of yield of fault-tolerant systems-on-chip. The method assumes that defects are produced according to a model in which defects are lethal and affect given components of the system following a distribution common to all defects. The distribution of the number of defects is arbitrary.
The method is based on the formulation of the yield as 1 minus the probability that a given boolean function with multiple-valued variables has value 1. That probability is computed by analyzing a ROMDD (reduced ordered multiple-valuedecision diagram) representation of the function.
For efficiency reasons, we first build a coded ROBDD (reduced ordered binary decision diagram) representation of the function and then transform that coded ROBDD into the ROMDD required by the method. We present numerical experiments showing that the method is able to cope with quite large systems in moderate CPU times.Postprint (published version
Combinatorial methods for the evaluation of yield and operational reliability of fault-tolerant systems-on-chip
In this paper we develop combinatorial methods for the evaluation of yield and operational reliability of fault-tolerant systems-on-chip. The method for yield computation assumes that defects are produced according to a model in which defects are lethal and affect given components of the system following a distribution common to all defects; the method for the computation of operational reliability also assumes that the fault-tree function of the system is increasing.
The distribution of the number of defects is arbitrary. The methods are based on the formulation of, respectively, the yield and the operational reliability as the probability that a given boolean function with multiple-valued variables has value 1. That probability is computed by analyzing
a ROMDD (reduced ordered multiple-value decision diagram) representation of the function.
For efficiency reasons, a coded ROBDD (reduced ordered binary decision diagram) representation of the function is built first and, then, that coded ROBDD is transformed into the ROMDD required by the methods. We present numerical experiments showing that the methods are able to cope with quite large systems in moderate CPU times.Postprint (published version
A Simplified Method to Calculate Failure Times in Fault-Tolerant Systems
A simplified method is presented to calculate moments of failure time and residual lifetime of a fault-tolerant system. The method is based on recent results in queueing theory. Its effectiveness is illustrated by considering a dual repairable system from the literature
Space Station Freedom data management system growth and evolution report
The Information Sciences Division at the NASA Ames Research Center has completed a 6-month study of portions of the Space Station Freedom Data Management System (DMS). This study looked at the present capabilities and future growth potential of the DMS, and the results are documented in this report. Issues have been raised that were discussed with the appropriate Johnson Space Center (JSC) management and Work Package-2 contractor organizations. Areas requiring additional study have been identified and suggestions for long-term upgrades have been proposed. This activity has allowed the Ames personnel to develop a rapport with the JSC civil service and contractor teams that does permit an independent check and balance technique for the DMS
Delay Measurements and Self Characterisation on FPGAs
This thesis examines new timing measurement methods for self delay characterisation of Field-Programmable Gate Arrays (FPGAs) components and delay measurement of complex circuits
on FPGAs. Two novel measurement techniques based on analysis of a circuit's output failure
rate and transition probability is proposed for accurate, precise and efficient measurement of
propagation delays. The transition probability based method is especially attractive, since
it requires no modifications in the circuit-under-test and requires little hardware resources,
making it an ideal method for physical delay analysis of FPGA circuits.
The relentless advancements in process technology has led to smaller and denser transistors
in integrated circuits. While FPGA users benefit from this in terms of increased hardware
resources for more complex designs, the actual productivity with FPGA in terms of timing
performance (operating frequency, latency and throughput) has lagged behind the potential
improvements from the improved technology due to delay variability in FPGA components
and the inaccuracy of timing models used in FPGA timing analysis. The ability to measure
delay of any arbitrary circuit on FPGA offers many opportunities for on-chip characterisation
and physical timing analysis, allowing delay variability to be accurately tracked and variation-aware optimisations to be developed, reducing the productivity gap observed in today's FPGA
designs.
The measurement techniques are developed into complete self measurement and characterisation platforms in this thesis, demonstrating their practical uses in actual FPGA hardware for
cross-chip delay characterisation and accurate delay measurement of both complex combinatorial and sequential circuits, further reinforcing their positions in solving the delay variability
problem in FPGAs
inSense: A Variation and Fault Tolerant Architecture for Nanoscale Devices
Transistor technology scaling has been the driving force in improving the size, speed, and power consumption of digital systems. As devices approach atomic size, however, their reliability and performance are increasingly compromised due to reduced noise margins, difficulties in fabrication, and emergent nano-scale phenomena. Scaled CMOS devices, in particular, suffer from process variations such as random dopant fluctuation (RDF) and line edge roughness (LER), transistor degradation mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI), and increased sensitivity to single event upsets (SEUs). Consequently, future devices may exhibit reduced performance, diminished lifetimes, and poor reliability.
This research proposes a variation and fault tolerant architecture, the inSense architecture, as a circuit-level solution to the problems induced by the aforementioned phenomena. The inSense architecture entails augmenting circuits with introspective and sensory capabilities which are able to dynamically detect and compensate for process variations, transistor degradation, and soft errors. This approach creates ``smart\u27\u27 circuits able to function despite the use of unreliable devices and is applicable to current CMOS technology as well as next-generation devices using new materials and structures. Furthermore, this work presents an automated prototype implementation of the inSense architecture targeted to CMOS devices and is evaluated via implementation in ISCAS \u2785 benchmark circuits. The automated prototype implementation is functionally verified and characterized: it is found that error detection capability (with error windows from 30-400ps) can be added for less than 2\% area overhead for circuits of non-trivial complexity. Single event transient (SET) detection capability (configurable with target set-points) is found to be functional, although it generally tracks the standard DMR implementation with respect to overheads
Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment
A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control
Reliability analysis of triple modular redundancy system with spare
Hardware redundant fault-tolerant systems and the different design approaches are discussed. The reliability analysis of fault-tolerant systems is usually done under permanent fault conditions. With statistical data suggesting that up to 90% of system failures are caused by intermittent faults, the reliability analysis of fault-tolerant systems must concentrate more on this class of faults. In this work, a reconfigurable Triple Modular Redundancy (TMR) with spare system that differentiates between permanent and intermittent faults has been built. The reconfiguration process of this system depends on both the current status of its modules and their history. Based on this, a different approach for reliability analysis under intermittent fault conditions using Markov models is presented. This approach shows a much higher system reliability compared to other redundant and non-redundant configurations
Quafu-Qcover: Explore Combinatorial Optimization Problems on Cloud-based Quantum Computers
We present Quafu-Qcover, an open-source cloud-based software package designed
for combinatorial optimization problems that support both quantum simulators
and hardware backends. Quafu-Qcover provides a standardized and complete
workflow for solving combinatorial optimization problems using the Quantum
Approximate Optimization Algorithm (QAOA). It enables the automatic modeling of
the original problem as a quadratic unconstrained binary optimization (QUBO)
model and corresponding Ising model, which can be further transformed into a
weight graph. The core of Qcover relies on a graph decomposition-based
classical algorithm, which enables obtaining the optimal parameters for the
shallow QAOA circuit more efficiently. Quafu-Qcover includes a specialized
compiler that translates QAOA circuits into physical quantum circuits capable
of execution on Quafu cloud quantum computers. Compared to a general-purpose
compiler, ours generates shorter circuit depths while also possessing better
speed performance. The Qcover compiler can establish a library of qubits
coupling substructures in real time based on the updated calibration data of
the superconducting quantum devices, ensuring that the task is executed on
physical qubits with higher fidelity. The Quafu-Qcover allows us to retrieve
quantum computer sampling result information at any time using task ID,
enabling asynchronous processing. Besides, it includes modules for result
preprocessing and visualization, allowing for an intuitive display of the
solution to combinatorial optimization problems. We hope that Quafu-Qcover can
serve as a guiding example for how to explore application problems on the Quafu
cloud quantum computersComment: Comments are welcome
Using Fine Grain Approaches for highly reliable Design of FPGA-based Systems in Space
Nowadays using SRAM based FPGAs in space missions is increasingly considered due to their flexibility and reprogrammability. A challenge is the devices sensitivity to radiation effects that increased with modern architectures due to smaller CMOS structures. This work proposes fault tolerance methodologies, that are based on a fine grain view to modern reconfigurable architectures. The focus is on SEU mitigation challenges in SRAM based FPGAs which can result in crucial situations
- …