2,746 research outputs found
Redundant Logic Insertion and Fault Tolerance Improvement in Combinational Circuits
This paper presents a novel method to identify and insert redundant logic
into a combinational circuit to improve its fault tolerance without having to
replicate the entire circuit as is the case with conventional redundancy
techniques. In this context, it is discussed how to estimate the fault masking
capability of a combinational circuit using the truth-cum-fault enumeration
table, and then it is shown how to identify the logic that can introduced to
add redundancy into the original circuit without affecting its native
functionality and with the aim of improving its fault tolerance though this
would involve some trade-off in the design metrics. However, care should be
taken while introducing redundant logic since redundant logic insertion may
give rise to new internal nodes and faults on those may impact the fault
tolerance of the resulting circuit. The combinational circuit that is
considered and its redundant counterparts are all implemented in semi-custom
design style using a 32/28nm CMOS digital cell library and their respective
design metrics and fault tolerances are compared
Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study
This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications
Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space
Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites
strongly rely on the reliability of onboard computers to guarantee the success
of their missions. Relying solely on radiation-hardened technologies is
extremely expensive, and developing inflexible architectural and
microarchitectural modifications to introduce modular redundancy within a
system leads to significant area increase and performance degradation. To
mitigate the overheads of traditional radiation hardening and modular
redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR)
approach, a redundancy scheme that features a cluster of RISC-V processors with
a flexible on-demand dual-core and triple-core lockstep grouping of computing
cores with runtime split-lock capabilities. Further, we propose two recovery
approaches, software-based and hardware-based, trading off performance and area
overhead. Running at 430 MHz, our fault-tolerant cluster achieves up to 1160
MOPS on a matrix multiplication benchmark when configured in non-redundant mode
and 617 and 414 MOPS in dual and triple mode, respectively. A software-based
recovery in triple mode requires 363 clock cycles and occupies 0.612 mm2,
representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster.
As a high-performance alternative, a new hardware-based method provides rapid
fault recovery in just 24 clock cycles and occupies 0.660 mm2, namely ~9.4%
area overhead over the baseline non-redundant RISC-V cluster. The cluster is
also enhanced with split-lock capabilities to enter one of the redundant modes
with minimum performance loss, allowing execution of a mission-critical or a
performance section, with <400 clock cycles overhead for entry and exit. The
proposed system is the first to integrate these functionalities on an
open-source RISC-V-based compute device, enabling finely tunable reliability
vs. performance trade-offs
Redundancy management for efficient fault recovery in NASA's distributed computing system
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance
Recommended from our members
Fault-tolerant hardware designs and their reliability analysis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Fault-tolerance, which is a complement to fault prevention, is an effective method of achieving ultra-high reliability. By taking this approach fault free computation can be achieved despite the presence of fault in the system. In this thesis three new fault tolerant techniques are presented and their advantages over well known fault-tolerant strategies are shown. One of these new techniques achieves higher reliability than any other similar techniques presented in the literature. Generally fault-tolerant structures consist of four major blocks: the replicated modules, the disagreement and detection circuit, the switching circuit, and the voting mechanism. The most critical component in a fault-tolerant system is the voter because the final output of the system is computed by this component. This dissertation presents a new implementation for voters which reduces both the complexity and the occupied area on the chip. The structures of the three techniques developed in this work are such that the complexity of their switching mechanisms grows only linearly with the number of modules but the voting mechanism complexity increases significantly. This is a better approach than those schemes in which the switching complexity increases significantly and the voter's complexity remains constant or grows linearly with the number of modules because it is easier to implement a complex voter than a complex switch (voters have more regular structures). Extensive comparisons are made between different fault-tolerant techniques. A new reliability model is also developed for system reliability evaluation of the new designs. The results of these analyses are plotted, and the advantages of the new techniques are demonstrated. In the final part of the work an expert system is described which uses the knowledge acquired by these comparisons. This expert system is meant as a prototype of a component of a CAD tool which will act as an advisor on fault-tolerant techniques
Robust configurable system design with built-in self-healing
The new generations of SRAM-based FPGA (Field Programmable Gate Array) devices, built on nanometre technology, are the preferred choice for the implementation of reconfigurable computing platforms. However, their vulnerability to hard and soft errors is a major weakness to robust system design based on FPGAs. In this paper, a novel Built-In Self-Healing (BISH) methodology, based on modular redundancy and on selfreconfiguration, is proposed. A soft microprocessor core implemented in the FPGA is responsible for the management and execution of all the BISH procedures. Fault detection and diagnosis is followed by repairing actions, taking advantage of the self-configuration features. Meanwhile, modular redundancy assures that the system still works correctly. This approach leads to a robust system design able to assure high reliability, availability and data integrity
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
Restoring Reliability in Fault Tolerant Reconfigurable Systems
The new generations of SRAM-based FPGAdevices, built on nanometer technology, are thepreferred choice for the implementation ofreconfigurable computing platforms. However,smaller technological scales increase theirvulnerability to manufacturing imperfections andhence to the occurrence of electromigration.Moreover, the large internal RAM (for configurationpurposes or as embedded memory blocks) makesthem more prone to soft errors.The incorporation of self-reconfigurationcapabilities in recent FPGAs, allied to the use of softand hard microprocessor cores, facilitates the offsetof these vulnerabilities by enabling the developmentof self-restoring fault tolerant reconfigurablesystems. In the methodology presented in this paper,the embedded microprocessor is also responsible forthe implementation of online self-test-and-repairstrategies, based on modular redundancy and onself-reconfiguration. The detection of faults, causedby soft or hard errors, may be followed by repairingactions, depending on the fault type. This approachleads to smoother system degradation, extending itslifetime and improving its reliability
- …