2,746 research outputs found

    Redundant Logic Insertion and Fault Tolerance Improvement in Combinational Circuits

    Full text link
    This paper presents a novel method to identify and insert redundant logic into a combinational circuit to improve its fault tolerance without having to replicate the entire circuit as is the case with conventional redundancy techniques. In this context, it is discussed how to estimate the fault masking capability of a combinational circuit using the truth-cum-fault enumeration table, and then it is shown how to identify the logic that can introduced to add redundancy into the original circuit without affecting its native functionality and with the aim of improving its fault tolerance though this would involve some trade-off in the design metrics. However, care should be taken while introducing redundant logic since redundant logic insertion may give rise to new internal nodes and faults on those may impact the fault tolerance of the resulting circuit. The combinational circuit that is considered and its redundant counterparts are all implemented in semi-custom design style using a 32/28nm CMOS digital cell library and their respective design metrics and fault tolerances are compared

    Improving reconfigurable systems reliability by combining periodical test and redundancy techniques: a case study

    Get PDF
    This paper revises and introduces to the field of reconfigurable computer systems, some traditional techniques used in the fields of fault-tolerance and testing of digital circuits. The target area is that of on-board spacecraft electronics, as this class of application is a good candidate for the use of reconfigurable computing technology. Fault tolerant strategies are used in order for the system to adapt itself to the severe conditions found in space. In addition, the paper describes some problems and possible solutions for the use of reconfigurable components, based on programmable logic, in space applications

    Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

    Full text link
    Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inflexible architectural and microarchitectural modifications to introduce modular redundancy within a system leads to significant area increase and performance degradation. To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities. Further, we propose two recovery approaches, software-based and hardware-based, trading off performance and area overhead. Running at 430 MHz, our fault-tolerant cluster achieves up to 1160 MOPS on a matrix multiplication benchmark when configured in non-redundant mode and 617 and 414 MOPS in dual and triple mode, respectively. A software-based recovery in triple mode requires 363 clock cycles and occupies 0.612 mm2, representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster. As a high-performance alternative, a new hardware-based method provides rapid fault recovery in just 24 clock cycles and occupies 0.660 mm2, namely ~9.4% area overhead over the baseline non-redundant RISC-V cluster. The cluster is also enhanced with split-lock capabilities to enter one of the redundant modes with minimum performance loss, allowing execution of a mission-critical or a performance section, with <400 clock cycles overhead for entry and exit. The proposed system is the first to integrate these functionalities on an open-source RISC-V-based compute device, enabling finely tunable reliability vs. performance trade-offs

    Redundancy management for efficient fault recovery in NASA's distributed computing system

    Get PDF
    The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance

    Robust configurable system design with built-in self-healing

    Get PDF
    The new generations of SRAM-based FPGA (Field Programmable Gate Array) devices, built on nanometre technology, are the preferred choice for the implementation of reconfigurable computing platforms. However, their vulnerability to hard and soft errors is a major weakness to robust system design based on FPGAs. In this paper, a novel Built-In Self-Healing (BISH) methodology, based on modular redundancy and on selfreconfiguration, is proposed. A soft microprocessor core implemented in the FPGA is responsible for the management and execution of all the BISH procedures. Fault detection and diagnosis is followed by repairing actions, taking advantage of the self-configuration features. Meanwhile, modular redundancy assures that the system still works correctly. This approach leads to a robust system design able to assure high reliability, availability and data integrity

    Study of fault-tolerant software technology

    Get PDF
    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance

    Restoring Reliability in Fault Tolerant Reconfigurable Systems

    Get PDF
    The new generations of SRAM-based FPGAdevices, built on nanometer technology, are thepreferred choice for the implementation ofreconfigurable computing platforms. However,smaller technological scales increase theirvulnerability to manufacturing imperfections andhence to the occurrence of electromigration.Moreover, the large internal RAM (for configurationpurposes or as embedded memory blocks) makesthem more prone to soft errors.The incorporation of self-reconfigurationcapabilities in recent FPGAs, allied to the use of softand hard microprocessor cores, facilitates the offsetof these vulnerabilities by enabling the developmentof self-restoring fault tolerant reconfigurablesystems. In the methodology presented in this paper,the embedded microprocessor is also responsible forthe implementation of online self-test-and-repairstrategies, based on modular redundancy and onself-reconfiguration. The detection of faults, causedby soft or hard errors, may be followed by repairingactions, depending on the fault type. This approachleads to smoother system degradation, extending itslifetime and improving its reliability
    • …
    corecore