2,289 research outputs found

    Fault-tolerant computer study

    Get PDF
    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

    Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults

    Get PDF
    Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level and without accompanying frequency scaling leads to timing related faults, potentially undermining the energy savings. Through experimental voltage underscaling studies on commercial FPGAs, we observed that the rate of these faults exponentially increases for on-chip memories, or Block RAMs (BRAMs). To mitigate these faults, we evaluated the efficiency of the built-in Error-Correction Code (ECC) and observed that more than 90% of the faults are correctable and further 7% are detectable (but not correctable). This efficiency is the result of the single-bit type of these faults, which are then effectively covered by the Single-Error Correction and Double-Error Detection (SECDED) design of the built-in ECC. Finally, motivated by the above experimental observations, we evaluated an FPGA-based Neural Network (NN) accelerator under low-voltage operations, while built-in ECC is leveraged to mitigate undervolting faults and thus, prevent NN significant accuracy loss. In consequence, we achieve 40% of the BRAM power saving through undervolting below the minimum safe voltage level, with a negligible NN accuracy loss, thanks to the substantial fault coverage by the built-in ECC.Comment: 6 pages, 2 figure

    Framework for Simulation of Heterogeneous MpSoC for Design Space Exploration

    Full text link
    Due to the ever-growing requirements in high performance data computation, multiprocessor systems have been proposed to solve the bottlenecks in uniprocessor systems. Developing efficient multiprocessor systems requires effective exploration of design choices like application scheduling, mapping, and architecture design. Also, fault tolerance in multiprocessors needs to be addressed. With the advent of nanometer-process technology for chip manufacturing, realization of multiprocessors on SoC (MpSoC) is an active field of research. Developing efficient low power, fault-tolerant task scheduling, and mapping techniques for MpSoCs require optimized algorithms that consider the various scenarios inherent in multiprocessor environments. Therefore there exists a need to develop a simulation framework to explore and evaluate new algorithms on multiprocessor systems. This work proposes a modular framework for the exploration and evaluation of various design algorithms for MpSoC system. This work also proposes new multiprocessor task scheduling and mapping algorithms for MpSoCs. These algorithms are evaluated using the developed simulation framework. The paper also proposes a dynamic fault-tolerant (FT) scheduling and mapping algorithm for robust application processing. The proposed algorithms consider optimizing the power as one of the design constraints. The framework for a heterogeneous multiprocessor simulation was developed using SystemC/C++ language. Various design variations were implemented and evaluated using standard task graphs. Performance evaluation metrics are evaluated and discussed for various design scenarios

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    SpiNNaker: Fault tolerance in a power- and area- constrained large-scale neuromimetic architecture

    Get PDF
    AbstractSpiNNaker is a biologically-inspired massively-parallel computer designed to model up to a billion spiking neurons in real-time. A full-fledged implementation of a SpiNNaker system will comprise more than 105 integrated circuits (half of which are SDRAMs and half multi-core systems-on-chip). Given this scale, it is unavoidable that some components fail and, in consequence, fault-tolerance is a foundation of the system design. Although the target application can tolerate a certain, low level of failures, important efforts have been devoted to incorporate different techniques for fault tolerance. This paper is devoted to discussing how hardware and software mechanisms collaborate to make SpiNNaker operate properly even in the very likely scenario of component failures and how it can tolerate system-degradation levels well above those expected

    Evaluation of fault-tolerant parallel-processor architectures over long space missions

    Get PDF
    The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with .99 probability and that the probability of system failure during one-half hour of full operation be less than 10(-7). The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration

    Neural network based architectures for aerospace applications

    Get PDF
    A brief history of the field of neural networks research is given and some simple concepts are described. In addition, some neural network based avionics research and development programs are reviewed. The need for the United States Air Force and NASA to assume a leadership role in supporting this technology is stressed
    • …
    corecore