1,885 research outputs found
Periodic Application of Concurrent Error Detection in Processor Array Architectures
Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions
Confidence in Processor Array Outputs Under Periodic Application of Concurrant Error Detection
Coordinated Science Laboratory was formerly known as Control Systems LaboratorySDIO/IST and Office of Naval Research / N00014-89-K-0070National Aeronautics and Space Administration (NASA) / NAG 1-613Department of the Navy and Office of the Chief of Naval Research / N00014-91-J-128
Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack
Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Recon- figurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A iii significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria
Advanced flight control system study
The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are functional reliability of 10 to the minus 10 power/hour of flight and only 6 month scheduled maintenance. A distributed system architecture is described, including a multiplexed communication system, reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. Test bed and flight evaluation program are proposed
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview
Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness
Recommended from our members
Concurrent error detection in 2-D separable linear transform
As process technology continues to scale to smaller geometries and reduces the supply voltage, reliability of the resulting semiconductor becomes a greater concern. The effect of deep submicron noise, soft errors, variation, and aging degradation pose challenges on the functional correctness of VLSI systems and places roadblocks on reductions in scale. On the other side, as computing moves toward mobile, the energy efficiency of digital systems becomes one of the most important design metrics. However, reliability and energy efficiency are contradicting design requirements. Adding a voltage guard band is the most common method to mitigate the reliability impacts in such instances. Low power design technique like voltage over-scaling (VOS) even reduces the power by scaling the supply voltage just before data-dependant timing errors start to appear. Concurrent error detection is the solution to tackle reliability and energy-efficiency in a unified manner. Fault tolerance can be deployed at different design hierarchies. Given its low overhead, algorithm level error detection is an attractive approach. In this work, a generic weighted checksum code based error detection algorithm targeted generic 2-D separable linear transform is proposed. This technique encodes the input array at the 2-D linear trans- formation level, and algorithms are designed to operate on encoded data and produce encoded output data. The proposed error detection technique is a system-level method and therefore can be used in existing hardware or software 2-D linear transformation architectures with low overhead. The mathematic proof of the algorithm is provided within the scope of this dissertation. The checksum weighting vector for several common transforms are derived as examples, error detection cost and algorithm effectiveness are analyzed. In traditional fault tolerance study, the error is often evaluated at the boolean level. Many DSP applications, like 2-D linear transformation used in the multimedia compression system, do not require exactly correct results, but rather that the quality of the output is within the acceptable range. A generic quality aware error detection in the 2-D separable linear transform is proposed by extending the above property and defining the errors at the functional level. As an example, the quality-aware error detection technique is deployed on a low-power wavelet lifting transform architecture in JPEG2000. A low-cost Signal to Noise Ratio (SNR) aware detection logic based on proposed scheme is integrated into the discrete wavelet lifting transform architecture. This detection logic checks whether the image quality degradation caused by voltage over-scaling induced timing errors is acceptable and determines the optimal voltage set point in operating conditions at run time. This novel quality-based error detection approach is significantly different from traditional error detection schemes which look for exact data equivalence. A simulation result for one design shows that the supply voltage can be scaled down to 75% of the nominal voltage in typical process corner without significant image quality degradation, which translates to 9.15mW power consumption (44% power saving).Electrical and Computer Engineerin
Fault-Tolerant Computing: An Overview
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNASA / NAG-1-613Semiconductor Research Corporation / 90-DP-109Joint Services Electronics Program / N00014-90-J-127
- …