41,937 research outputs found

    Cross-layer system reliability assessment framework for hardware faults

    System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored into such an estimation model, the delivered reliability reports are inevitably overly pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and a supporting suite of tools for accurate but fast estimation of computing system reliability. The backbone of the methodology is a component-based Bayesian model, which calculates system reliability from the masking probabilities of individual hardware and software components, taking their complex interactions into account. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level. Peer reviewed. Postprint (author's final draft).
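
    To make the role of masking probabilities concrete, the following is a minimal sketch, not the paper's Bayesian model. It assumes components fail independently and that hardware and software masking are independent, so each component's raw FIT rate is simply derated by both masking probabilities. The component names, FIT values, and masking probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    raw_fit: float     # raw failure rate in FIT (failures per 1e9 hours), hypothetical
    hw_masking: float  # probability a fault is masked in hardware, hypothetical
    sw_masking: float  # probability a surviving fault is masked in software, hypothetical


def system_fit(components):
    """Sum derated FIT rates, assuming independent components and masking layers."""
    total = 0.0
    for c in components:
        # Only faults that escape both masking layers contribute to the system FIT.
        total += c.raw_fit * (1.0 - c.hw_masking) * (1.0 - c.sw_masking)
    return total


# Hypothetical example inputs, not taken from the paper.
components = [
    Component("register_file", raw_fit=100.0, hw_masking=0.80, sw_masking=0.50),
    Component("l1_cache", raw_fit=400.0, hw_masking=0.90, sw_masking=0.25),
]

print(f"Estimated system FIT: {system_fit(components):.1f}")
```

    A full cross-layer model would replace the independence assumption with conditional probabilities that capture how components interact.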

    Fault detection, identification and accommodation techniques for unmanned airborne vehicles

    Unmanned Airborne Vehicles (UAVs) are assuming prominent roles in both the commercial and military aerospace industries. The promise of reduced costs and reduced risk to human life is one of their major attractions; however, these low-cost systems have yet to gain acceptance as a safe alternative to manned solutions. The absence of a thinking, observing, reacting and decision-making pilot reduces a UAV's capability to manage adverse situations such as faults and failures. This paper presents a review of techniques that can be used to track system health onboard a UAV. The review is based on a year-long literature survey aimed at identifying approaches suitable for combating the low reliability and high attrition rates of today’s UAVs. This research primarily focuses on real-time, onboard implementations for generating accurate estimations of aircraft health for fault accommodation and mission management (change of mission objectives due to deterioration in aircraft health). The major task of such systems is the detection, identification and accommodation of faults and failures (FDIA). A number of approaches exist, of which model-based techniques show particular promise. Model-based approaches use analytical redundancy to generate residuals for the aircraft parameters that can be used to indicate the occurrence of a fault or failure. Actions such as switching between redundant components or modifying control laws can then be taken to accommodate the fault. The paper further describes recent work in evaluating neural-network approaches to sensor failure detection and identification (SFDI). The results of simulations with a variety of sensor failures, based on a Matlab non-linear aircraft model, are presented and discussed. Suggestions for improvements are made based on the limitations of this neural network approach, with the aim of including a broader range of failures while still maintaining an accurate model in the presence of these failures.

    Design considerations for flight test of a fault inferring nonlinear detection system algorithm for avionics sensors

    The modifications to the design of a fault inferring nonlinear detection system (FINDS) algorithm to accommodate flight computer constraints and the resulting impact on the algorithm performance are summarized. An overview of the flight data-driven FINDS algorithm is presented. This is followed by a brief analysis of the effects of modifications to the algorithm on program size and execution speed. Significant improvements in estimation performance for the aircraft states and normal operating sensor biases, which have resulted from improved noise design parameters and a new steady-state wind model, are documented. The aircraft state and sensor bias estimation performances of the algorithm's extended Kalman filter are presented as a function of update frequency of the piecewise constant filter gains. The results of a new detection system strategy and failure detection performance, as a function of gain update frequency, are also presented

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
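
    The scale of these numbers can be checked with a short calculation. The sketch below uses the Pf and device count quoted above. The block size, the cycle count between checkpoints, and the assumptions of independent faults and perfect detection are hypothetical simplifications, not the paper's analysis.

```python
import math

P_FAULT = 1e-7       # fault probability per device per cycle (from the abstract)
NUM_DEVICES = 1e12   # susceptible devices in the system (from the abstract)


def expected_faults_per_cycle(num_devices, p_fault):
    """Expected number of transient faults somewhere in the system each cycle."""
    return num_devices * p_fault


def block_success_probability(block_devices, cycles, p_fault):
    """Probability that a block sees no fault over `cycles`, assuming independent faults."""
    # log1p keeps precision when p_fault is tiny.
    return math.exp(block_devices * cycles * math.log1p(-p_fault))


def expected_attempts(block_devices, cycles, p_fault):
    """Mean number of executions (first try plus rollbacks), assuming perfect detection."""
    return 1.0 / block_success_probability(block_devices, cycles, p_fault)


print("faults per cycle, whole system:", expected_faults_per_cycle(NUM_DEVICES, P_FAULT))
# Hypothetical rollback block: 1000 devices, 10 cycles between checkpoints.
print("expected attempts per block:", expected_attempts(1000, 10, P_FAULT))
```

    With on the order of 10^5 faults expected per cycle system-wide, a single global rollback would almost never complete, which is why the granularity of the rollback blocks matters.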

    A runtime heuristic to selectively replicate tasks for application-specific reliability targets

    In this paper we propose a runtime-based selective task replication technique for task-parallel high performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated. This work was supported by FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and in part by the European Union (FEDER funds) under contract TIN2015-65316-P. Peer reviewed. Postprint (author's final draft).
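
    The idea of replicating only enough tasks to meet a reliability target can be illustrated with a simple offline greedy sketch. This is not the App_FIT heuristic itself, which makes its decisions at runtime. It assumes each task has a known FIT estimate and that a replicated task contributes nothing to the unprotected failure rate. The function name, task names, and values are hypothetical.

```python
def select_tasks_to_replicate(task_fits, target_fit):
    """Greedily replicate the highest-FIT tasks until the unprotected FIT meets the target.

    task_fits: dict mapping task name -> estimated FIT contribution (hypothetical inputs).
    Returns (list of replicated task names, remaining unprotected FIT).
    """
    remaining = sum(task_fits.values())
    replicated = []
    # Visit tasks from most to least failure-prone.
    for name, fit in sorted(task_fits.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= target_fit:
            break
        replicated.append(name)
        remaining -= fit  # assumption: a replicated task no longer contributes
    return replicated, remaining


# Hypothetical task set and reliability target.
tasks = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0}
chosen, residual = select_tasks_to_replicate(tasks, target_fit=25.0)
print(chosen, residual)
```

    In this toy case two of four tasks are replicated, which mirrors the abstract's point that complete replication is often unnecessary.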

    A preliminary design for flight testing the FINDS algorithm

    This report presents a preliminary design for flight testing the FINDS (Fault Inferring Nonlinear Detection System) algorithm on a target flight computer. The FINDS software was ported onto the target flight computer by reducing the code size by 65%. Several modifications were made to the computational algorithms, resulting in a near real-time execution speed. Finally, a new failure detection strategy was developed, resulting in a significant improvement in detection time performance. In particular, low-level MLS, IMU and IAS sensor failures are detected instantaneously with the new detection strategy, while accelerometer and rate gyro failures are detected within the minimum time allowed by the information generated in the sensor residuals based on the point mass equations of motion. All of the results have been demonstrated using five minutes of sensor flight data for the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment.

    Evaluation of reliability modeling tools for advanced fault tolerant systems

    The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactive Estimation System (ARIES 82) reliability tools were evaluated for application to advanced fault-tolerant aerospace systems. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault-tolerant aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Neither CARE III nor ARIES 82 was suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault-tolerant systems. It was further concluded that, subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults beyond double near-coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced fault-tolerant systems for air transport.

    Sensor failure detection system

    Advanced concepts for detecting, isolating, and accommodating sensor failures were studied to determine their applicability to the gas turbine control problem. Five concepts were formulated based upon such techniques as Kalman filters, and a screening process led to the selection of one advanced concept for further evaluation. The selected advanced concept uses a Kalman filter to generate residuals, a weighted sum of squared residuals technique to detect soft failures, likelihood ratio testing of a bank of Kalman filters for isolation, and reconfiguration of the normal mode Kalman filter by eliminating the failed input to accommodate the failure. The advanced concept was compared to a baseline parameter synthesis technique. The advanced concept was shown to be viable for detecting, isolating, and accommodating sensor failures in gas turbine applications.
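
    The soft-failure detection step can be sketched as follows. This is an illustrative simplification, not the report's implementation: it assumes the Kalman filter residuals are already available, that the residual channels are uncorrelated (a diagonal covariance), and that the window length and threshold are chosen by the designer. All names and numbers are hypothetical.

```python
def wssr(residual_window, variances):
    """Weighted sum of squared residuals over a window of residual vectors.

    Each residual is weighted by the inverse of its variance, assuming
    uncorrelated channels (diagonal residual covariance).
    """
    total = 0.0
    for residual in residual_window:
        total += sum(r * r / v for r, v in zip(residual, variances))
    return total


def soft_failure_detected(residual_window, variances, threshold):
    """Flag a soft failure when the windowed statistic exceeds the threshold."""
    return wssr(residual_window, variances) > threshold


# Hypothetical two-sensor example with a two-sample window.
variances = [0.04, 0.04]
healthy = [[0.1, -0.2], [0.0, 0.1]]
drifting = [[0.5, -0.2], [0.6, 0.1]]  # first sensor slowly biased
THRESHOLD = 9.0  # hypothetical detection threshold

print(soft_failure_detected(healthy, variances, THRESHOLD))
print(soft_failure_detected(drifting, variances, THRESHOLD))
```

    Summing over a window lets a small but persistent bias accumulate into a detectable statistic, which is what distinguishes soft failures from abrupt ones.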

    Validation of a fault-tolerant clock synchronization system

    A validation method for the synchronization subsystem of a fault-tolerant computer system is investigated. The method combines formal design verification with experimental testing. The design proof reduces the correctness of the clock synchronization system to the correctness of a set of axioms, which are experimentally validated. Since the reliability requirements are often extreme, requiring the estimation of extremely large quantiles, an asymptotic approach to estimation in the tail of a distribution is employed.

    Downscaling of fracture energy during brittle creep experiments

    We present mode 1 brittle creep fracture experiments along fracture surfaces that contain strength heterogeneities. Our observations provide a link between smooth macroscopic time-dependent failure and intermittent microscopic stress-dependent processes. We find the large-scale response of slow-propagating subcritical cracks to be well described by an Arrhenius law that relates the fracture speed to the energy release rate. At the microscopic scale, high-resolution optical imaging of the transparent material used (PMMA) allows a detailed description of the fracture front. This reveals a local competition between subcritical and critical propagation (pseudo stick-slip front advances) independent of loading rate. Moreover, we show that the local geometry of the crack front is self-affine and that the local crack front velocity is power-law distributed. We estimate the local fracture energy distribution by combining high-resolution measurements of the crack front geometry with an elastic line fracture model. We show that the average local fracture energy is significantly larger than the value derived from a macroscopic energy balance. This suggests that homogenization of the fracture energy is not straightforward and should be approached with caution. Finally, we discuss the implications of our results in the context of fault mechanics.
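
    The abstract does not give the exact form of the Arrhenius law. One common way to write a thermally activated subcritical growth law is sketched below, where the crack speed grows exponentially with the energy release rate G relative to a reference value. The functional form, the parameter names (v0, g_c, alpha), and the example values are assumptions for illustration only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def crack_speed(g, v0, g_c, alpha, temperature):
    """Assumed Arrhenius-type law: v = v0 * exp(alpha^2 * (g - g_c) / (k_B * T)).

    g, g_c: energy release rate and reference fracture energy in J/m^2
    alpha: characteristic length scale in m, so alpha^2 * g is an energy
    v0: speed reached at g == g_c, in m/s
    temperature: absolute temperature in K
    """
    return v0 * math.exp(alpha ** 2 * (g - g_c) / (K_B * temperature))


# Hypothetical parameter values, not taken from the paper.
V0, G_C, ALPHA, T = 1e-3, 150.0, 1e-11, 293.0

for g in (100.0, 125.0, 150.0):
    print(g, crack_speed(g, V0, G_C, ALPHA, T))
```

    Under this form the logarithm of the crack speed is linear in G, which is the signature one would look for when fitting creep data.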