
    Experimental analysis of computer system dependability

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
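
    The importance sampling idea mentioned above is easy to illustrate. Below is a minimal sketch (with a hypothetical exponential lifetime model and made-up rates, not taken from the paper) showing how biasing the sampling distribution and reweighting by the likelihood ratio makes a rare failure probability estimable with far fewer trials than naive Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rare event: a component with failure rate lam fails
# within the mission time t_miss.  True probability ~ lam * t_miss = 1e-5.
lam, t_miss = 1e-5, 1.0
n = 100_000

# Naive Monte Carlo: with p ~ 1e-5 and n = 1e5 samples we expect only
# about one hit, so the estimate is extremely noisy (often exactly 0).
t = rng.exponential(1.0 / lam, n)
p_naive = np.mean(t < t_miss)

# Importance sampling: draw from a biased exponential g with a much
# higher rate so hits are common, then reweight each sample by the
# likelihood ratio f(x)/g(x) to keep the estimator unbiased.
lam_is = 2.0                      # biased rate, chosen so most draws hit
x = rng.exponential(1.0 / lam_is, n)
hit = x < t_miss
w = (lam * np.exp(-lam * x)) / (lam_is * np.exp(-lam_is * x))
p_is = np.mean(hit * w)

print(f"naive MC: {p_naive:.2e}   importance sampling: {p_is:.2e}")
```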

    Real-time fault identification for developmental turbine engine testing

    Hundreds of individual sensors produce an enormous amount of data during developmental turbine engine testing. The challenge is to ensure the validity of the data and to identify data and engine anomalies in a timely manner. An automated data validation, engine condition monitoring, and fault identification process that emulates typical engineering techniques has been developed for developmental engine testing.

    An automated data validation and fault identification approach employing engine cycle-matching principles is described. Engine cycle-matching is automated by using an adaptive nonlinear component-level computer model capable of simulating both steady-state and transient engine operation. Automated steady-state, transient, and real-time model calibration processes are also described. The model enables automation of traditional data validation, engine condition monitoring, and fault identification procedures. A distributed parallel computing approach enables the entire process to operate in real time.

    The result is a capability to detect data and engine anomalies in real time during developmental engine testing. The approach is shown to be successful in detecting and identifying sensor anomalies as they occur and distinguishing these anomalies from variations in component and overall engine aerothermodynamic performance. The component-level model-based engine performance and fault identification technique of the present research is capable of: identifying measurement errors on the order of 0.5 percent (e.g., sensor bias, drift, level shift, noise, or poor response) in facility fuel flow, airflow, and thrust measurements; identifying measurement errors in engine aerothermodynamic measurements (rotor speeds, gas path pressures and temperatures); identifying measurement errors in engine control sensors (e.g., leaking/biased pressure sensor, slowly responding pressure measurement) and variable geometry rigging (e.g., misset guide vanes or nozzle area) that would invalidate a test or series of tests; identifying abrupt faults (e.g., faults due to domestic object damage, foreign object damage, and control anomalies); and identifying slow faults (e.g., component or overall engine degradation, and sensor drift). Specifically, the technique is capable of identifying small changes in compressor (or fan) performance on the order of 0.5 percent, and it can be easily extended to diagnose secondary failure modes and to verify any modeling assumptions that may arise for developmental engine tests (e.g., increase in turbine flow capacity, inaccurate measurement of facility bleed flows, horsepower extraction, etc.).

    The component-level model-based engine performance and fault identification method developed in the present work brings together features which individually and collectively advance the state of the art. These features fall into three categories: advancements to effectively quantify off-nominal behavior; advancements to provide a fault detection capability that is practical from the viewpoint of analysis, implementation, tuning, and design; and advancements to provide a real-time fault detection capability that is reliable and efficient.
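
    The data validation step can be sketched as a residual check: compare each facility measurement against the prediction of the calibrated component-level model and flag relative residuals beyond a tolerance. The sketch below is illustrative only; the sensor names and normalized values are hypothetical, and the 0.5 percent tolerance simply echoes the detection level quoted above, standing in for the adaptive engine model and real instrumentation:

```python
# Minimal sketch of model-based sensor validation: flag any sensor
# whose relative residual against the model prediction exceeds a
# tolerance band.  The "predicted" values stand in for output of the
# adaptive component-level engine model described in the abstract.

TOLERANCE = 0.005  # 0.5 percent relative residual

def validate(measured: dict, predicted: dict, tol: float = TOLERANCE) -> dict:
    """Return sensors whose relative residual exceeds the tolerance."""
    faults = {}
    for name, meas in measured.items():
        residual = (meas - predicted[name]) / predicted[name]
        if abs(residual) > tol:
            faults[name] = residual
    return faults

# Example: a biased fuel-flow sensor reading 1 percent high is caught,
# while airflow and thrust stay inside the tolerance band.
measured  = {"fuel_flow": 1.010, "airflow": 0.999, "thrust": 1.002}
predicted = {"fuel_flow": 1.000, "airflow": 1.000, "thrust": 1.000}
print(validate(measured, predicted))   # {'fuel_flow': 0.01...}
```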

    Development of life prediction capabilities for liquid propellant rocket engines. Post-fire diagnostic system for the SSME system architecture study

    This system architecture task (1) analyzed the current process used to assess engine and component health after each test or flight firing of an SSME, (2) developed an approach and a specific set of objectives and requirements for automated diagnostics during post-fire health assessment, and (3) listed and described the software applications required to implement this system. The diagnostic system described is a distributed system with a database management system to store diagnostic information and test data, a CAE package for visual data analysis and preparation of plots of hot-fire data, a set of procedural applications for routine anomaly detection, and an expert system for advanced anomaly detection and evaluation.
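
    As a rough illustration of what one of the "procedural applications for routine anomaly detection" might look like, the sketch below scans stored hot-fire channels against limit bands pulled from a configuration table. The channel names and limits are hypothetical, not actual SSME redlines:

```python
# Illustrative routine anomaly check: scan recorded hot-fire channels
# against configured limit bands.  REDLINES would in practice come
# from the diagnostic database described in the abstract; the values
# here are hypothetical placeholders.

REDLINES = {
    "hpotp_discharge_temp": (0.0, 900.0),     # hypothetical limits
    "mcc_pressure":         (2800.0, 3100.0),
}

def routine_checks(test_data: dict[str, list[float]]):
    """Yield (channel, sample index, value) for every limit violation."""
    for channel, (lo, hi) in REDLINES.items():
        for i, value in enumerate(test_data.get(channel, [])):
            if not lo <= value <= hi:
                yield channel, i, value

# Example: the third pressure sample exceeds its upper limit.
test_data = {"mcc_pressure": [3005.0, 3010.0, 3150.0]}
for hit in routine_checks(test_data):
    print("anomaly:", hit)       # anomaly: ('mcc_pressure', 2, 3150.0)
```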

    Telemetry Fault-Detection Algorithms: Applications for Spacecraft Monitoring and Space Environment Sensing

    Algorithms have been developed that identify unusual behavior in satellite health telemetry. Telemetry from solid-state power amplifiers and amplifier thermistors from 32 geostationary Earth orbit communications satellites from 1991 to 2015 is examined. Transient event detection and change-point event detection techniques that use a sliding window-based median are used, statistically evaluating the telemetry stream against the local norm. This approach allows application of the algorithms to any spacecraft platform because the algorithms do not rely on satellite- or component-specific parameters and require no a priori knowledge of the data distribution. Individual telemetry data streams are analyzed with the event detection algorithms, resulting in a compiled list of unusual events for each satellite. This approach identifies up to six events that affect 51 of 53 telemetry streams at once, indicative of a spacecraft system-level event. In two satellites, the same top event date (4 December 2008) occurs across more than 10 years of telemetry from both. Of the five spacecraft with known maneuvers, the algorithms identify the maneuvers in all cases. Event dates are compared to known operational activities, space weather events, and available anomaly lists to assess the use of event detection algorithms for spacecraft monitoring and sensing of the space environment.

    The authors would like to acknowledge the U.S. Air Force Office of Scientific Research grant FA9550-13-1-0099 and NASA for funding this work through NASA Space Technology and Research Fellowship grant NNX16AM74H.
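
    The sliding window-based median technique described above can be sketched in a few lines: compare each telemetry sample to the median of a trailing window, using the median absolute deviation as a robust local scale. The window length and threshold below are illustrative choices, not the paper's tuned values:

```python
import numpy as np

# Minimal sketch of sliding-window median event detection: flag any
# sample that deviates from the trailing-window median by more than
# k robust standard deviations (MAD scaled to approximate sigma).

def detect_events(x: np.ndarray, window: int = 51, k: float = 5.0) -> list:
    """Return indices where x deviates strongly from the local median."""
    events = []
    for i in range(window, len(x)):
        local = x[i - window:i]
        med = np.median(local)
        # Median absolute deviation; 1.4826 rescales it to ~sigma for
        # Gaussian noise, and the epsilon guards against a zero MAD.
        mad = 1.4826 * np.median(np.abs(local - med)) + 1e-12
        if abs(x[i] - med) > k * mad:
            events.append(i)
    return events

# Example: a flat telemetry stream with one injected transient spike.
rng = np.random.default_rng(1)
stream = rng.normal(10.0, 0.1, 500)
stream[300] += 3.0                 # injected transient
print(detect_events(stream))       # -> [300] (the injected spike)
```

    Because the comparison is always against the local window, the same detector works on any channel without per-satellite tuning, which is the portability property the abstract emphasizes.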

    Using Machine Learning for Anomaly Detection on a System-on-Chip under Gamma Radiation

    The emergence of new nanoscale technologies has imposed significant challenges on designing reliable electronic systems for radiation environments. Radiation effects such as Total Ionizing Dose (TID) often cause permanent damage to such nanoscale electronic devices, and current state-of-the-art approaches to tackling TID rely on expensive radiation-hardened devices. This paper focuses on a novel and different approach: using machine learning algorithms on consumer-grade Field Programmable Gate Arrays (FPGAs) to monitor TID effects so that boards can be replaced before they stop working. The research challenge is to anticipate when a board will suffer total failure due to TID effects. We observed internal measurements of the FPGA boards under gamma radiation and used three different anomaly detection machine learning (ML) algorithms to detect anomalies in the sensor measurements in a gamma-radiated environment. The statistical results show a highly significant relationship between the gamma radiation exposure levels and the board measurements. Moreover, our anomaly detection results show that a One-Class Support Vector Machine with a Radial Basis Function kernel achieves an average recall score of 0.95, and all anomalies can be detected before the boards stop working.
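
    The reported detector is straightforward to reproduce in outline with scikit-learn. The sketch below trains a One-Class SVM with an RBF kernel on synthetic "healthy board" sensor readings and flags drifted readings as anomalous; the feature choice, distributions, and nu setting are assumptions for illustration, not the paper's configuration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Minimal sketch of one-class anomaly detection on board telemetry:
# fit on healthy readings only, then flag deviations.  The synthetic
# (voltage, temperature) pairs stand in for the real FPGA sensor data;
# in practice the features would be scaled before fitting.

rng = np.random.default_rng(2)
healthy = rng.normal([1.00, 45.0], [0.01, 0.5], size=(500, 2))
drifted = rng.normal([0.93, 52.0], [0.01, 0.5], size=(20, 2))  # TID-like drift

# nu bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(healthy)

# predict() returns +1 for inliers and -1 for anomalies.
print("healthy flagged:", np.sum(clf.predict(healthy) == -1))
print("drifted flagged:", np.sum(clf.predict(drifted) == -1))
```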