49,276 research outputs found

    Study of fault-tolerant software technology

    Get PDF
    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance

    Intelligent fault management for the Space Station active thermal control system

    Get PDF
    The Thermal Advanced Automation Project (TAAP) approach and architecture is described for automating the Space Station Freedom (SSF) Active Thermal Control System (ATCS). The baseline functionally and advanced automation techniques for Fault Detection, Isolation, and Recovery (FDIR) will be compared and contrasted. Advanced automation techniques such as rule-based systems and model-based reasoning should be utilized to efficiently control, monitor, and diagnose this extremely complex physical system. TAAP is developing advanced FDIR software for use on the SSF thermal control system. The goal of TAAP is to join Knowledge-Based System (KBS) technology, using a combination of rules and model-based reasoning, with conventional monitoring and control software in order to maximize autonomy of the ATCS. TAAP's predecessor was NASA's Thermal Expert System (TEXSYS) project which was the first large real-time expert system to use both extensive rules and model-based reasoning to control and perform FDIR on a large, complex physical system. TEXSYS showed that a method is needed for safely and inexpensively testing all possible faults of the ATCS, particularly those potentially damaging to the hardware, in order to develop a fully capable FDIR system. TAAP therefore includes the development of a high-fidelity simulation of the thermal control system. The simulation provides realistic, dynamic ATCS behavior and fault insertion capability for software testing without hardware related risks or expense. In addition, thermal engineers will gain greater confidence in the KBS FDIR software than was possible prior to this kind of simulation testing. The TAAP KBS will initially be a ground-based extension of the baseline ATCS monitoring and control software and could be migrated on-board as additional computation resources are made available

    Distributed systems status and control

    Get PDF
    Concepts are investigated for an automated status and control system for a distributed processing environment. System characteristics, data requirements for health assessment, data acquisition methods, system diagnosis methods and control methods were investigated in an attempt to determine the high-level requirements for a system which can be used to assess the health of a distributed processing system and implement control procedures to maintain an accepted level of health for the system. A potential concept for automated status and control includes the use of expert system techniques to assess the health of the system, detect and diagnose faults, and initiate or recommend actions to correct the faults. Therefore, this research included the investigation of methods by which expert systems were developed for real-time environments and distributed systems. The focus is on the features required by real-time expert systems and the tools available to develop real-time expert systems

    Seal integrity and the impact on food waste

    Get PDF
    An investigation into the contribution that inadequate heat sealing of food packaging might make to the generation of food waste, in the supply chain and the househol

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Full text link
    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.Comment: 11 page

    Correct and Control Complex IoT Systems: Evaluation of a Classification for System Anomalies

    Full text link
    In practice there are deficiencies in precise interteam communications about system anomalies to perform troubleshooting and postmortem analysis along different teams operating complex IoT systems. We evaluate the quality in use of an adaptation of IEEE Std. 1044-2009 with the objective to differentiate the handling of fault detection and fault reaction from handling of defect and its options for defect correction. We extended the scope of IEEE Std. 1044-2009 from anomalies related to software only to anomalies related to complex IoT systems. To evaluate the quality in use of our classification a study was conducted at Robert Bosch GmbH. We applied our adaptation to a postmortem analysis of an IoT solution and evaluated the quality in use by conducting interviews with three stakeholders. Our adaptation was effectively applied and interteam communications as well as iterative and inductive learning for product improvement were enhanced. Further training and practice are required.Comment: Submitted to QRS 2020 (IEEE Conference on Software Quality, Reliability and Security

    Instrumenting self-modifying code

    Full text link
    Adding small code snippets at key points to existing code fragments is called instrumentation. It is an established technique to debug certain otherwise hard to solve faults, such as memory management issues and data races. Dynamic instrumentation can already be used to analyse code which is loaded or even generated at run time.With the advent of environments such as the Java Virtual Machine with optimizing Just-In-Time compilers, a new obstacle arises: self-modifying code. In order to instrument this kind of code correctly, one must be able to detect modifications and adapt the instrumentation code accordingly, preferably without incurring a high penalty speedwise. In this paper we propose an innovative technique that uses the hardware page protection mechanism of modern processors to detect such modifications. We also show how an instrumentor can adapt the instrumented version depending on the kind of modificiations as well as an experimental evaluation of said techniques.Comment: In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth International Workshop on Automated Debugging (AADEBUG 2003), September 2003, Ghent. cs.SE/030902
    corecore