Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of the quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the implications of using fault-tolerant software in real-time aerospace applications for computer architecture and design, including hardware, operating systems, and programming languages (such as Ada). It concludes that fault-tolerant software has progressed beyond the pure research stage. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to implement software fault tolerance effectively and efficiently.
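The abstract itself contains no code, but the classic language-level construct that such work builds on is the recovery block: run a primary routine, check its result with an acceptance test, and fall back to an alternate if the test fails. A minimal sketch in C (the routines and the acceptance test below are illustrative assumptions, not taken from the paper):

    #include <stdio.h>
    #include <math.h>

    /* Acceptance test: the result must be finite and non-negative. */
    static int acceptable(double result) {
        return isfinite(result) && result >= 0.0;
    }

    /* Primary algorithm: fast, but may fail for some inputs. */
    static double primary(double x) {
        return sqrt(x);            /* returns NaN for x < 0 */
    }

    /* Alternate algorithm: slower but more defensive. */
    static double alternate(double x) {
        return x < 0.0 ? 0.0 : sqrt(x);
    }

    /* Recovery block: try the primary, fall back to the alternate
     * if the acceptance test rejects the primary's result. */
    static double recovery_block(double x) {
        double r = primary(x);
        if (acceptable(r))
            return r;
        return alternate(x);       /* a caller could also signal failure here */
    }

    int main(void) {
        printf("%f\n", recovery_block(4.0));   /* primary succeeds */
        printf("%f\n", recovery_block(-1.0));  /* alternate used */
        return 0;
    }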
Software safety : a definition and some preliminary thoughts
Software safety is the subject of a research project in its initial stages at the University of California, Irvine. This research deals with critical real-time software where the cost of an error is high, e.g. the loss of human life. In this paper, software techniques having a bearing on safety are described and evaluated. Initial definitions of software safety concepts are presented, along with some preliminary thoughts and research questions.
Intelligent fault management for the Space Station active thermal control system
The Thermal Advanced Automation Project (TAAP) approach and architecture are described for automating the Space Station Freedom (SSF) Active Thermal Control System (ATCS). The baseline functionality and advanced automation techniques for Fault Detection, Isolation, and Recovery (FDIR) are compared and contrasted. Advanced automation techniques such as rule-based systems and model-based reasoning should be utilized to efficiently control, monitor, and diagnose this extremely complex physical system. TAAP is developing advanced FDIR software for use on the SSF thermal control system. The goal of TAAP is to join Knowledge-Based System (KBS) technology, using a combination of rules and model-based reasoning, with conventional monitoring and control software in order to maximize autonomy of the ATCS. TAAP's predecessor was NASA's Thermal Expert System (TEXSYS) project, which was the first large real-time expert system to use both extensive rules and model-based reasoning to control and perform FDIR on a large, complex physical system. TEXSYS showed that a method is needed for safely and inexpensively testing all possible faults of the ATCS, particularly those potentially damaging to the hardware, in order to develop a fully capable FDIR system. TAAP therefore includes the development of a high-fidelity simulation of the thermal control system. The simulation provides realistic, dynamic ATCS behavior and fault-insertion capability for software testing without hardware-related risks or expense. In addition, thermal engineers will gain greater confidence in the KBS FDIR software than was possible prior to this kind of simulation testing. The TAAP KBS will initially be a ground-based extension of the baseline ATCS monitoring and control software and could be migrated on-board as additional computation resources become available.
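As an illustration of the rule-based FDIR style described above, the sketch below checks simulated thermal-loop telemetry against simple limits and reports a fault; the injected samples mimic fault insertion in a simulation. The signal names, limits, and rules are invented for illustration and are not TAAP's:

    #include <stdio.h>

    /* Hypothetical telemetry sample from a thermal-loop simulation. */
    struct sample {
        double outlet_temp_c;   /* coolant outlet temperature */
        double pump_speed_rpm;  /* pump speed */
    };

    /* Simple rules: a hot outlet while the pump is running suggests a
     * degraded heat exchanger; a stopped pump is a fault on its own
     * (illustrative rules only). */
    static const char *diagnose(const struct sample *s) {
        if (s->outlet_temp_c > 45.0 && s->pump_speed_rpm > 1000.0)
            return "FAULT: possible heat-exchanger degradation";
        if (s->pump_speed_rpm < 100.0)
            return "FAULT: pump stopped or sensor loss";
        return "nominal";
    }

    int main(void) {
        struct sample runs[] = {
            { 30.0, 2400.0 },   /* nominal behaviour */
            { 52.0, 2400.0 },   /* injected over-temperature fault */
            { 31.0,   20.0 },   /* injected pump failure */
        };
        for (int i = 0; i < 3; i++)
            printf("sample %d: %s\n", i, diagnose(&runs[i]));
        return 0;
    }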
Distributed systems status and control
Concepts are investigated for an automated status and control system for a distributed processing environment. System characteristics, data requirements for health assessment, data acquisition methods, system diagnosis methods, and control methods were investigated in an attempt to determine the high-level requirements for a system which can be used to assess the health of a distributed processing system and implement control procedures to maintain an accepted level of health for the system. A potential concept for automated status and control includes the use of expert system techniques to assess the health of the system, detect and diagnose faults, and initiate or recommend actions to correct the faults. Therefore, this research also included an investigation of methods by which expert systems are developed for real-time environments and distributed systems. The focus is on the features required by real-time expert systems and the tools available to develop them.
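A minimal sketch of one way such health assessment might be performed, assuming a heartbeat-style check over the distributed nodes (the node names and the timeout are illustrative, not from the report):

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical per-node health record kept by the status monitor. */
    struct node_status {
        const char *name;
        time_t last_heartbeat;   /* time the last heartbeat was received */
    };

    #define HEARTBEAT_TIMEOUT 5  /* seconds without a heartbeat => suspect */

    /* Health assessment rule: a node is healthy while heartbeats are recent;
     * otherwise flag it and trigger diagnosis/control actions. */
    static void assess(const struct node_status *n, time_t now) {
        if (now - n->last_heartbeat > HEARTBEAT_TIMEOUT)
            printf("%s: SUSPECT (no heartbeat for %ld s) -> initiate diagnosis\n",
                   n->name, (long)(now - n->last_heartbeat));
        else
            printf("%s: healthy\n", n->name);
    }

    int main(void) {
        time_t now = time(NULL);
        struct node_status nodes[] = {
            { "node-a", now - 1 },    /* recent heartbeat */
            { "node-b", now - 12 },   /* stale heartbeat */
        };
        for (int i = 0; i < 2; i++)
            assess(&nodes[i], now);
        return 0;
    }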
Seal integrity and the impact on food waste
An investigation into the contribution that inadequate heat sealing of food packaging might make to the generation of food waste in the supply chain and the household.
A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems
Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.
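Checkpoint/restart is one of the most widely used of the surveyed resilience techniques; the following is a minimal, illustrative sketch of the idea (the file name and state layout are assumptions, not taken from the survey):

    #include <stdio.h>

    /* Hypothetical application state to protect with checkpointing. */
    struct state {
        long iteration;
        double partial_sum;
    };

    /* Write the state to stable storage so a restarted process can resume. */
    static int save_checkpoint(const struct state *s, const char *path) {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        size_t ok = fwrite(s, sizeof *s, 1, f);
        fclose(f);
        return ok == 1 ? 0 : -1;
    }

    /* Try to resume from an earlier checkpoint; return 0 if none exists. */
    static int load_checkpoint(struct state *s, const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        int ok = fread(s, sizeof *s, 1, f) == 1;
        fclose(f);
        return ok;
    }

    int main(void) {
        struct state s = { 0, 0.0 };
        load_checkpoint(&s, "demo.ckpt");           /* resume if possible */
        for (; s.iteration < 1000000; s.iteration++) {
            s.partial_sum += 1.0 / (s.iteration + 1);
            if (s.iteration % 100000 == 0)          /* periodic checkpoint */
                save_checkpoint(&s, "demo.ckpt");
        }
        printf("sum = %f\n", s.partial_sum);
        return 0;
    }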
Correct and Control Complex IoT Systems: Evaluation of a Classification for System Anomalies
In practice, there are deficiencies in precise inter-team communication about system anomalies when troubleshooting and postmortem analysis are performed across the different teams operating complex IoT systems. We evaluate the quality in use of an adaptation of IEEE Std. 1044-2009 whose objective is to differentiate the handling of fault detection and fault reaction from the handling of a defect and the options for its correction. We extended the scope of IEEE Std. 1044-2009 from anomalies related to software only to anomalies related to complex IoT systems. To evaluate the quality in use of our classification, a study was conducted at Robert Bosch GmbH. We applied our adaptation to a postmortem analysis of an IoT solution and evaluated the quality in use by conducting interviews with three stakeholders. Our adaptation was applied effectively, and it enhanced inter-team communication as well as iterative and inductive learning for product improvement; further training and practice are nevertheless required.

Comment: Submitted to QRS 2020 (IEEE Conference on Software Quality, Reliability and Security).
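To make the intended distinction concrete, an anomaly record that separates the run-time fault reaction from the development-time defect decision might look like the sketch below; the fields and enumerations are our illustration of the idea, not the authors' adapted IEEE 1044 schema:

    #include <stdio.h>

    /* Illustrative anomaly record separating run-time fault handling
     * from development-time defect correction. */
    enum fault_reaction  { REACTION_NONE, REACTION_FAILOVER, REACTION_RESTART };
    enum defect_decision { DEFECT_OPEN, DEFECT_FIX_PLANNED, DEFECT_WONT_FIX };

    struct anomaly {
        const char *id;
        const char *detected_by;            /* which team or monitor saw it */
        enum fault_reaction  reaction;      /* what the running system did */
        enum defect_decision correction;    /* what development decided */
    };

    int main(void) {
        struct anomaly a = {
            "IOT-0042", "edge-gateway watchdog",
            REACTION_RESTART,      /* fault reaction: service restarted */
            DEFECT_FIX_PLANNED     /* defect correction: patch scheduled */
        };
        printf("%s detected by %s: reaction=%d, correction=%d\n",
               a.id, a.detected_by, a.reaction, a.correction);
        return 0;
    }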
Instrumenting self-modifying code
Adding small code snippets at key points to existing code fragments is called instrumentation. It is an established technique for debugging certain otherwise hard-to-solve faults, such as memory management issues and data races. Dynamic instrumentation can already be used to analyse code that is loaded or even generated at run time. With the advent of environments such as the Java Virtual Machine with optimizing Just-In-Time compilers, a new obstacle arises: self-modifying code. In order to instrument this kind of code correctly, one must be able to detect modifications and adapt the instrumentation code accordingly, preferably without incurring a high speed penalty. In this paper we propose an innovative technique that uses the hardware page-protection mechanism of modern processors to detect such modifications. We also show how an instrumentor can adapt the instrumented version depending on the kind of modification, and we present an experimental evaluation of these techniques.

Comment: In M. Ronsse, K. De Bosschere (eds.), Proceedings of the Fifth International Workshop on Automated Debugging (AADEBUG 2003), September 2003, Ghent. cs.SE/030902
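The page-protection idea can be illustrated with a minimal Linux/C sketch: write-protect the region holding generated code, catch the SIGSEGV raised when that code is modified, record that re-instrumentation is needed, and let the write proceed. This is a simplified illustration of the mechanism, not the authors' instrumentor:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Page-aligned buffer standing in for a region of generated code. */
    static unsigned char *code;
    static size_t page_size;

    /* Set when a write to the protected "code" region is detected. */
    static volatile sig_atomic_t code_modified = 0;

    /* On a write fault inside the protected region, record the modification
     * and make the page writable again so the faulting store can complete. */
    static void on_segv(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)ctx;
        unsigned char *addr = (unsigned char *)info->si_addr;
        if (addr >= code && addr < code + page_size) {
            code_modified = 1;                      /* re-instrumentation needed */
            mprotect(code, page_size, PROT_READ | PROT_WRITE);
            return;                                 /* retry the faulting write */
        }
        _exit(1);                                   /* unrelated fault */
    }

    int main(void) {
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        code = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (code == MAP_FAILED) return 1;

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* Write-protect the region: any later modification traps. */
        mprotect(code, page_size, PROT_READ);

        code[0] = 0x90;   /* simulated self-modification -> SIGSEGV -> handler */

        printf("modification detected: %d\n", (int)code_modified);
        return 0;
    }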