Search CORE

85,582 research outputs found

Implementing atomic actions in Ada 95

Author: Burns A
Wellings A
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1997
Field of study

Atomic actions are an important dynamic structuring technique that aid the construction of fault-tolerant concurrent systems. Although they were developed some years ago, none of the well-known commercially-available programming languages directly support their use. This paper summarizes software fault tolerance techniques for concurrent systems, evaluates the Ada 95 programming language from the perspective of its support for software fault tolerance, and shows how Ada 95 can be used to implement software fault tolerance techniques. In particular, it shows how packages, protected objects, requeue, exceptions, asynchronous transfer of control, tagged types, and controlled types can be used as building blocks from which to construct atomic actions with forward and backward error recovery, which are resilient to deserter tasks and task abortion

Crossref

White Rose Research Online

Fault tolerant software modules for SIFT

Author: Hecht H.
Hecht M.
Publication venue
Publication date
Field of study

The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reported and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included

NASA Technical Reports Server

Data criticality estimation in software applications

Author: Benso Alfredo
Di Carlo Stefano
Di Natale Giorgio
Prinetto Paolo Ernesto
Tagliaferri Luca
Publication venue: IEEE
Publication date: 01/01/2003
Field of study

In safety-critical applications it is often possible to exploit software techniques to increase system's fault- tolerance. Common approaches are based on data redundancy to prevent data corruption during the software execution. Duplicating most critical variables only can significantly reduce the memory and performance overheads, while still guaranteeing very good results in terms of fault-tolerance improvement. This paper presents a new methodology to compute the criticality of variables in target software applications. Instead of resorting to time consuming fault injection experiments, the proposed solution is based on the run- time analysis of the variables' behavior logged during the execution of the target application under different workloads

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.
Publication venue
Publication date
Field of study

The use of back-to-back, or comparison, testing for regression test or porting is examined. The efficiency and the cost of the strategy is compared with manual and table-driven single version testing. Some of the key parameters that influence the efficiency and the cost of the approach are the failure identification effort during single version program testing, the extent of implemented changes, the nature of the regression test data (e.g., random), and the nature of the inter-version failure correlation and fault-masking. The advantages and disadvantages of the technique are discussed, together with some suggestions concerning its practical use

NASA Technical Reports Server

Software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.
Publication venue
Publication date
Field of study

Twenty independently developed but functionally equivalent software versions were used to investigate and compare empirically some properties of N-version programming, Recovery Block, and Consensus Recovery Block, using the majority and consensus voting algorithms. This was also compared with another hybrid fault-tolerant scheme called Acceptance Voting, using dynamic versions of consensus and majority voting. Consensus voting provides adaptation of the voting strategy to varying component reliability, failure correlation, and output space characteristics. Since failure correlation among versions effectively reduces the cardinality of the space in which the voter make decisions, consensus voting is usually preferable to simple majority voting in any fault-tolerant system. When versions have considerably different reliabilities, the version with the best reliability will perform better than any of the fault-tolerant techniques

NASA Technical Reports Server

A fault-tolerant intelligent robotic control system

Author: Marzwell Neville I.
Tso Kam Sing
Publication venue
Publication date
Field of study

This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines

NASA Technical Reports Server

Measuring the BDARX architecture by agent oriented system a case study

Author: Abdul Hamid Noorhamreeza
Khan Sundas Naqeeb
Mohd Nawi Nazri
Rahman Atta-ur-
Shahzad Asim
Ullah Arif
Witjaksono R Wahjoe
Publication venue: 'Penerbit UTHM'
Publication date: 01/01/2018
Field of study

Distributed systems are progressively designed as multi-agent systems that are helpful in designing high strength complex industrial software. Recently, distributed systems cooperative applications are openly access, dynamic and large scales. Nowadays, it hardly seems necessary to emphasis on the potential of decentralized software solutions. This is because the main benefit lies in the distributed nature of information, resources and action. On the other hand, the progression in multi agent systems creates new challenges to the traditional methodologies of fault-tolerance that typically relies on centralized and offline solution. Research on multi-agent systems had gained attention for designing software that operates in distributed and open environments, such as the Internet. DARX (Dynamic Agent Replication eXtension) is one of the architecture which aimed at building reliable software that would prove to be both flexible and scalable and also aimed to provide adaptive fault tolerance by using dynamic replication methodologies. Therefore, the enhancement of DARX known as BDARX can provide dynamic solution of byzantine faults for the agent based systems that embedded DARX. The BDARX architecture improves the fault tolerance ability of multi-agent systems in long run and strengthens the software to be more robust against such arbitrary faults. The BDARX provide the solution for the Byzantine fault tolerance in DARX by making replicas on the both sides of communication agents by using BFT protocol for agent systems instead of making replicas only on server end and assuming client as failure free. This paper shows that the dynamic behaviour of agents avoid us from making discrimination between server and client replicas

UTHM Institutional Repository

Study of fault tolerant software technology for dynamic systems

Author: Caglayan A. K.
Zacharias G. L.
Publication venue
Publication date
Field of study

The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented

NASA Technical Reports Server