
Software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.

Twenty independently developed but functionally equivalent software versions were used to investigate and compare empirically some properties of N-version programming, Recovery Block, and Consensus Recovery Block, using the majority and consensus voting algorithms. These were also compared with another hybrid fault-tolerant scheme, Acceptance Voting, using dynamic versions of consensus and majority voting. Consensus voting adapts the voting strategy to varying component reliability, failure correlation, and output space characteristics. Since failure correlation among versions effectively reduces the cardinality of the space in which the voter makes decisions, consensus voting is usually preferable to simple majority voting in any fault-tolerant system. When versions have considerably different reliabilities, however, the version with the best reliability will perform better than any of the fault-tolerant techniques.

NASA Technical Reports Server
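
The abstract above contrasts majority voting with consensus voting. A minimal sketch of the two voters follows, assuming each version returns an exactly comparable (hashable) output; real numeric outputs would usually need a tolerance-based comparison. Here consensus voting is taken to mean "select the output of the largest agreeing group", with ties broken at random; that tie-breaking rule and the function names are assumptions, not the authors' code.

```python
import random
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the output agreed on by a strict majority (> N/2) of the N versions.

    Returns None when no value reaches a majority (the voter cannot decide).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def consensus_vote(outputs: Sequence[Hashable],
                   rng: Optional[random.Random] = None) -> Optional[Hashable]:
    """Return the output agreed on by the largest group of agreeing versions.

    A strict majority is the special case where that group exceeds N/2.
    If several groups tie for the largest size, one tied value is picked at
    random (an assumed tie-breaking rule).
    """
    if not outputs:
        return None
    counts = Counter(outputs)
    largest = max(counts.values())
    tied = [value for value, count in counts.items() if count == largest]
    if len(tied) == 1:
        return tied[0]
    return (rng or random.Random()).choice(tied)


# Five versions; only two agree, so there is no majority but there is a consensus.
outputs = [3, 3, 5, 7, 9]
print(majority_vote(outputs))   # None
print(consensus_vote(outputs))  # 3
```

The example shows the adaptation the abstract describes: with five versions, a two-version agreement is useless to the majority voter but is still the best available evidence for the consensus voter.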

CSP methods for identifying atomic actions in the design of fault tolerant concurrent systems

Author: Carpenter G.F.
Tyrrell A.M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/1995

Limiting the extent of error propagation when faults occur and localizing the subsequent error recovery are common concerns in the design of fault-tolerant parallel processing systems. Both activities are made easier if the designer associates fault tolerance mechanisms with the underlying atomic actions of the system. With this in mind, this paper investigates two methods for identifying atomic actions in parallel processing systems described using CSP. Explicit trace evaluation forms the basis of the first algorithm, which enables a designer to analyze interprocess communications and thereby locate atomic action boundaries in a hierarchical fashion. The second method takes CSP descriptions of the parallel processes and uses structural arguments to infer the atomic action boundaries. This method avoids the difficulties involved in producing full trace sets, but incurs the penalty of a more complex algorithm.

White Rose Research Online
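
The abstract above does not give either algorithm. As a much-simplified, hypothetical illustration of the trace-based idea only: within a segment of a trace, processes that communicate with each other (directly or through intermediaries) must fall inside the same atomic action if that action is to have no communication across its boundary, so the connected components of the communication graph give the finest such grouping for that segment. The trace format `(channel, process_a, process_b)` and the function name are assumptions, and the sketch ignores where in time the boundaries fall.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # (channel, process_a, process_b): assumed trace format


def communication_groups(trace: Iterable[Event]) -> List[List[str]]:
    """Group processes connected, directly or transitively, by communications
    in the trace segment (union-find over process names)."""
    parent: Dict[str, str] = {}

    def find(p: str) -> str:
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for _channel, a, b in trace:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for p in list(parent):
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


trace = [("c1", "P", "Q"), ("c2", "R", "S"), ("c3", "Q", "P")]
print(communication_groups(trace))  # [['P', 'Q'], ['R', 'S']]
```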

An experimental evaluation of software redundancy as a strategy for improving reliability

Author: Caglayan Alper K.
Eckhardt Dave E., Jr.
Kelly John P. J.
Knight John C.
Lee Larry D.
Mcallister David F.
Vouk Mladen A.

The strategy of using multiple versions of independently developed software as a means of tolerating residual software design faults is suggested by the success of hardware redundancy in tolerating hardware failures. Although it is generally accepted that the independence of hardware failures resulting from physical wearout can lead to substantial increases in reliability for redundant hardware structures, a similar conclusion does not follow immediately for software. The degree to which design faults manifest as independent failures determines the effectiveness of redundancy as a method for improving software reliability. Interest in multi-version software centers on whether it provides enough additional reliability to warrant its use in critical applications. The effectiveness of multi-version software is studied by comparing estimates of the failure probabilities of these systems with the failure probabilities of single versions. The estimates are obtained under a model of dependent failures and compared with estimates obtained when failures are assumed to be independent. The experimental results are based on twenty versions of an aerospace application developed and certified by sixty programmers from four universities. Descriptions of the application, the development and certification processes, and the operational evaluation are given, together with an analysis of the twenty versions.

NASA Technical Reports Server
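
The comparison in the abstract above, independent versus dependent failures, can be made concrete with a small calculation. The sketch below assumes majority voting over n versions. Under independence, each version fails with probability p. For the dependent case it uses a model in the style of Eckhardt and Lee, in which the probability θ that a randomly developed version fails varies across inputs; the abstract does not say which dependent-failure model the study used, so this is an illustration only, and all names are hypothetical.

```python
from math import comb
from typing import Sequence


def majority_failure_prob(n: int, p: float) -> float:
    """Failure probability of an n-version majority-voted system when every
    version fails independently with probability p.

    A correct majority needs at least n // 2 + 1 correct versions, so the
    system fails whenever n - n // 2 or more versions fail.
    """
    k_min = n - n // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def el_failure_prob(n: int, thetas: Sequence[float], weights: Sequence[float]) -> float:
    """Dependent-failure estimate in the style of the Eckhardt-Lee model.

    thetas[i] is the probability that a randomly developed version fails on
    input class i; weights[i] is the probability of drawing that class (assumed
    to sum to 1). Versions fail independently *given* the input, but inputs
    that are hard for one version tend to be hard for all of them.
    """
    return sum(w * majority_failure_prob(n, t) for t, w in zip(thetas, weights))


# Same average version failure probability (0.01) in both cases.
print(majority_failure_prob(3, 0.01))              # independence: about 3.0e-4
print(el_failure_prob(3, [0.0, 0.1], [0.9, 0.1]))  # dependent:    about 2.8e-3
```

Concentrating the same average failure probability on 10% of the inputs makes the three-version system roughly ten times less reliable than the independence estimate predicts, which is why the degree of independence decides whether redundancy pays off.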

Based on MIPv6 with Support to Improve the Mobile Commerce Transaction

Author: Hang Zhong-Hong
Hong Chia-Hung
Hung Yen-Chu
Tsai Chia-Wei
Publication venue: AIS Electronic Library (AISeL)
Publication date: 05/12/2005

Mobile commerce is anticipated to be the next business revolution. In the mobile age, people are beginning to realize the benefits of conducting transactions through mobile operations: we can access information, shop and bank online, work from home, and talk and send messages via mobile devices anywhere in the world. Research on mobile transaction management in databases has been under way since 1950, but it has bypassed support from the link and network layers as a means of improving mobile commerce. This paper focuses on how to apply the new generation of mobile network protocol effectively to mobile commerce and how to improve the four main properties required by mobile transactions: atomicity, consistency, isolation, and durability. The aim is to make mobile transactions in the mobile commerce environment complete and personal by means of the IPv6 Destination Extension Header and the Java Transaction Service. Experiments and tests, compared against an IPv4 environment, verify that optimizing the IPv6 Destination Extension Header and the Java Transaction Service improves the mobile commerce environment and makes mobile transactions more complete.

AIS Electronic Library (AISeL)

Multiversion software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.

In this project we have proposed to investigate a number of experimental and theoretical issues associated with the practical use of multi-version software in providing dependable software through fault-avoidance and fault-elimination, as well as run-time tolerance of software faults. In the period reported here we have been working on the following. We have continued collecting data on the relationships between software faults and reliability, and on the coverage provided by the testing process as measured by different metrics (including data flow metrics). We continued work on software reliability estimation methods based on non-random sampling, and on the relationship between software reliability and the code coverage provided through testing. We have continued studying back-to-back testing as an efficient mechanism for removing uncorrelated faults and common-cause faults of variable span. We have also been studying back-to-back testing as a tool for improving the software change process, including regression testing. We continued investigating existing fault-tolerance models and worked on formulating new ones. In particular, we have partly finished evaluating Consensus Voting in the presence of correlated failures, and are in the process of finishing the evaluation of Consensus Recovery Block (CRB) under failure correlation. We find both approaches far superior to the commonly employed fixed-agreement-number voting (usually majority voting). We have also finished a cost analysis of the CRB approach.

NASA Technical Reports Server
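
The Consensus Recovery Block named in the abstract above combines voting with recovery-block acceptance testing. A minimal sketch follows, assuming the common description of the scheme: run all versions and vote; only if the voter cannot decide, apply an acceptance test to the individual outputs in order of (assumed) decreasing version reliability. The majority voter, the treatment of crashed versions, and the integer-square-root example with its hypothetical `isqrt_ok` acceptance test are illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence

Version = Callable[[int], Hashable]  # hypothetical version signature
AcceptanceTest = Callable[[int, Hashable], bool]


def run_versions(x: int, versions: Sequence[Version]) -> List[Optional[Hashable]]:
    """Run every version on the same input; a version that raises yields None."""
    outputs: List[Optional[Hashable]] = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            outputs.append(None)
    return outputs


def majority_vote(outputs: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Value produced by more than half of ALL versions (crashed ones count in N)."""
    valid = [o for o in outputs if o is not None]
    if not valid:
        return None
    value, count = Counter(valid).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def consensus_recovery_block(x: int, versions: Sequence[Version],
                             acceptance_test: AcceptanceTest,
                             voter=majority_vote) -> Optional[Hashable]:
    """Vote first; if the voter cannot decide, fall back to acceptance testing
    of the individual outputs in the order the versions are listed."""
    outputs = run_versions(x, versions)
    decision = voter(outputs)
    if decision is not None:
        return decision
    for out in outputs:
        if out is not None and acceptance_test(x, out):
            return out
    return None  # no agreement and no output passed the acceptance test


def isqrt_ok(x: int, r: int) -> bool:
    """Hypothetical acceptance test for an integer square root of x."""
    return r * r <= x < (r + 1) * (r + 1)


# No two versions agree on 16 (outputs 8, 5, 4), so the acceptance test picks 4.
print(consensus_recovery_block(16, [lambda x: x // 2, lambda x: x // 3, math.isqrt], isqrt_ok))
```

The sketch also exposes the failure-correlation problem the report studies: if two versions share a fault and both return 8, the voter accepts the wrong answer and the acceptance test is never consulted.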

Experiments in fault tolerant software reliability

Author: Mcallister David F.
Vouk Mladen A.

Twenty functionally equivalent programs were built and tested in a multiversion software experiment. Following unit testing, all programs were subjected to an extensive system test. In the process, sixty-one distinct faults were identified among the versions. Fewer than 12 percent of the faults exhibited varying degrees of positive correlation. The common-cause (or similar) faults spanned as many as 14 components. However, a majority of these faults were trivial and easily detected by proper unit and/or system testing. Only two of the seven similar faults were difficult faults, and both were caused by specification ambiguities. One of these faults exhibited a variable identical-and-wrong response span, that is, a response span that varied with the testing conditions and input data. Techniques that could have been used to avoid the faults are discussed. For example, it was determined that back-to-back testing of 2-tuples could have eliminated about 90 percent of the faults. In addition, four of the seven similar faults could have been detected by back-to-back testing of 5-tuples. It is believed that most, if not all, similar faults could have been avoided had the specifications been written in a more formal notation, had the unit testing phase been subject to more stringent standards and controls, and had better tools for measuring the quality and adequacy of the test data (e.g., coverage) been used.

NASA Technical Reports Server
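
Back-to-back testing of k-tuples, as mentioned in the abstract above, runs groups of k versions on the same inputs and flags any input on which the group's outputs disagree. A minimal sketch follows, assuming numeric outputs compared within a small tolerance; the function names, the tolerance, and the toy versions are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple


def close_enough(a: float, b: float) -> bool:
    """Assumed comparison rule: numeric outputs agree within a small tolerance."""
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


def back_to_back(versions: Dict[str, Callable[[float], float]],
                 inputs: Iterable[float], k: int = 2,
                 equal: Callable[[float, float], bool] = close_enough
                 ) -> Dict[Tuple[str, ...], List[float]]:
    """Map each k-tuple of version names to the inputs on which its outputs disagree."""
    inputs = list(inputs)
    outputs = {name: [f(x) for x in inputs] for name, f in versions.items()}
    report: Dict[Tuple[str, ...], List[float]] = {}
    for group in combinations(sorted(versions), k):
        first = group[0]
        flagged = [x for i, x in enumerate(inputs)
                   if not all(equal(outputs[first][i], outputs[n][i]) for n in group[1:])]
        if flagged:
            report[group] = flagged
    return report


# Version C is faulty (2x instead of x squared).
versions = {"A": lambda x: x * x, "B": lambda x: x ** 2, "C": lambda x: 2 * x}
print(back_to_back(versions, [0, 1, 2, 3]))  # {('A', 'C'): [1, 3], ('B', 'C'): [1, 3]}
```

Two limitations match the abstract's discussion of similar faults: input 2 is not flagged because the faulty version happens to agree there, and if two versions shared a fault and returned identical wrong answers, comparing them with each other would never reveal it.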

Experiments in fault tolerant software reliability

Author: Mcallister David F.
Tai K. C.
Vouk Mladen A.

The reliability of voting in a fault-tolerant software system with small output spaces was evaluated. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. Investigation of existing fault-tolerance models continued, and formulation of new models began.

NASA Technical Reports Server

Software fault tolerance in computer operating systems

Author: Iyer Ravishankar K.
Lee Inhwan

This chapter provides data and analysis on the dependability and fault tolerance of three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors, which causes the backup execution (the processor state and the sequence of events occurring) to differ from the original execution, is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, compared with the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

NASA Technical Reports Server

Measurement and Analysis of Operating System Fault Tolerance

Author: Iyer Ravishankar K.
Lee Inhwan
Tang Dong
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/10/1992

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. ONR / N00014-91-J-1116; NASA / NAG-1-61

Illinois Digital Environment for Access to Learning and Scholarship Repository

NASA Technical Reports Server

Software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.

Strategies and tools for the testing, risk assessment, and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example risk management techniques for safety-conscious systems. Theoretical investigations of the Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage, and time-based models are being developed to provide additional theoretical and empirical basis for estimating the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.

NASA Technical Reports Server
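
For the simplest case covered by the BRO testing strategy mentioned in the abstract above, a predicate consisting of one relational expression E1 <rop> E2, the strategy asks for tests that make E1 > E2, E1 = E2, and E1 < E2. The sketch below measures only that single-predicate case; the full strategy combines constraint sets across Boolean operators, which is not shown, and the function names are hypothetical (this is not the BGG tool).

```python
from typing import Callable, Iterable, Set, Tuple, TypeVar

T = TypeVar("T")

BRO_RELATIONAL_CONSTRAINTS = {"<", "=", ">"}


def relational_outcome(lhs: float, rhs: float) -> str:
    """Classify one evaluation of E1 <rop> E2 by how E1 compares with E2."""
    if lhs < rhs:
        return "<"
    if lhs > rhs:
        return ">"
    return "="


def bro_relational_coverage(e1: Callable[[T], float], e2: Callable[[T], float],
                            tests: Iterable[T]) -> Tuple[Set[str], Set[str]]:
    """Return (covered, missing) BRO constraints for a single relational predicate."""
    covered = {relational_outcome(e1(t), e2(t)) for t in tests}
    return covered, BRO_RELATIONAL_CONSTRAINTS - covered


# Predicate: x + y < 10, exercised by two test inputs.
covered, missing = bro_relational_coverage(lambda t: t[0] + t[1], lambda t: 10,
                                           [(1, 2), (8, 5)])
print(sorted(missing))  # ['=']
```

Without a test where x + y equals 10, writing `<` by mistake instead of `<=` (or vice versa) would go undetected, which is the kind of relational-operator fault the "=" constraint targets.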