Using certification trails to achieve software fault tolerance

Author: Masson Gerald M.
Sullivan Gregory F.

A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared, and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree and convex hull problems. These can be found in the full version of this paper.

NASA Technical Reports Server

Certification trails for data structures

Author: Masson Gerald M.
Sullivan Gregory F.

Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. The applicability of the certification trail technique is significantly generalized. Previously, certification trails had to be customized to each algorithm application; here, trails appropriate to wide classes of algorithms were developed. These certification trails are based on common data-structure operations such as those carried out using balanced binary trees and heaps. Any algorithm using these sets of operations can therefore employ the certification trail method to achieve software fault tolerance. To exemplify the scope of the generalization of the certification trail technique provided, constructions of trails for abstract data types such as priority queues and union-find structures are given. These trails are applicable to any data-structure implementation of the abstract data type. It is also shown that these ideas lead naturally to monitors for data-structure operations.

NASA Technical Reports Server

Transient Faults in Computer Systems

Author: Masson Gerald M.

A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed

NASA Technical Reports Server

Experimental evaluation of certification trails using abstract data type validation

Author: Masson Gerald M.
Sullivan Gregory F.
Wilson Dwight S.

Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. Recent experimental work reveals many cases in which a certification-trail approach allows for significantly faster program execution time than a basic time-redundancy approach. Algorithms for answer-validation of abstract data types allow a certification trail approach to be used for a wide variety of problems. An attempt to assess the performance of algorithms utilizing certification trails on abstract data types is reported. Specifically, this method was applied to the following problems: heapsort, Huffman tree, shortest path, and skyline. Previous results used certification trails specific to a particular problem and implementation. The approach allows certification trails to be localized to 'data structure modules,' making the use of this technique transparent to the user of such modules.

NASA Technical Reports Server

Certification of computational results

Author: Masson Gerald M.
Sullivan Gregory F.
Wilson Dwight S.

A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared, and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach is compared to other approaches to fault tolerance.
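
The three phases described in this abstract can be sketched for the sorting problem. This is a minimal illustration under stated assumptions, not the authors' published construction: the trail is assumed to be the permutation of input indices in sorted order, and the names sort_with_trail, sort_using_trail, certified_sort, and TrailError are hypothetical. The second phase runs in linear time and either returns a correctly sorted list or raises an error, even when the trail it is given is wrong.

```python
class TrailError(Exception):
    """Raised when the trail is invalid or the two phases disagree."""


def sort_with_trail(data):
    # Phase 1: solve the problem and leave behind a trail.
    # Assumed trail format: input indices listed in sorted order.
    trail = sorted(range(len(data)), key=lambda i: data[i])
    result = [data[i] for i in trail]
    return result, trail


def sort_using_trail(data, trail):
    # Phase 2: solve the problem again, guided by the trail, in O(n).
    # The trail is never trusted: it must be a permutation of the
    # indices and must yield a nondecreasing sequence.
    n = len(data)
    if len(trail) != n:
        raise TrailError("trail length does not match input length")
    seen = [False] * n
    result = []
    for idx in trail:
        if not isinstance(idx, int) or not 0 <= idx < n or seen[idx]:
            raise TrailError("trail is not a permutation of the indices")
        seen[idx] = True
        value = data[idx]
        if result and result[-1] > value:
            raise TrailError("trail does not produce sorted output")
        result.append(value)
    return result


def certified_sort(data):
    # Final phase: compare the two outputs and accept only if they agree.
    first, trail = sort_with_trail(data)
    second = sort_using_trail(data, trail)
    if first != second:
        raise TrailError("phase outputs disagree")
    return first
```

Because a valid permutation that yields a nondecreasing sequence can only be a sorted arrangement of the input, a corrupted trail cannot make the second phase return a wrong answer; it can only trigger TrailError.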

NASA Technical Reports Server

Fault Tolerance Against Design Faults

Author: Strigini L.
Publication venue: 'Wiley'
Publication date: 01/01/2005

City Research Online

Method and apparatus for fault tolerance

Author: Masson Gerald M.
Sullivan Gregory F.
Publication date: 07/09/1993

A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises the steps of first executing a first algorithm in the first central processing unit on input, which produces a first output as well as a certification trail. Next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail, which produces a second output. The second algorithm has a faster execution time than the first algorithm for a given input. Then, comparing the first and second outputs such that an error result is produced if the first and second outputs are not the same. The step of executing a first algorithm and the step of executing a second algorithm preferably take place over essentially the same time period.

NASA Technical Reports Server

Certification trails and software design for testability

Author: Masson Gerald M.
Sullivan Gregory F.
Wilson Dwight S.

Design techniques which may be applied to make program testing easier were investigated. Methods for modifying a program to generate additional data, which we refer to as a certification trail, are presented. This additional data is designed to allow the program output to be checked more quickly and effectively. Certification trails were described primarily from a theoretical perspective. A comprehensive attempt to assess experimentally the performance and overall value of the certification trail method is reported. The method was applied to nine fundamental, well-known algorithms for the following problems: convex hull, sorting, Huffman tree, shortest path, closest pair, line segment intersection, longest increasing subsequence, skyline, and Voronoi diagram. Run-time performance data for each of these problems is given, and selected problems are described in more detail. Our results indicate that there are many cases in which certification trails allow for significantly faster overall program execution time than a 2-version programming approach, and also give further evidence of the breadth of applicability of this method.

NASA Technical Reports Server

Pipelined Algorithms to Detect Cheating in Long-Term Grid Computations

Author: Becker
Blum
Blum
Blum
Cachin
Devillers
Du
Golle
Golle
Kahney
Kannan
Lipton
Michael T. Goodrich
Sullivan
Publication venue: 'Elsevier BV'
Publication date: 28/11/2008

This paper studies pipelined algorithms for protecting distributed grid computations from cheating participants, who wish to be rewarded for tasks they receive but don't perform. We present improved cheater detection algorithms that utilize natural delays that exist in long-term grid computations. In particular, we partition the sequence of grid tasks into two interleaved sequences of task rounds, and we show how to use those rounds to devise the first general-purpose scheme that can catch all cheaters, even when cheaters collude. The main idea of this algorithm might at first seem counter-intuitive: we have the participants check each other's work. A naive implementation of this approach would, of course, be susceptible to collusion attacks, but we show that by adapting efficient solutions to the parallel processor diagnosis problem, we can tolerate collusions of lazy cheaters, even if the number of such cheaters is a fraction of the total number of participants. We also include a simple economic analysis of cheaters in grid computations and a parameterization of the main deterrent that can be used against them: the probability of being caught.

Comment: Expanded version with an additional figure; ISSN 0304-397

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

The SAVE System: Secure Architecture for Voting Electronically: Existing Technology, with Built-in Redundancy, Enables Reliability

Author: Goler Jonathan
Selker Ted
Publication venue: Caltech/MIT Voting Technology Project
Publication date: 04/01/2004

Existing technology is capable of yielding secure, reliable, and auditable voting systems. This system outlines an architecture based on redundancy at each stage of the ballot submission process that is resistant to external hacking and internal insertion of malicious code. The proposed architecture addresses all layers of the system beyond the point when a voter commits the ballot. These steps include the verification of eligibility to vote, authentication, and aggregation of the vote. A redundant electronic audit trail keeps track of all of the votes and messages received, rendering a physical paper trail unnecessary. There is no single point of failure in the system, as none of the components at a particular layer relies on any of the others; nor is there a single component that decides what tally is correct. Each system arrives at the result on its own. Programming time for implementation is minimal. The proposed architecture was written in Java in a short time. A second programmer was able to write a module in less than a week. Performance and reliability are incrementally improvable by separate programmers writing new redundant modules

DSpace@MIT