553 research outputs found

    Efficient diagnosis of multiprocessor systems under probabilistic models

    Get PDF
    The problem of fault diagnosis in multiprocessor systems is considered under a probabilistic fault model. The focus is on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm that can correctly diagnose the state of every processor with probability approaching one in a class of systems performing slightly greater than a linear number of tests is presented. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. Lower and upper bounds on the number of tests required for regular systems are also presented. A class of regular systems which includes hypercubes is shown to be correctly diagnosable with high probability. In all cases, the number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis techniques

    Method and apparatus for fault tolerance

    Get PDF
    A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises the steps of first executing a first algorithm in the first central processing unit on input which produces a first output as well as a certification trail. Next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail which produces a second output. The second algorithm has a faster execution time than the first algorithm for a given input. Then, comparing the first and second outputs such that an error result is produced if the first and second outputs are not the same. The step of executing a first algorithm and the step of executing a second algorithm preferably takes place over essentially the same time period

    Certification trails for data structures

    Get PDF
    Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. The applicability of the certification trail technique is significantly generalized. Previously, certification trails had to be customized to each algorithm application; trails appropriate to wide classes of algorithms were developed. These certification trails are based on common data-structure operations such as those carried out using these sets of operations such as those carried out using balanced binary trees and heaps. Any algorithms using these sets of operations can therefore employ the certification trail method to achieve software fault tolerance. To exemplify the scope of the generalization of the certification trail technique provided, constructions of trails for abstract data types such as priority queues and union-find structures are given. These trails are applicable to any data-structure implementation of the abstract data type. It is also shown that these ideals lead naturally to monitors for data-structure operations

    Using certification trails to achieve software fault tolerance

    Get PDF
    A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and it was illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concorrectly with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree, and convex hull problems. These can be found in the full version of this paper

    Experimental evaluation of certification trails using abstract data type validation

    Get PDF
    Certification trails are a recently introduced and promising approach to fault-detection and fault-tolerance. Recent experimental work reveals many cases in which a certification-trail approach allows for significantly faster program execution time than a basic time-redundancy approach. Algorithms for answer-validation of abstract data types allow a certification trail approach to be used for a wide variety of problems. An attempt to assess the performance of algorithms utilizing certification trails on abstract data types is reported. Specifically, this method was applied to the following problems: heapsort, Hullman tree, shortest path, and skyline. Previous results used certification trails specific to a particular problem and implementation. The approach allows certification trails to be localized to 'data structure modules,' making the use of this technique transparent to the user of such modules

    Certification of computational results

    Get PDF
    A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach are compared to other approaches to fault tolerance

    Certification trails and software design for testability

    Get PDF
    Design techniques which may be applied to make program testing easier were investigated. Methods for modifying a program to generate additional data which we refer to as a certification trail are presented. This additional data is designed to allow the program output to be checked more quickly and effectively. Certification trails were described primarily from a theoretical perspective. A comprehensive attempt to assess experimentally the performance and overall value of the certification trail method is reported. The method was applied to nine fundamental, well-known algorithms for the following problems: convex hull, sorting, huffman tree, shortest path, closest pair, line segment intersection, longest increasing subsequence, skyline, and voronoi diagram. Run-time performance data for each of these problems is given, and selected problems are described in more detail. Our results indicate that there are many cases in which certification trails allow for significantly faster overall program execution time than a 2-version programming approach, and also give further evidence of the breadth of applicability of this method

    Immature doublecortin-positive hippocampal neurons are important for learning but not for remembering

    Get PDF
    It is now widely accepted that hippocampal neurogenesis underpins critical cognitive functions, such as learning and memory. To assess the behavioral importance of adult-born neurons, we developed a novel knock-in mouse model that allowed us to specifically and reversibly ablate hippocampal neurons at an immature stage. In these mice, the diphtheria toxin receptor (DTR) is expressed under control of the doublecortin (DCX) promoter, which allows for specific ablation of immature DCX-expressing neurons after administration of diphtheria toxin while leaving the neural precursor pool intact. Using a spatially challenging behavioral test (a modified version of the active place avoidance test), we present direct evidence that immature DCX-expressing neurons are required for successful acquisition of spatial learning, as well as reversal learning, but are not necessary for the retrieval of stored long-term memories. Importantly, the observed learning deficits were rescued as newly generated immature neurons repopulated the granule cell layer upon termination of the toxin treatment. Repeat (or cyclic) depletion of immature neurons reinstated behavioral deficits if the mice were challenged with a novel task. Together, these findings highlight the potential of stimulating neurogenesis as a means to enhance learning

    Preliminary Design of the SAFE Platform

    Get PDF
    Safe is a clean-slate design for a secure host architecture. It integrates advances in programming languages, operating systems, and hardware and incorporates formal methods at every step. Though the project is still at an early stage, we have assembled a set of basic architectural choices that we believe will yield a high-assurance system. We sketch the current state of the design and discuss several of these choices
    corecore