
    Fault detection in asynchronous sequential circuits

    As asynchronous sequential circuits have become increasingly important to digital systems in recent years, high reliability and simple maintenance of such circuits are stressed. This paper presents a fault-detection algorithm applicable to most practical asynchronous sequential circuits. The asynchronous sequential circuit is treated from a combinatorial point of view. First, the minimal set of states, both stable and unstable, sufficient to detect all possible faults of the circuit is found from the fault table. Then a test sequence is generated to pass through these states. It is assumed that testing outputs can be added. Simple and systematic techniques are also presented for the construction of the fault table and the generation of the test sequence. The usefulness of this algorithm increases as the density of the stable states associated with the circuit increases --Abstract, page ii
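    The core of the construction, selecting a minimal set of stable and unstable states whose observation exposes every fault in the fault table, is essentially a set-cover problem. The sketch below illustrates the idea with a greedy heuristic in Python; the fault table and state names are hypothetical stand-ins, and the thesis's own selection procedure may differ.

```python
# Hedged sketch: greedy selection of circuit states that together cover all faults.
# The fault table maps each state to the set of faults it can expose (hypothetical data).
fault_table = {
    "S1": {"f1", "f3"},
    "S2": {"f2"},
    "S3": {"f1", "f2", "f4"},
    "S4": {"f3", "f5"},
}

def select_covering_states(fault_table):
    """Greedy set cover: repeatedly pick the state exposing the most uncovered faults."""
    all_faults = set().union(*fault_table.values())
    uncovered, chosen = set(all_faults), []
    while uncovered:
        best = max(fault_table, key=lambda s: len(fault_table[s] & uncovered))
        if not fault_table[best] & uncovered:
            break  # remaining faults are not exposed by any listed state
        chosen.append(best)
        uncovered -= fault_table[best]
    return chosen, uncovered

states, undetected = select_covering_states(fault_table)
print("states to visit:", states, "| faults left undetected:", undetected)
```

    A test sequence would then be any input sequence that drives the circuit through the chosen states while the testing outputs are observed.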

    Secondary techniques for increasing fault coverage of fault detection test sequences for asynchronous sequential networks

    The generation of fault detection sequences for asynchronous sequential networks is considered here. Several techniques exist for the generation of fault detection sequences on combinational and clocked sequential networks. Although these techniques provide closed solutions for combinational and clocked networks, they meet with much less success when used as strategies on asynchronous networks. It is presently assumed that the general asynchronous problem defies closed solution. For this reason, a secondary procedure is presented here to facilitate increased fault coverage by a given fault detection test sequence. This procedure is successful on all types of logic networks but is, perhaps, most useful in the asynchronous case since this is the problem on which other techniques fail. The secondary procedure has been designed to improve the fault coverage accomplished by any fault detection sequence regardless of the origin of the sequence. The increased coverage is accomplished by a minimum amount of additional internal hardware and/or a minimum of additional package outputs. The procedure presented here will function as part of an overall digital fault detection system, which will be composed of: 1) a compatible digital logic simulator, 2) a set of fault detection sequence generators, 3) secondary procedures for increasing fault coverage, 4) procedures to allow for diagnosis to a variable level. This research is directed at presenting a complete solution to the problems involved with developing secondary procedures for increasing the fault coverage of fault detection sequences --Abstract, pages ii-iii
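    One building block any such secondary procedure needs is a way to measure which faults a given test sequence already detects, so that the added internal hardware or package outputs can target only the remainder. A minimal fault-simulation sketch in Python follows; the network model, fault list, and sequence are hypothetical stand-ins rather than the thesis's actual procedure.

```python
# Hedged sketch: estimate the fault coverage of a test sequence by comparing the
# fault-free output trace with the trace produced under each single fault.
def run(network, sequence, fault=None):
    """Apply an input sequence to a (hypothetical) network model; return its output trace."""
    state, trace = network.reset_state, []
    for inputs in sequence:
        state, out = network.step(state, inputs, fault)  # fault=None means fault-free
        trace.append(out)
    return trace

def fault_coverage(network, sequence, fault_list):
    """Return the detected faults and the fraction of the fault list they represent."""
    good = run(network, sequence)
    detected = {f for f in fault_list if run(network, sequence, fault=f) != good}
    return detected, len(detected) / len(fault_list)
```

    Faults left outside the detected set are the candidates for extra observation points or test hardware.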

    A survey of an introduction to fault diagnosis algorithms

    This report surveys the field of fault diagnosis and introduces some of the key algorithms and heuristics currently in use. Fault diagnosis is an important and rapidly growing discipline. It matters particularly in the design of self-repairable computers, because the present diagnosis resolution of such a fault-tolerant computer is limited to a functional unit or processor. Better resolution is necessary before failed units can become partially reusable. The approach that holds the greatest promise is resident microdiagnostics; however, that presupposes a microprogrammable architecture for the computer being self-diagnosed. The presentation is tutorial and contains examples. An extensive bibliography of some 220 entries is included.

    Modeling Scalability of Distributed Machine Learning

    Present-day machine learning is computationally intensive and processes large amounts of data, so it is implemented in a distributed fashion to address these scalability issues, with the work parallelized across a number of computing nodes. It is usually hard to estimate in advance how many nodes to use for a particular workload. We propose a simple framework for estimating the scalability of distributed machine learning algorithms. We measure scalability by means of the speedup an algorithm achieves with more nodes, and we propose time complexity models for gradient descent and graphical model inference. We validate our models with experiments on deep learning training and belief propagation. This framework was used to study the scalability of machine learning algorithms in Apache Spark. Comment: 6 pages, 4 figures, appears at ICDE 201
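    A common form for such time complexity models splits the per-iteration cost into a compute part that shrinks with the number of nodes and a communication part that grows with it; the speedup is then the single-node time divided by the n-node time. The sketch below illustrates this style of model in Python with made-up constants; the paper's actual models for gradient descent and belief propagation may be parameterized differently.

```python
# Hedged sketch: a simple scalability model of the form
#   T(n) = T_compute / n + T_comm(n),   speedup(n) = T(1) / T(n)
# The constants below are illustrative, not measurements from the paper.
def iteration_time(n, compute=100.0, comm_per_node=0.5, comm_fixed=2.0):
    """Estimated time of one distributed iteration on n nodes (arbitrary units)."""
    return compute / n + comm_fixed + comm_per_node * n

def speedup(n, **kw):
    return iteration_time(1, **kw) / iteration_time(n, **kw)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"nodes={n:3d}  speedup={speedup(n):5.2f}")
```

    With such a model the speedup curve flattens and eventually turns over once communication dominates, which is exactly the point at which adding nodes to a workload stops paying off.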

    Adaptation and Evaluation of the Multisplitting-Newton and Waveform Relaxation Methods Over Distributed Volatile Environments

    This paper presents new adaptations, to the grid context, of two methods that solve large systems of differential equations. The first method is based on the Multisplitting concept and the second on the Waveform Relaxation concept. Their adaptations are implemented according to the asynchronous iteration model, which is well suited to volatile architectures that suffer from high-latency networks. Many experiments were conducted to evaluate and compare the accuracy and performance of both methods while solving the advection-diffusion problem over heterogeneous, distributed and volatile architectures. The JACEP2P-V2 middleware provided the fault-tolerant asynchronous environment required for these experiments
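    In the asynchronous iteration model, each worker keeps updating its own block of unknowns using whatever values it last saw from the others, without waiting at a global synchronization point. A minimal sketch of that idea follows, using a block-Jacobi-style multisplitting of a small linear system with threads standing in for grid nodes; it illustrates the iteration model only, not the paper's Multisplitting-Newton or Waveform Relaxation implementations over JACEP2P-V2.

```python
# Hedged sketch: asynchronous block-Jacobi iterations on a small, diagonally
# dominant system A x = b. Each thread updates one block of x using possibly
# stale values of the other blocks (no barrier between iterations).
import threading
import numpy as np

A = np.array([[4.0, 1.0, 0.0, 1.0],
              [1.0, 5.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [1.0, 0.0, 1.0, 5.0]])
b = np.array([6.0, 7.0, 6.0, 7.0])
blocks = [slice(0, 2), slice(2, 4)]   # two "nodes", two unknowns each
x = np.zeros(4)                        # shared iterate, read and written without barriers

def worker(block, iterations=200):
    for _ in range(iterations):
        rows = A[block]
        local_A = rows[:, block]
        # Solve the local block exactly, treating the (possibly stale) rest of x as fixed.
        rhs = b[block] - rows @ x + local_A @ x[block]
        x[block] = np.linalg.solve(local_A, rhs)

threads = [threading.Thread(target=worker, args=(blk,)) for blk in blocks]
for t in threads: t.start()
for t in threads: t.join()
print("approximate solution:", x, "residual norm:", np.linalg.norm(A @ x - b))
```

    Because no thread ever blocks waiting for the others, a slow or temporarily unreachable node delays only its own block, which is what makes this model attractive on volatile, high-latency grids.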

    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, consideration of several fault tolerance proposals (user-level failure mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM
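    Data parallelism, the simplest of the parallelism types discussed above, replicates the model on every rank, averages gradients with an allreduce each step, and periodically checkpoints so a run can be restarted after a permanent fault. The sketch below illustrates that pattern with mpi4py and NumPy; it is a generic illustration under those assumptions, not the MaTEx-Caffe or ULFM code described in the paper.

```python
# Hedged sketch: data-parallel SGD step with gradient averaging and periodic
# checkpointing. Generic illustration, not the paper's MaTEx-Caffe/ULFM code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

weights = np.zeros(1000)               # model replica held by every rank
lr, checkpoint_every = 0.01, 100

def local_gradient(w):
    """Placeholder for the gradient computed on this rank's shard of the data."""
    return np.random.randn(*w.shape)

for step in range(1, 1001):
    grad = local_gradient(weights)
    avg = np.empty_like(grad)
    comm.Allreduce(grad, avg, op=MPI.SUM)       # sum gradients across all ranks
    weights -= lr * (avg / size)                # identical update on every replica
    if step % checkpoint_every == 0 and rank == 0:
        np.save(f"checkpoint_{step}.npy", weights)  # restart point after a failure
```

    The fault tolerance question is what happens between checkpoints when a rank dies: a Reinit-style approach restarts the whole job from the last checkpoint, while a ULFM-style approach lets the surviving ranks repair the communicator and continue.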

    Maintaining consistency in distributed systems

    In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs such as semaphores and monitors. Where data is persistent and/or sets of operations are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group-oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony to interoperate with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability
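    As a small illustration of the first style mentioned above, the sketch below wraps a shared counter in a lock so that each operation appears to take effect atomically, i.e. the object's history is linearizable; the class and names are illustrative, not part of the paper.

```python
# Hedged sketch: a monitor-style concurrent object whose operations are made
# atomic (and hence linearizable) by a mutual exclusion lock.
import threading

class LinearizableCounter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:              # mutual exclusion: one operation at a time
            self._value += 1
            return self._value        # value observed at the linearization point

    def read(self):
        with self._lock:
            return self._value

counter = LinearizableCounter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.read())  # 4000: every increment took effect exactly once
```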