6,107 research outputs found
Fault detection in asynchronous sequential circuits
As the asynchronous sequential circuit has become more and more important to digital systems in recent years high reliability and simple maintenance of the circuit is stressed. This paper presents a fault-detection algorithm which will be applicable to most of the practical asynchronous sequential circuits. The asynchronous sequential circuit is treated from the combinatoric point of view. First the minimal set of states, both stable states and unstable states, sufficient to detect all possible faults of the circuit is found from the fault table. Then a test sequence is generated to go through these states. It is assumed that testing outputs can be added. Simple and systematic techniques are also presented for the construction of fault table and the generation of test sequence. The usefulness of this algorithm increases as the density of the stable states associated with the circuit increases --Abstract, page ii
Secondary techniques for increasing fault coverage of fault detection test sequences for asynchronous sequential networks
The generation of fault detection sequences for asynchronous sequential networks is considered here. Several techniques exist for the generation of fault detection sequences on combinational and clocked sequential networks. Although these techniques provide closed solutions for combinational and clocked networks, they meet with much less success when used as strategies on asynchronous networks. It is presently assumed that the general asynchronous problem defies closed solution. For this reason, a secondary procedure is presented here to facilitate increased fault coverage by a given fault detection test sequence. This procedure is successful on all types of logic networks but is, perhaps, most useful in the asynchronous case since this is the problem on which other techniques fail. The secondary procedure has been designed to improve the fault coverage accomplished by any fault detection sequence regardless of the origin of the sequence. The increased coverage is accomplished by a minimum amount of additional internal hardware and/or a minimum of additional package outputs. The procedure presented here will function as part of an overall digital fault detection system, which will be composed of: 1) a compatible digital logic simulator, 2) a set of fault detection sequence generators, 3) secondary procedures for increasing fault coverage, 4) procedures to allow for diagnosis to a variable level. This research is directed at presenting a complete solution to the problems involved with developing secondary procedures for increasing the fault coverage of fault detection sequences --Abstract, pages ii-iii
A survey of an introduction to fault diagnosis algorithms
This report surveys the field of diagnosis and introduces some of the key algorithms and heuristics currently in use. Fault diagnosis is an important and a rapidly growing discipline. This is important in the design of self-repairable computers because the present diagnosis resolution of its fault-tolerant computer is limited to a functional unit or processor. Better resolution is necessary before failed units can become partially reuseable. The approach that holds the greatest promise is that of resident microdiagnostics; however, that presupposes a microprogrammable architecture for the computer being self-diagnosed. The presentation is tutorial and contains examples. An extensive bibliography of some 220 entries is included
Modeling Scalability of Distributed Machine Learning
Present day machine learning is computationally intensive and processes large
amounts of data. It is implemented in a distributed fashion in order to address
these scalability issues. The work is parallelized across a number of computing
nodes. It is usually hard to estimate in advance how many nodes to use for a
particular workload. We propose a simple framework for estimating the
scalability of distributed machine learning algorithms. We measure the
scalability by means of the speedup an algorithm achieves with more nodes. We
propose time complexity models for gradient descent and graphical model
inference. We validate our models with experiments on deep learning training
and belief propagation. This framework was used to study the scalability of
machine learning algorithms in Apache Spark.Comment: 6 pages, 4 figures, appears at ICDE 201
Adaptation and Evaluation of the Multisplitting-Newton and Waveform Relaxation Methods Over Distributed Volatile Environments
International audienceThis paper presents new adaptations of two methods that solve large differential equations systems, to the grid context. The first method isbased on the Multisplitting concept and the second on the Waveform Relaxation concept. Their adaptations are implemented according to the asynchronous iteration model which is well suited to volatile architectures that suffer from high latency networks. Many experiments were conducted to evaluate and compare the accuracy and performance of both methods while solving the advection-diffusion problem over heterogeneous, distributed and volatile architectures. The JACEP2P-V2 middleware provided the fault tolerant asynchronous environment, required for these experiments
Recommended from our members
Implementation relations for testing through asynchronous channels
This paper concerns testing from an input output transition system (IOTS) model of a system under test that interacts with its environment through asynchronous first in first out (FIFO) channels. It explores methods for analysing an IOTS without modelling the channels. If IOTS M produces sequence then, since communications are asynchronous, output can be delayed and so a different sequence might be observed. Thus M defines a language Tr(M) of sequences that can be observed when interacting with M through FIFO channels. We define implementation relations and equivalences in terms of Tr(M): an implementation relation says how IOTS N must relate to IOTS M in order for N to be a correct implementation of M. It is important to use an appropriate implementation relation since otherwise the verdict from a test run might be incorrect and because it influences test generation. It is undecidable whether IOTS N conforms to IOTS M and so also whether there is a test case that can distinguish between two IOTSs. We also investigate the situation in which we have a finite automaton P and either wish to know whether is empty or whether Tr(M) \cap \tr(P) is empty and prove that these are undecidable. In addition, we give conditions under which conformance and intersection are decidable.This work was partially supported by EPSRC grant EP/G04354X/1:The Birth, Life and Death of Semantic Mutants
What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithm for large scale data analysis. DL algorithms are computationally
expensive - even distributed DL implementations which use MPI require days of
training (model learning) time on commonly studied datasets. Long running DL
applications become susceptible to faults - requiring development of a fault
tolerant system infrastructure, in addition to fault tolerant DL algorithms.
This raises an important question: What is needed from MPI for de- signing
fault tolerant DL implementations? In this paper, we address this problem for
permanent faults. We motivate the need for a fault tolerant MPI specification
by an in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion on the suitability of different parallelism
types (model, data and hybrid); a need (or lack thereof) for check-pointing of
any critical data structures; and most importantly, consideration for several
fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI
and their applicability to fault tolerant DL implementations. We leverage a
distributed memory implementation of Caffe, currently available under the
Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches
by ex- tending MaTEx-Caffe for using ULFM-based implementation. Our evaluation
using the ImageNet dataset and AlexNet, and GoogLeNet neural network topologies
demonstrates the effectiveness of the proposed fault tolerant DL implementation
using OpenMPI based ULFM
Maintaining consistency in distributed systems
In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operation are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often, within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability
- …