
2,095 research outputs found

General-demand disjoint path covers in a graph with faulty elements

Author: Jae-Ha Lee
Jung-Heum Park
Park J.-H.
Roberts F. S.
Vaidya A. S.
Publication venue: 'Informa UK Limited'

Quantifying fault recovery in multiprocessor systems

Author: Harary Frank
Malek Miroslaw

Various aspects of reliable computing are formalized and quantified, with emphasis on efficient fault recovery. The most appropriate mathematical model is provided by graph theory. New measures for fault recovery are developed, and the values of the elements of the fault recovery vector are observed to depend not only on the computation graph H and the architecture graph G, but also on the specific location of a fault. In the examples, a hypercube is chosen as a representative parallel computer architecture, and a pipeline as a typical configuration for program execution. The dependability qualities of such a system are defined with or without a fault. These qualities are determined by the resiliency triple, which consists of three parameters: multiplicity, robustness, and configurability. Parameters for measuring recovery effectiveness are also introduced in terms of distance, time, and the number of new, used, and moved nodes and edges.

NASA Technical Reports Server

Single-source three-disjoint path covers in cubes of connected graphs

Author: Abderrezzak
Asdre
Bondy
Chartrand
Chen
Chia
Dvořák
Ekstein
Fleischner
Georgakopoulos
Gregor
Harary
Insung Ihm
Jung-Heum Park
Karaganis
Kim
Koh
Lesniak
Ntafos
Paoli
Park
Radoszewski
Schaar
Sekanina
Publication venue: 'Elsevier BV'

Crossref

Redundancy management for efficient fault recovery in NASA's distributed computing system

Author: Malek Miroslaw
Pandya Mihir
Yau Kitty

The management of redundancy in computer systems was studied, and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management through efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources used for embedding the computational graphs of tasks in the system architecture and for reconfiguring these tasks after a failure has occurred. Computational structures represented by a path and by a complete binary tree were considered, and mesh and hypercube architectures were targeted for their embeddings. The concept of the Hybrid Algorithm Technique was introduced. This technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
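As a rough illustration of what embedding a path-shaped computation in a hypercube can look like, the sketch below maps consecutive pipeline stages to hypercube nodes along a reflected Gray code, so that neighbouring stages sit on adjacent nodes. This is a standard textbook construction and is not claimed to be the algorithm from the report; the function names `gray_code_path`, `embed_pipeline`, and `is_hypercube_edge` are hypothetical.

```python
from typing import List


def gray_code_path(n: int) -> List[int]:
    """Return the reflected Gray code on n bits.

    Consecutive labels differ in exactly one bit, so the sequence is a
    Hamiltonian path in the n-dimensional hypercube.
    """
    return [i ^ (i >> 1) for i in range(2 ** n)]


def is_hypercube_edge(u: int, v: int) -> bool:
    """Two node labels are adjacent iff they differ in exactly one bit."""
    return bin(u ^ v).count("1") == 1


def embed_pipeline(stages: int, n: int) -> List[int]:
    """Assign pipeline stage k to the k-th node on the Gray code path.

    Assumption: the pipeline has at most 2**n stages and no node is faulty.
    """
    path = gray_code_path(n)
    if stages > len(path):
        raise ValueError("pipeline has more stages than the hypercube has nodes")
    return path[:stages]


if __name__ == "__main__":
    placement = embed_pipeline(stages=5, n=3)
    for stage, node in enumerate(placement):
        print(f"stage {stage} -> node {node:03b}")
```

Handling a faulty node would require rerouting the path around it, which this sketch deliberately leaves out.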

NASA Technical Reports Server

Doing-it-All with Bounded Work and Communication

Author: Alistarh
Alon
Birman
Bridgland
Cachin
Censor-Hillel
Chlebus
Chung
Clementi
Davidoff
Davtyan
De Prisco
Diks
Drucker
Dwork
Fernández
Galil
Georgiou
Goldberg
Kanellakis
Kentros
Kontogiannis
Kowalski
Lamport
Lubotzky
Margulis
Mitzenmacher
Pippenger
Saks
Tanner
Upfal
Publication venue: 'Elsevier BV'
Publication date: 01/06/2017

We consider the Do-All problem, where $p$ cooperating processors need to complete $t$ similar and independent tasks in an adversarial setting. Here we deal with a synchronous message passing system with processors that are subject to crash failures. Efficiency of algorithms in this setting is measured in terms of work complexity (also known as total available processor steps) and communication complexity (total number of point-to-point messages). When work and communication are considered to be comparable resources, then the overall efficiency is meaningfully expressed in terms of effort defined as work + communication. We develop and analyze a constructive algorithm that has work $O(t + p \log p\, (\sqrt{p\log p}+\sqrt{t\log t}\,))$ and a nonconstructive algorithm that has work $O(t + p \log^2 p)$. The latter result is close to the lower bound $\Omega(t + p \log p / \log\log p)$ on work. The effort of each of these algorithms is proportional to its work when the number of crashes is bounded above by $c\,p$, for some positive constant $c < 1$. We also present a nonconstructive algorithm that has effort $O(t + p^{1.77})$.
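To get a feel for how far apart the work bounds quoted in the abstract are, the sketch below evaluates their growth terms for concrete values of p and t. It is only an illustration: hidden constants are dropped, base-2 logarithms are an assumption (asymptotic notation does not fix the base), and the function names are hypothetical rather than taken from the paper.

```python
import math


def constructive_work(p: int, t: int) -> float:
    """Growth term t + p log p (sqrt(p log p) + sqrt(t log t))."""
    lp, lt = math.log2(p), math.log2(t)
    return t + p * lp * (math.sqrt(p * lp) + math.sqrt(t * lt))


def nonconstructive_work(p: int, t: int) -> float:
    """Growth term t + p log^2 p."""
    return t + p * math.log2(p) ** 2


def work_lower_bound(p: int, t: int) -> float:
    """Growth term t + p log p / log log p (needs p >= 4 here so log log p >= 1)."""
    lp = math.log2(p)
    return t + p * lp / math.log2(lp)


def effort(work: float, communication: float) -> float:
    """Effort as defined in the abstract: work plus communication."""
    return work + communication


if __name__ == "__main__":
    for p, t in [(64, 64), (1024, 1024), (1024, 1_000_000)]:
        print(
            f"p={p}, t={t}: "
            f"lower={work_lower_bound(p, t):.0f}, "
            f"nonconstructive={nonconstructive_work(p, t):.0f}, "
            f"constructive={constructive_work(p, t):.0f}"
        )
```

Because constants are ignored, the printed numbers indicate relative growth only and should not be read as predicted step counts.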

arXiv.org e-Print Archive

University of Liverpool Repository

Crossref

Counter Attack on Byzantine Generals: Parameterized Model Checking of Fault-tolerant Distributed Algorithms

Author: John Annu
Konnov Igor
Schmid Ulrich
Veith Helmut
Widder Josef
Publication date: 03/02/2013

We introduce an automated parameterized verification method for fault-tolerant distributed algorithms (FTDAs). FTDAs are parameterized by both the number of processes and the assumed maximum number of Byzantine faulty processes. At the center of our technique is a parametric interval abstraction (PIA) where the interval boundaries are arithmetic expressions over parameters. Using PIA for both data abstraction and a new form of counter abstraction, we reduce the parameterized problem to finite-state model checking. We demonstrate the practical feasibility of our method by verifying several variants of the well-known distributed algorithm by Srikanth and Toueg. Our semi-decision procedures are complemented and motivated by an undecidability proof for FTDA verification, which holds even in the absence of interprocess communication. To the best of our knowledge, this is the first paper to achieve parameterized automated verification of Byzantine FTDAs.

arXiv.org e-Print Archive

CiteSeerX

Secondary techniques for increasing fault coverage of fault detection test sequences for asynchronous sequential networks

Author: Hoover Lewis Ronald
Publication venue: Scholars' Mine
Publication date: 01/01/1972

The generation of fault detection sequences for asynchronous sequential networks is considered here. Several techniques exist for the generation of fault detection sequences on combinational and clocked sequential networks. Although these techniques provide closed solutions for combinational and clocked networks, they meet with much less success when used as strategies on asynchronous networks. It is presently assumed that the general asynchronous problem defies closed solution. For this reason, a secondary procedure is presented here to facilitate increased fault coverage by a given fault detection test sequence. This procedure is successful on all types of logic networks but is, perhaps, most useful in the asynchronous case, since this is the problem on which other techniques fail. The secondary procedure has been designed to improve the fault coverage accomplished by any fault detection sequence, regardless of the origin of the sequence. The increased coverage is accomplished with a minimum amount of additional internal hardware and/or a minimum of additional package outputs. The procedure presented here will function as part of an overall digital fault detection system, which will be composed of: 1) a compatible digital logic simulator, 2) a set of fault detection sequence generators, 3) secondary procedures for increasing fault coverage, and 4) procedures to allow for diagnosis to a variable level. This research is directed at presenting a complete solution to the problems involved with developing secondary procedures for increasing the fault coverage of fault detection sequences. (Abstract, pages ii-iii)

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine