Search CORE

1,487 research outputs found

Efficient diagnosis of multiprocessor systems under probabilistic models

Author: Blough Douglas M.
Masson Gerald M.
Sullivan Gregory F.

The problem of fault diagnosis in multiprocessor systems is considered under a probabilistic fault model. The focus is on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm is presented that, for a class of systems performing slightly more than a linear number of tests, correctly diagnoses the state of every processor with probability approaching one. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. Lower and upper bounds on the number of tests required for regular systems are also presented. A class of regular systems which includes hypercubes is shown to be correctly diagnosable with high probability. In all cases, the number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis techniques.

NASA Technical Reports Server

Designing and Valuating System on Dependability Analysis of Cluster-Based Multiprocessor System

Author: Dr. Sudarson Jena
Pulicherla Radhika
Publication venue: Global Journals Inc. (US)
Publication date: 15/03/2020

Dependability analysis is a significant stage in designing and examining the safety of protection systems and computer systems. The introduction of virtual machines and multiprocessors increases the number of system faults, particularly software-induced failures, which affects overall dependability. It is also difficult to ensure successful operation of the safety system at every dynamic stage, since the failure rates of software-induced and hardware-induced failures differ greatly. This paper therefore presents a review of different dependability analysis techniques employed in multiprocessor systems.

Global Journal of Computer Science and Technology (GJCST)

Reliability and fault tolerance modelling of multiprocessor systems

Author: Valdivia Roberto Abraham
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/1989

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Reliability evaluation by analytic modelling constitutes an important issue in designing a reliable multiprocessor system. In this thesis, a model for reliability and fault tolerance analysis of the interconnection network is presented, based on graph theory. Reliability and fault tolerance are considered as deterministic and probabilistic measures of connectivity. Exact techniques for reliability evaluation fail for large multiprocessor systems because of the enormous computational resources required. Therefore, approximation techniques have to be used. Three approaches are proposed: the first simplifies the symbolic expression of reliability; the other two apply a hierarchical decomposition to the system. All these methods give results close to those obtained by exact techniques. "Consejo Nacional de Ciencia y Tecnologia" (National Council for Science and Technology of Mexico) and "Instituto de Investigaciones Electricas" (Institute for Electrical Research).

Brunel University Research Archive
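
The abstract above treats reliability as a probabilistic measure of connectivity and notes that exact evaluation becomes infeasible for large systems. The following is a minimal sketch of why, not the thesis's own method: it assumes perfect nodes, links that fail independently with one shared probability, and a two-terminal connectivity measure. The names `connected` and `exact_reliability` are hypothetical.

```python
from itertools import product


def connected(n_nodes, up_links, source, target):
    """Return True if target is reachable from source over working links."""
    adjacency = {node: [] for node in range(n_nodes)}
    for a, b in up_links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def exact_reliability(n_nodes, links, p_up, source, target):
    """Exact two-terminal reliability by enumerating every link state.

    Assumption (not from the thesis): links fail independently and each
    works with probability p_up. The loop visits 2**len(links) states,
    which is the exponential cost that rules out exact evaluation for
    large interconnection networks.
    """
    total = 0.0
    for states in product((False, True), repeat=len(links)):
        up_links = [link for link, up in zip(links, states) if up]
        if connected(n_nodes, up_links, source, target):
            k = len(up_links)
            total += p_up ** k * (1.0 - p_up) ** (len(links) - k)
    return total


# A 4-node ring (a 2-dimensional hypercube): two disjoint 2-link paths
# join node 0 to node 3.
ring = [(0, 1), (1, 3), (0, 2), (2, 3)]
print(exact_reliability(4, ring, 0.9, 0, 3))
```

For the 4-node ring the enumeration covers only 16 states, but each added link doubles the work, which is the motivation for approximation.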

Survivable algorithms and redundancy management in NASA's distributed computing systems

Author: Malek Miroslaw

The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given

NASA Technical Reports Server

Probabilistic diagnostics with P-graphs

Author: Polgár Balázs
Selényi Endre
Publication date: 01/01/2002

This paper presents a novel approach for solving the probabilistic diagnosis problem in multiprocessor systems. The main idea of the algorithm is based on the reformulation of the diagnostic procedure as a P-graph model. The same well-elaborated mathematical paradigm, originally used to model material flow, can be applied in our approach to model information flow. This idea is illustrated by deriving a maximum likelihood diagnostic decision procedure. The diagnostic accuracy of the solution is considered on the basis of simulation measurements, and a method of constructing a general framework for different aspects of a complex problem is demonstrated with the use of P-graph models.

University of Szeged

Redundancy management for efficient fault recovery in NASA's distributed computing system

Author: Malek Miroslaw
Pandya Mihir
Yau Kitty

The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding computational graphs of tasks in the system architecture and for reconfiguring these tasks after a failure has occurred. Computational structures represented by a path and by a complete binary tree were considered, and mesh and hypercube architectures were targeted for their embeddings. The innovative concept of the Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.

NASA Technical Reports Server

Gradient based system-level diagnosis

Author: Polgár Balázs
Selényi Endre
Publication venue: Periodica Polytechnica Electrical Engineering (Archives)
Publication date: 30/06/2007

Traditional approaches to system-level diagnosis in multiprocessor systems are usually based on the oversimplified PMC test invalidation model; however, Blount introduced a more general model containing conditional probabilities as parameters for different test invalidation situations. He suggested a lookup-table-based approach, but no algorithmic solution had been elaborated until our P-graph-based solution, introduced in previous publications. In that approach the diagnostic process is formulated as an optimization problem and the optimal solution is determined. Although the average behavior of the algorithm is quite good, the worst-case complexity is exponential. In this paper we introduce a novel group of fast diagnostic algorithms that we named gradient based algorithms. This approach only approximates the optimal maximum likelihood or maximum a posteriori solution, but it has polynomial complexity on the order of $O(N \cdot NbCount + N^2)$, where $N$ is the size of the system and $NbCount$ is the number of neighbors of a single unit. The idea of the base algorithm is that it takes an initial fault pattern and iterates as long as the likelihood of the current fault pattern can be increased with a single state change in the pattern. Improvements of this base algorithm, complexity analysis and simulation results are also presented. The main, although not exclusive, application field of the algorithms is wafer-scale diagnosis, since the accuracy and the performance are still good even if a relatively large number of faults are present.

Periodica Polytechnica (Budapest University of Technology and Economics)
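
The base algorithm described in the abstract above (start from an initial fault pattern and keep applying single state changes while they raise the likelihood) can be sketched as a greedy ascent. This is only an illustration under stated assumptions, not the authors' implementation: the prior `PRIOR_FAULT`, the conditional probability table `P_FAIL`, the all-good starting pattern, and the helper `ring_tests` are all hypothetical, and the score is recomputed from scratch on every trial for clarity rather than to match the complexity bound quoted in the abstract.

```python
import math

# Hypothetical parameters (not from the paper): prior fault probability and
# P(test reports "fail" | tester state, tested state), with 0 = good, 1 = faulty.
PRIOR_FAULT = 0.1
P_FAIL = {(0, 0): 0.01, (0, 1): 0.99, (1, 0): 0.5, (1, 1): 0.5}


def log_posterior(pattern, tests):
    """Unnormalised log posterior of a fault pattern given test outcomes.

    pattern: list of 0/1 states, one per unit.
    tests:   dict mapping (tester, tested) -> observed outcome (1 = fail).
    """
    score = 0.0
    for state in pattern:
        score += math.log(PRIOR_FAULT if state else 1.0 - PRIOR_FAULT)
    for (tester, tested), outcome in tests.items():
        p_fail = P_FAIL[(pattern[tester], pattern[tested])]
        score += math.log(p_fail if outcome else 1.0 - p_fail)
    return score


def greedy_diagnosis(n_units, tests):
    """Flip one unit at a time while some flip raises the log posterior."""
    pattern = [0] * n_units          # assumed initial pattern: all units good
    best = log_posterior(pattern, tests)
    improved = True
    while improved:
        improved = False
        best_unit = None
        for unit in range(n_units):
            pattern[unit] ^= 1       # try a single state change
            score = log_posterior(pattern, tests)
            pattern[unit] ^= 1       # undo it
            if score > best + 1e-12:
                best, best_unit = score, unit
        if best_unit is not None:
            pattern[best_unit] ^= 1  # keep the most helpful flip
            improved = True
    return pattern


def ring_tests(true_pattern):
    """Hypothetical test data: each unit in a ring tests both neighbours.
    Good testers report the true state; faulty testers always report "pass"."""
    n = len(true_pattern)
    tests = {}
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            tests[(i, j)] = true_pattern[j] if true_pattern[i] == 0 else 0
    return tests


print(greedy_diagnosis(6, ring_tests([0, 0, 1, 0, 0, 0])))
```

Because each step accepts only improving flips, the search can stop at a local optimum, which is consistent with the abstract's remark that the approach only approximates the optimal solution.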