1,071 research outputs found

    Efficient diagnosis of multiprocessor systems under probabilistic models

    Get PDF
    The problem of fault diagnosis in multiprocessor systems is considered under a probabilistic fault model. The focus is on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm that can correctly diagnose the state of every processor with probability approaching one in a class of systems performing slightly greater than a linear number of tests is presented. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. Lower and upper bounds on the number of tests required for regular systems are also presented. A class of regular systems which includes hypercubes is shown to be correctly diagnosable with high probability. In all cases, the number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis techniques

    SIMULATION-BASED PERFORMABILITY ANALYSIS OF MULTIPROCESSOR SYSTEMS

    Get PDF
    The primary focus in the analysis of multiprocessor systems has traditionally been on their performance. However, their large number of components, their complex network topologies, and sophisticated system software can make them very unreliable. The dependability of a computing system ought to be considered in an early stage of its development in order to take influence on the system architecture and to achieve best performance with high dependability. In this paper a simulation-based method for the combined performance and dependability analysis of fault tolerant multiprocessor systems are presented which provide meaningful results already during the design phase

    On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management

    Get PDF
    This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, by using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection/diagnosis and repair, and hence a high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation

    Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    Get PDF
    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive

    Gradient based system-level diagnosis

    Get PDF
    Traditional approaches in system-level diagnosis in multiprocessor systems are usually based on the oversimplified PMC test invalidation model, however Blount introduced a more general model containing conditional probabilities as parameters for different test invalidation situations. He suggested a lookup table based approach, but no algorithmic solution has been elaborated until our P-graph based solution introduced in previous publications. In this approach the diagnostic process is formulated as an optimization problem and the optimal solution is determined. Although the average behavior of the algorithm is quite good, the worst case complexity is exponential. In this paper we introduce a novel group of fast diagnostic algorithms that we named gradient based algorithms. This approach only approximates the optimal maximum likelihood or maximum a posteriori solution, but it has a polynomial complexity of the magnitude of O\left (N \cdot NbCount + N^2\right ), where N is the size of the system and NbCount is number of neighbors of a single unit. The idea of the base algorithm is that it takes an initial fault pattern and iterates till the likelihood of the actual fault pattern can be increased with a single state-change in the pattern. Improvements of this base algorithm, complexity analysis and simulation results are also presented. The main, although not exclusive application field of the algorithms is wafer-scale diagnosis, since the accuracy and the performance is still good even if relative large number of faults are present

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    A fault-tolerant multiprocessor architecture for aircraft, volume 1

    Get PDF
    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed

    Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

    Get PDF
    SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness

    Constraint Based System-Level Diagnosis of Multiprocessors

    Get PDF
    Massively parallel multiprocessors induce new requirements for system-level fault diagnosis, like handling a huge number of processing elements in an inhomogeneous system. Traditional diagnostic models (like PMC, BGM, etc.) are insufficient to fulfill all of these requirements. This paper presents a novel modelling technique, based on a special area of artificial intelligence (AI) methods: constraint satisfaction (CS). The constraint based approach is able to handle functional faults in a similar way to the Russel-Kime model. Moreover, it can use multiple-valued logic to deal with system components having multiple fault modes. The resolution of the produced models can be adjusted to fit the actual diagnostic goal. Consequently, constrint based methods are applicable to a much wider range of multiprocessor architectures than earlier models. The basic problem of system-level diagnosis, syndrome decoding, can be easily transformed into a constraint satisfaction problem (CSP). Thus, the diagnosis algorithm can be derived from the related constraint solving algorithm. Different abstraction leveles can be used for the various diagnosis resolutions, employing the same methodology. As examples, two algorithms are described in the paper; both of them is intended for the Parsytec GCel massively parallel system. The centralized method uses a more elaborate system model, and provides detailed diagnostic information, suitable for off-line evaluation. The distributed method makes fast decisions for reconfiguration control, using a simplified model. Keywords system-level self-diagnosis, massively parallel computing systems, constraint satisfaction, diagnostic models, centralized and distributed diagnostic algorithms

    Advances and Technologies in High Voltage Power Systems Operation, Control, Protection and Security

    Get PDF
    The electrical demands in several countries around the world are increasing due to the huge energy requirements of prosperous economies and the human activities of modern life. In order to economically transfer electrical powers from the generation side to the demand side, these powers need to be transferred at high-voltage levels through suitable transmission systems and power substations. To this end, high-voltage transmission systems and power substations are in demand. Actually, they are at the heart of interconnected power systems, in which any faults might lead to unsuitable consequences, abnormal operation situations, security issues, and even power cuts and blackouts. In order to cope with the ever-increasing operation and control complexity and security in interconnected high-voltage power systems, new architectures, concepts, algorithms, and procedures are essential. This book aims to encourage researchers to address the technical issues and research gaps in high-voltage transmission systems and power substations in modern energy systems
    • …
    corecore