49,583 research outputs found

    Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead from the application programmer, masking complexities such as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, and fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
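
    A minimal sketch (not the FTPP implementation; Python used purely for illustration) of why a functional model helps here: because tasks are side-effect-free functions of their inputs, a task running on a failed processor can simply be re-executed on any surviving worker, which is what makes distributed recovery and load balancing cheap.

        import random

        def run_graph(tasks, workers):
            # tasks: dict name -> (pure_fn, list of dependency names), in topological order.
            results = {}
            for name, (fn, deps) in tasks.items():
                args = [results[d] for d in deps]
                done = False
                while not done:
                    worker = random.choice(workers)      # naive load balancing
                    try:
                        results[name] = worker(fn, args)
                        done = True
                    except RuntimeError:
                        pass                             # fault masked: simply retry elsewhere
            return results

        def healthy_worker(fn, args):
            return fn(*args)

        def faulty_worker(fn, args):
            raise RuntimeError("simulated processor fault")

        tasks = {
            "a": (lambda: 2, []),
            "b": (lambda: 3, []),
            "c": (lambda x, y: x * y, ["a", "b"]),
        }
        print(run_graph(tasks, [healthy_worker, faulty_worker]))   # {'a': 2, 'b': 3, 'c': 6}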

    Fault-tolerant computer study

    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault-tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self-checking computer module with self-contained input/output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self-checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology that led to the definition of the building block circuits are discussed.
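
    As a rough illustration only (not the building-block circuit design), a self-checking module can be thought of as duplicate-and-compare computation, with communication retried over backup buses when a primary bus fails; the helper functions below are hypothetical.

        def self_checking_module(channel_a, channel_b, x):
            # Run the computation on two channels and compare: a mismatch means the
            # module declares itself failed rather than emitting a wrong result.
            ra, rb = channel_a(x), channel_b(x)
            if ra != rb:
                raise RuntimeError("self-check mismatch: module removed from service")
            return ra

        def send(buses, message):
            # Try the redundant buses in order, circumventing any that have failed.
            for bus in buses:
                try:
                    return bus(message)
                except IOError:
                    continue
            raise IOError("all redundant buses have failed")

        def dead_bus(message):
            raise IOError("bus fault")

        def good_bus(message):
            return "delivered: " + str(message)

        value = self_checking_module(lambda x: x + 1, lambda x: x + 1, 41)
        print(send([dead_bus, good_bus], value))     # delivered: 42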

    Redundancy management for efficient fault recovery in NASA's distributed computing system

    The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management through efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the use of resources for embedding computational task graphs in the system architecture and for reconfiguring these tasks after a failure has occurred. Computational structures represented by the path and the complete binary tree were considered, and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of the Hybrid Algorithm Technique was introduced; this technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
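
    One standard way to embed a path into a hypercube, not necessarily the algorithm used in this work, is the reflected Gray code, which places consecutive tasks on adjacent hypercube nodes (dilation 1); the sketch below illustrates that idea only.

        def gray(i):
            return i ^ (i >> 1)

        def embed_path_in_hypercube(n):
            """Return node labels, in path order, for a path of 2**n tasks."""
            return [gray(i) for i in range(2 ** n)]

        nodes = embed_path_in_hypercube(3)
        # Consecutive labels differ in exactly one bit, i.e. they are hypercube neighbours.
        assert all(bin(a ^ b).count("1") == 1 for a, b in zip(nodes, nodes[1:]))
        print([format(v, "03b") for v in nodes])
        # ['000', '001', '011', '010', '110', '111', '101', '100']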

    Lightweight adaptive fault-tolerant data storage system (AFTSYS)

    The research group ARCOS of Universidad Carlos III de Madrid (Spain) has been working on flexible and adaptive data storage systems for several years. The storage systems developed are characterized by software governance, making them portable across different hardware storage resources, and by their dynamic adaptivity to the changing circumstances of computer systems, following the autonomic system paradigm. They also allow high-performance storage to be achieved by distributing or striping data across multiple devices. One of the group's technologies is the AFTSYS system, a fault-tolerant storage system for persistent distributed objects that is user configurable and adaptive to system behaviour.
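
    A minimal sketch of the striping idea mentioned above (not AFTSYS code): fixed-size blocks are distributed round-robin over several devices so transfers can proceed in parallel; the block size and helper names are assumptions for illustration.

        BLOCK = 4  # bytes per stripe unit, kept tiny for the example

        def stripe(data, n_devices):
            devices = [bytearray() for _ in range(n_devices)]
            for i in range(0, len(data), BLOCK):
                devices[(i // BLOCK) % n_devices] += data[i:i + BLOCK]
            return devices

        def unstripe(devices, length):
            out = bytearray()
            offsets = [0] * len(devices)
            d = 0
            while len(out) < length:
                out += devices[d][offsets[d]:offsets[d] + BLOCK]
                offsets[d] += BLOCK
                d = (d + 1) % len(devices)
            return bytes(out)

        payload = b"persistent distributed objects"
        parts = stripe(payload, 3)
        assert unstripe(parts, len(payload)) == payload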

    Fault Tolerant Ancillary Function of Power Converters in Distributed Generation Power System within a Microgrid Structure

    Distributed generation (DG) is deeply changing existing distribution networks, which are becoming very sophisticated and complex, incorporating both active and passive equipment. Their management can be simplified by assuming a structure of small networks, namely microgrids, that reproduce, on a smaller scale, the structure of large networks including production, transmission, and distribution of electrical energy. Power converters in distributed generation systems carry out several ancillary functions such as grid synchronization, islanding detection, fault ride-through, and so on. In view of an optimal utilization of the generated electrical power, fault-tolerant operation should be considered a suitable ancillary function for the near future. This paper presents a complete model of fault-tolerant inverters able to simulate the occurrence of the main fault types, together with a control algorithm for fault-tolerant converters suitable for microgrids. After the model description, formulated in terms of binary variables for healthy devices and legs, and the illustration of the fault-tolerant control strategy, the paper shows how the control preserves power quality when the converter works in the linear range. The effectiveness of the proposed approach and control is shown through computer simulations and experimental results.
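
    A toy sketch, not the paper's converter model: the binary "healthy leg" variables mentioned above can be seen as gating the per-phase output of a three-phase inverter, with a faulted leg contributing nothing; all parameter values below are assumptions for illustration.

        import math

        def inverter_output(t, healthy, vdc=400.0, f=50.0):
            """Return per-phase output voltages; a faulted leg contributes 0."""
            phases = [0.0, -2 * math.pi / 3, 2 * math.pi / 3]
            ref = [0.5 * vdc * math.sin(2 * math.pi * f * t + p) for p in phases]
            return [h * v for h, v in zip(healthy, ref)]   # h in {0, 1} per leg

        print(inverter_output(0.005, healthy=[1, 1, 1]))   # all legs healthy
        print(inverter_output(0.005, healthy=[1, 0, 1]))   # leg B faulted -> phase B = 0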

    Fault-Tolerant Load Management for Real-Time Distributed Computer Systems

    This paper presents a fault-tolerant scheme applicable to any decentralized load-balancing algorithm used in soft real-time distributed systems. Using the theory of distance-transitive graphs for representing the topologies of these systems, the proposed strategy partitions the systems into independent symmetric regions (spheres) centered at some control points. These central points, called fault-control points, provide two-level task redundancy and efficiently redistribute the load of failed nodes within their spheres. Using the algebraic characteristics of these topologies, it is shown that the identification of spheres and fault-control points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making exclusive use of a combinatorial structure known as the Hadamard matrix. Assuming a realistic failure-repair system environment, the performance of the proposed strategy has been evaluated and compared with a fault-free environment through an extensive and detailed simulation. For our fault-tolerant strategy, we propose two measures of goodness, namely the percentage of re-scheduled tasks which meet their deadlines and the overhead incurred for fault management. It is shown that using the proposed strategy, up to 80% of the tasks can still meet their deadlines. The proposed strategy is general enough to be applicable to many networks belonging to a number of families of distance-transitive graphs. Through simulation, we have analyzed the sensitivity of this strategy to various system parameters and have shown that the performance degradation due to failures does not depend on these parameters. Also, the probability of a task being lost altogether due to multiple failures has been shown to be extremely low.
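
    As a hedged illustration of the Hadamard-matrix idea, and not the paper's exact construction, the sketch below builds a Sylvester Hadamard matrix, reads its rows as binary codewords, and uses them as candidate fault-control points, assigning each hypercube node to the sphere of the nearest point by Hamming distance.

        def sylvester_hadamard(n):
            """Hadamard matrix of order 2**n with +/-1 entries."""
            h = [[1]]
            for _ in range(n):
                h = [row + row for row in h] + [row + [-x for x in row] for row in h]
            return h

        def control_points(n):
            # Map +1 -> 0 and -1 -> 1 to obtain binary codewords of length 2**n.
            return [tuple(0 if x == 1 else 1 for x in row) for row in sylvester_hadamard(n)]

        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))

        def sphere_of(node, centres):
            return min(centres, key=lambda c: hamming(node, c))

        centres = control_points(2)            # 4 codewords of length 4
        node = (0, 1, 1, 0)                    # a node of the 4-cube
        print(sphere_of(node, centres))        # its fault-control point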

    Scalable and Reliable Middlebox Deployment

    Middleboxes are pervasive in modern computer networks, providing functionalities beyond mere packet forwarding. Load balancers, intrusion detection systems, and network address translators are typical examples of middleboxes. Despite their benefits, middleboxes come with several challenges with respect to their scalability and reliability. The goal of this thesis is to devise middlebox deployment solutions that are cost effective, scalable, and fault tolerant. The thesis includes three main contributions: first, distributed service function chaining with multiple instances of a middlebox deployed on different physical servers to optimize resource usage; second, Constellation, a geo-distributed middlebox framework enabling a middlebox application to operate with high performance across wide area networks; third, a fault-tolerant service function chaining system.
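
    A toy sketch, unrelated to the thesis code: a service function chain can be modelled as a pipeline of middlebox functions, each with a backup instance that takes over when the primary fails; the NAT and IDS functions below are purely illustrative.

        def nat(pkt):
            return {**pkt, "src": "203.0.113.1"}

        def ids(pkt):
            if pkt.get("payload") == "attack":
                raise ValueError("dropped by IDS")
            return pkt

        def failing_instance(pkt):
            raise RuntimeError("instance crashed")

        def run_chain(chain, pkt):
            """chain: list of (primary, backup) middlebox instances."""
            for primary, backup in chain:
                try:
                    pkt = primary(pkt)
                except RuntimeError:          # instance failure, not a policy drop
                    pkt = backup(pkt)         # fail over to the replica
            return pkt

        chain = [(failing_instance, nat), (ids, ids)]
        print(run_chain(chain, {"src": "10.0.0.7", "payload": "hello"}))
        # {'src': '203.0.113.1', 'payload': 'hello'}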

    Analysis of Trade-offs in Fault-Tolerant Distributed Computing and Replicated Databases

    This paper examines fundamental trade-offs in fault-tolerant distributed systems and replicated databases built over the Internet. We discuss the interplay between consistency, availability, and latency, which is in the very nature of globally distributed computer systems, and also analyse its interconnection with durability and energy efficiency. In this paper we put forward the idea that consistency, availability, latency, durability, and other properties need to be viewed as continuous rather than binary, in contrast to the well-known CAP/PACELC theorems. We compare different consistency models and highlight the role of the application timeout, the replication factor, and other settings that essentially determine the interplay between the above properties. Our findings may be of interest to software engineers and system architects who develop Internet-scale distributed computer systems and cloud solutions.
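
    One concrete way such trade-offs surface in practice, offered here as an illustration rather than the paper's own analysis, is quorum replication: with replication factor N and read/write quorums R and W, the condition R + W > N guarantees that every read quorum overlaps the latest write quorum, while smaller quorums lower latency and tolerate more replica failures at the risk of stale reads.

        def quorum_properties(n, r, w, replica_latencies_ms):
            strong = r + w > n
            # A quorum operation completes roughly when the k-th fastest replica replies.
            lat = sorted(replica_latencies_ms)
            return {
                "strongly_consistent_reads": strong,
                "read_latency_ms": lat[r - 1],
                "write_latency_ms": lat[w - 1],
                "tolerated_failures_for_reads": n - r,
                "tolerated_failures_for_writes": n - w,
            }

        latencies = [5, 7, 120]                         # one replica sits across a WAN link
        print(quorum_properties(3, 2, 2, latencies))    # strong reads, higher latency
        print(quorum_properties(3, 1, 1, latencies))    # fast and available, possibly stale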