
    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Distributed-memory systems are key to achieving high-performance computing and are among the most favored architectures for advanced research problems. Mesh-connected multicomputers are one of the most popular architectures and have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique, in which a packet is divided into small flits, has been widely used in the design of distributed-memory systems. Multicast communication, in which one source node sends the same message to several destination nodes, has also been widely used in these systems. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Developing fault-tolerant multicast routing algorithms for 2D mesh networks is an important issue. This dissertation presents new fault-tolerant multicast routing algorithms that enhance the performance of distributed-memory systems built on wormhole-routed 2D meshes. The algorithms are described for fault-tolerant routing in 2D mesh networks, but they can also be extended to other topologies. They combine a unicast-based multicast algorithm with tree-based multicast algorithms. They work effectively for the most commonly encountered faults in mesh networks: f-rings, f-chains, and concave fault regions. The proposed routing algorithms are shown to be effective even in the presence of many fault regions and large fault regions. The algorithms are proved to be deadlock-free, and the problem of overlapping fault regions is solved. Four essential performance metrics in mesh networks are considered and calculated. The algorithms perform limited-global-information-based multicasting, a compromise between the local-information-based and global-information-based approaches. Data mining is used to validate the results and to enlarge the sample. The proposed multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms.

    Distributed protocols as behaviours in Erlang

    We investigate the implementation of standard algorithms for three classes of Distributed Agreement problems in Erlang, an industry-strength language for programming fault-tolerant distributed systems. We develop a framework to bridge the gap between the assumptions of these standard algorithms and the network abstraction provided by Erlang, and structure our implementations as reusable behaviours within this framework.

    Dynamic fault tolerant grid workflow in the water threat management project

    Achieving fault tolerance is an unavoidable problem in distributed systems, and it becomes more challenging in decentralized, heterogeneous, and dynamic environments such as a Grid. When applications are time-critical, allocating resources for jobs in a fault-tolerant manner is an important issue for service delivery. The Water Threat Management project is a research effort to find solutions to contamination incidents in urban water distribution systems, and it involves the development of cyberinfrastructure in a Grid environment. To handle such urgent events properly, the deployed system demands real-time processing without failure. Our approach of integrating a fault-tolerant framework into a Water Threat Management system provides fault tolerance at the queuing stage rather than the job-execution stage by scheduling jobs in fault-tolerant ways. This includes the development of the batch queuing system in the Cyberaide Shell project. In addition, we present a dynamic workflow in the Water Threat Management system that can reduce the queue wait time in a changing environment.

    Compact routing in fault-tolerant distributed systems

    A compact routing algorithm is a routing algorithm that reduces the space complexity of all-pairs shortest path routing. Compact routing protocols in distributed systems have been studied extensively as an attractive alternative to the traditional method of all-pairs shortest path routing. The use of compact routing protocols has several advantages. Compact routing schemes are not only more memory-efficient, but also provide faster routing table lookup and more efficient broadcast schemes, and allow for a more scalable network. These routing schemes still maintain optimal or near-optimal routing paths. However, most compact routing protocols are not fault-tolerant. This thesis first reports the recent developments in compact routing research. Several new methods for compact routing in fault-tolerant distributed systems are then presented and analyzed. The most important feature of the algorithms presented in this thesis is that they are self-stabilizing. The self-stabilization paradigm has been shown to be the most unified and all-inclusive approach to the design of fault-tolerant systems. Additionally, these algorithms address and solve several problems left unsolved by previous works. Relabelable and non-relabelable networks are considered for both specific and arbitrary topologies.
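To illustrate how a compact scheme can shrink routing tables, here is a minimal sketch of interval routing on a tree. This is a classic compact routing technique chosen for illustration, not one of the thesis's algorithms, and it is not self-stabilizing or fault-tolerant. The example tree and the helper names (`label_tree`, `route`) are hypothetical. Each node is relabeled by DFS preorder and stores only one label interval per child, instead of one table entry per destination.

```python
from typing import Dict, List, Tuple


def label_tree(children: Dict[str, List[str]], root: str):
    """Assign DFS preorder labels and record the label interval of each subtree."""
    label: Dict[str, int] = {}
    interval: Dict[str, Tuple[int, int]] = {}
    parent: Dict[str, str] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        label[node] = counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            parent[child] = node
            visit(child)
        # All labels in this subtree fall in [start, counter - 1].
        interval[node] = (start, counter - 1)

    visit(root)
    return label, interval, parent


def route(src: str, dst: str, children, label, interval, parent) -> List[str]:
    """Forward hop by hop using only per-child intervals (hypothetical helper)."""
    path = [src]
    node = src
    target = label[dst]
    while label[node] != target:
        for child in children.get(node, []):
            lo, hi = interval[child]
            if lo <= target <= hi:
                node = child  # destination lies in this child's subtree
                break
        else:
            node = parent[node]  # otherwise the destination is reached via the parent
        path.append(node)
    return path


# Illustrative tree (an assumption, not from the thesis): a has children b, c; b has d, e.
children = {"a": ["b", "c"], "b": ["d", "e"]}
label, interval, parent = label_tree(children, "a")
path = route("d", "c", children, label, interval, parent)
print(path)
```

On a tree the route found this way is the unique simple path, so it is also the shortest one, while each node keeps a table proportional to its degree rather than to the network size.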

    Asymmetric Distributed Trust

    Quorum systems are a key abstraction in distributed fault-tolerant computing for capturing trust assumptions. They can be found at the core of many algorithms for implementing reliable broadcasts, shared memory, consensus, and other problems. This paper introduces asymmetric Byzantine quorum systems that model subjective trust. Every process is free to choose which combinations of other processes it trusts and which ones it considers faulty. Asymmetric quorum systems strictly generalize standard Byzantine quorum systems, which have only one global trust assumption for all processes. This work also presents protocols that implement abstractions of shared memory and broadcast primitives with processes prone to Byzantine faults and asymmetric trust. The model and protocols pave the way for realizing more elaborate algorithms with asymmetric trust.
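As a point of reference for the standard (symmetric) case that the paper generalizes, the sketch below checks the two usual conditions on a Byzantine quorum system with a single global fail-prone system: consistency (no two quorums intersect only in a set that may fail) and availability (for every fail-prone set, some quorum avoids it). The function name, the four-process example, and the threshold choice are assumptions made for illustration; the asymmetric definitions from the paper are not reproduced here.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def is_byzantine_quorum_system(
    quorums: Iterable[FrozenSet[str]],
    fail_prone: Iterable[FrozenSet[str]],
) -> bool:
    """Check consistency and availability for one global fail-prone system."""
    quorums = list(quorums)
    fail_prone = list(fail_prone)

    # Consistency: the intersection of any two quorums is not contained
    # in any fail-prone set, so it always includes a correct process.
    consistency = all(
        not ((q1 & q2) <= f)
        for q1 in quorums
        for q2 in quorums
        for f in fail_prone
    )

    # Availability: for every fail-prone set there is a quorum disjoint from it.
    availability = all(
        any(not (q & f) for q in quorums)
        for f in fail_prone
    )
    return consistency and availability


# Illustrative threshold example (an assumption): n = 4 processes, at most 1 faulty.
processes = ["p1", "p2", "p3", "p4"]
fail_prone: List[FrozenSet[str]] = [frozenset({p}) for p in processes]
quorums_of_3 = [frozenset(c) for c in combinations(processes, 3)]
quorums_of_2 = [frozenset(c) for c in combinations(processes, 2)]

ok = is_byzantine_quorum_system(quorums_of_3, fail_prone)
too_small = is_byzantine_quorum_system(quorums_of_2, fail_prone)
print(ok, too_small)
```

With quorums of three out of four processes, any two quorums share at least two processes, which a single faulty process cannot cover. Quorums of two fail the consistency check because some pairs are disjoint.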

    Brief Announcement: Asymmetric Distributed Trust

    Quorum systems are a key abstraction in distributed fault-tolerant computing for capturing trust assumptions. They can be found at the core of many algorithms for implementing reliable broadcasts, shared memory, consensus, and other problems. This paper introduces asymmetric Byzantine quorum systems that model subjective trust. Every process is free to choose which combinations of other processes it trusts and which ones it considers faulty. Asymmetric quorum systems strictly generalize standard Byzantine quorum systems, which have only one global trust assumption for all processes. This work also presents protocols that implement abstractions of shared memory and broadcast primitives with processes prone to Byzantine faults and asymmetric trust. The model and protocols pave the way for realizing more elaborate algorithms with asymmetric trust.