Search CORE

51,918 research outputs found

New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

Author: Shaheen Masoud Esmail Masoud
Publication venue: The Aquila Digital Community
Publication date: 01/12/2013
Field of study

Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms

Aquila Digital Community

Distributed protocols as behaviours in Erlang

Author: Demicoli Darren
Francalanza Adrian
Publication venue: University of Malta. Faculty of Information and Communication Technology
Publication date: 01/01/2010
Field of study

We investigate the implementation of standard algorithms for three classes of Distributed Agreement problems in Erlang, an industry-strength language for programming fault tolerant distributed systems. We develop a framework to bridge the gap between the assumptions of these standard algorithm and the network abstraction provided by Erlang, and structure our implementations as reusable behaviours within this framework.peer-reviewe

OAR@UM

Dynamic fault tolerant grid workflow in the water threat management project

Author: Moon Young Suk
Publication venue: RIT Scholar Works
Publication date: 01/01/2010
Field of study

Achieving fault tolerance is an inevitable problem in distributed systems, with it becoming more challenging in decentralized, heterogeneous, and dynamic-environment systems such as a Grid. When deploying applications requires time-criticality, how to allocate resources for jobs in a fault-tolerant manner is an important issue for the delivery of the services. The Water Threat Management project is a research to find solutions for the contamination incidents problems in urban water distribution systems, and it involves the development of the cyberinfrastructure in a Grid environment. To handle such urgent events properly, the deployment of the system demands real-time processing without the failure. Our approach of integrating a fault-tolerant framework into a Water Threat Management system provides fault tolerance at the queuing stage rather than the job-execution stage by scheduling jobs in fault-tolerant ways. This includes the development of the batch queuing system in the Cyberaide Shell project. In addition, we present a dynamic workflow in the Water Threat Management system that can reduce the queue wait time in the changing environment

RIT Scholar Works

Compact routing in fault-tolerant distributed systems

Author: Lawrence James Edward
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1997
Field of study

A compact routing algorithm is a routing algorithm which reduces the space complexity of all-pairs shortest path routing. Compact routing protocols in distributed systems have been studied extensively as an attractive alternative to the traditional method of all-pairs shortest path routing. The use of compact routing protocols have several advantages. Compact routing schemes are not only more memory-efficient, but provide faster routing table lookup, more efficient broadcast scheme, and allow for a more scalable network. These routing schemes still maintain optimal or near-optimal routing paths. However, most of the compact routing protocols are not fault-tolerant. This thesis will first report the recent developments in the compact routing research. Several new methods for compact routing in fault-tolerant distributed systems will be presented and analyzed. The most important feature of the algorithms presented in this thesis is that they are self-stabilizing. The self-stabilization paradigm has been shown to be the most unified and all-inclusive approach to the design of fault-tolerant system. Additionally, these algorithms will address and solve several problems left unsolved by previous works. Relabelable and non-relabelable networks will be considered for both specific and arbitrary topologies

University of Nevada, Las Vegas Repository

Asymmetric Distributed Trust

Author: Cachin Christian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Principles of Distributed Systems (OPODIS 2019)
Publication date: 01/01/2020
Field of study

Quorum systems are a key abstraction in distributed fault-tolerant computing for capturing trust assumptions. They can be found at the core of many algorithms for implementing reliable broadcasts, shared memory, consensus and other problems. This paper introduces asymmetric Byzantine quorum systems that model subjective trust. Every process is free to choose which combinations of other processes it trusts and which ones it considers faulty. Asymmetric quorum systems strictly generalize standard Byzantine quorum systems, which have only one global trust assumption for all processes. This work also presents protocols that implement abstractions of shared memory and broadcast primitives with processes prone to Byzantine faults and asymmetric trust. The model and protocols pave the way for realizing more elaborate algorithms with asymmetric trust

Crossref

Dagstuhl Research Online Publication Server

Brief Announcement: Asymmetric Distributed Trust

Author: Cachin Christian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Distributed Computing (DISC 2019)
Publication date: 01/01/2019
Field of study

Dagstuhl Research Online Publication Server