58,505 research outputs found

    Building global and scalable systems with atomic multicast

    Get PDF
    The rise of worldwide Internet-scale services demands large distributed systems. Indeed, when handling several millions of users, it is common to operate thousands of servers spread across the globe. Here, replication plays a central role, as it contributes to improve the user experience by hiding failures and by providing acceptable latency. In this thesis, we claim that atomic multicast, with strong and well-defined properties, is the appropriate abstraction to efficiently design and implement globally scalable distributed systems. Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter), services are increasingly becoming geographically distributed. Data partitioning and replication, combined with local and geographical distribution, introduce daunting challenges, including the need to carefully order requests among replicas and partitions. One way to tackle this problem is to use group communication primitives that encapsulate order requirements. While replication is a common technique used to design such reliable distributed systems, to cope with the requirements of modern cloud based ``always-on'' applications, replication protocols must additionally allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. We propose a dynamic atomic multicast protocol which fulfills these requirements. It allows to dynamically add and remove resources to an online replicated state machine and to recover crashed processes. Major efforts have been spent in recent years to improve the performance, scalability and reliability of distributed systems. In order to hide the complexity of designing distributed applications, many proposals provide efficient high-level communication abstractions. Since the implementation of a production-ready system based on this abstraction is still a major task, we further propose to expose our protocol to developers in the form of distributed data structures. B-trees for example, are commonly used in different kinds of applications, including database indexes or file systems. Providing a distributed, fault-tolerant and scalable data structure would help developers to integrate their applications in a distribution transparent manner. This work describes how to build reliable and scalable distributed systems based on atomic multicast and demonstrates their capabilities by an implementation of a distributed ordered map that supports dynamic re-partitioning and fast recovery. To substantiate our claim, we ported an existing SQL database atop of our distributed lock-free data structure. Here, replication plays a central role, as it contributes to improve the user experience by hiding failures and by providing acceptable latency. In this thesis, we claim that atomic multicast, with strong and well-defined properties, is the appropriate abstraction to efficiently design and implement globally scalable distributed systems. Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter), services are increasingly becoming geographically distributed. Data partitioning and replication, combined with local and geographical distribution, introduce daunting challenges, including the need to carefully order requests among replicas and partitions. One way to tackle this problem is to use group communication primitives that encapsulate order requirements. While replication is a common technique used to design such reliable distributed systems, to cope with the requirements of modern cloud based ``always-on'' applications, replication protocols must additionally allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. We propose a dynamic atomic multicast protocol which fulfills these requirements. It allows to dynamically add and remove resources to an online replicated state machine and to recover crashed processes. Major efforts have been spent in recent years to improve the performance, scalability and reliability of distributed systems. In order to hide the complexity of designing distributed applications, many proposals provide efficient high-level communication abstractions. Since the implementation of a production-ready system based on this abstraction is still a major task, we further propose to expose our protocol to developers in the form of distributed data structures. B- trees for example, are commonly used in different kinds of applications, including database indexes or file systems. Providing a distributed, fault-tolerant and scalable data structure would help developers to integrate their applications in a distribution transparent manner. This work describes how to build reliable and scalable distributed systems based on atomic multicast and demonstrates their capabilities by an implementation of a distributed ordered map that supports dynamic re-partitioning and fast recovery. To substantiate our claim, we ported an existing SQL database atop of our distributed lock-free data structure

    The Raincore Distributed Session Service for Networking Elements

    Get PDF
    Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail-over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validates our design assumptions and goals

    Programming with process groups: Group and multicast semantics

    Get PDF
    Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects

    Atomic-SDN: Is Synchronous Flooding the Solution to Software-Defined Networking in IoT?

    Get PDF
    The adoption of Software Defined Networking (SDN) within traditional networks has provided operators the ability to manage diverse resources and easily reconfigure networks as requirements change. Recent research has extended this concept to IEEE 802.15.4 low-power wireless networks, which form a key component of the Internet of Things (IoT). However, the multiple traffic patterns necessary for SDN control makes it difficult to apply this approach to these highly challenging environments. This paper presents Atomic-SDN, a highly reliable and low-latency solution for SDN in low-power wireless. Atomic-SDN introduces a novel Synchronous Flooding (SF) architecture capable of dynamically configuring SF protocols to satisfy complex SDN control requirements, and draws from the authors' previous experiences in the IEEE EWSN Dependability Competition: where SF solutions have consistently outperformed other entries. Using this approach, Atomic-SDN presents considerable performance gains over other SDN implementations for low-power IoT networks. We evaluate Atomic-SDN through simulation and experimentation, and show how utilizing SF techniques provides latency and reliability guarantees to SDN control operations as the local mesh scales. We compare Atomic-SDN against other SDN implementations based on the IEEE 802.15.4 network stack, and establish that Atomic-SDN improves SDN control by orders-of-magnitude across latency, reliability, and energy-efficiency metrics

    Atomic commitment in transactional DHTs

    Get PDF
    We investigate the problem of atomic commit in transactional database systems built on top of Distributed Hash Tables. DHTs provide a decentralized way to store and look up data. To solve the atomic commit problem we propose to use an adaption of Paxos commit as a non-blocking algorithm. We exploit the symmetric replication technique existing in the DKS DHT to determine which nodes are necessary to execute the commit algorithm. By doing so we achieve a lower number of communication rounds and a reduction of meta-data in contrast to traditional Three-Phase-Commit protocols. We also show how the proposed solution can cope with dynamism due to churn in DHTs. Our solution works correctly relying only on an inaccurate failure detection of node failure which is necessary for systems running over the Internet

    A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)

    Full text link
    Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work, by connecting the servers via a sparse resilient overlay network; yet, this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: During intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves comparable throughput to AllGather -- a non-fault-tolerant distributed agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in terms of throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables. Modifications from previous version: extended the evaluation of AllConcur+ with a simulation of a multiple datacenters deploymen
    • …
    corecore