16 research outputs found

Impact of Message Losses on Push-Sum Protocol in Chosen Topologies

Authors: Kenyeres Jozef, Kenyeres Martin, Novotny Bohumil
Publication venue: 'European Scientific Institute, ESI'
Publication date: 31/03/2017

In this paper, we examine the natural robustness of the push-sum protocol to message losses in tree, star, mesh, ring, and link topologies. We experimentally verify how this failure affects the character of the estimates, the deviation of the final estimate from the true value, and the convergence rate.
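The behaviour described in this abstract can be illustrated with a minimal sketch of the textbook push-sum rule (each node keeps half of its sum and weight and sends the other half to one randomly chosen neighbour). This is not the authors' experimental setup: the function names `ring` and `push_sum`, the `loss_prob` parameter, and the ring example are assumptions made for illustration only.

```python
import random


def ring(n):
    """Neighbour lists for a ring of n nodes (hypothetical helper)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def push_sum(values, neighbors, rounds, loss_prob=0.0, seed=0):
    """Run synchronous push-sum and return each node's estimate s_i / w_i.

    loss_prob is the assumed probability that a sent message is dropped;
    a dropped message removes its share of sum and weight from the system.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # running sums
    w = [1.0] * n                   # running weights
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            half_s, half_w = s[i] / 2, w[i] / 2
            # Each node always keeps one half for itself.
            new_s[i] += half_s
            new_w[i] += half_w
            # The other half goes to a random neighbour, unless it is lost.
            j = rng.choice(neighbors[i])
            if rng.random() >= loss_prob:
                new_s[j] += half_s
                new_w[j] += half_w
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5]
    print("no loss :", push_sum(values, ring(5), rounds=300))
    print("20% loss:", push_sum(values, ring(5), rounds=300, loss_prob=0.2))
```

Without losses the total sum and weight are conserved, so every estimate approaches the true mean. With losses that conservation no longer holds, so the nodes can settle on a value that differs from the true mean, which is the kind of deviation the abstract refers to.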

Crossref

European Scientific Journal, ESJ

European Scientific Journal (European Scientific Institute)

APUS: Fast and Scalable PAXOS on RDMA

Authors: Chen X, Cui H, Jiang J, Wang C, Yi N
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant Paxos protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS intercepts inbound socket calls of an unmodified server program, assigns a total order for all input requests, and uses fast RDMA primitives to replicate these requests concurrently. We evaluated APUS on nine widely used server programs (e.g., Redis and MySQL). APUS incurred a mean overhead of 4.3% in response time and 4.2% in throughput. We integrated APUS with an SMR system, Calvin. Our Calvin-APUS integration was 8.2X faster than the extant Calvin-ZooKeeper integration. The consensus latency of APUS outperformed an RDMA-based consensus protocol by 4.9X. APUS source code and raw results are released at github.com/hku-systems/apus.

HKU Scholars Hub

An Analysis of Partial Network Partitioning Failures in Modern Distributed Systems

Author: Alfatafta Mohammed
Publication venue: 'University of Waterloo'
Publication date: 19/12/2019

We present a comprehensive study of system failures from 12 popular systems caused by a peculiar type of network partitioning fault: partial partitions. Partial partitions isolate a set of nodes from some, but not all, nodes in the cluster. Our study reveals that the studied failures are catastrophic; they lead to data loss, complete system unavailability, or stale and dirty reads. Furthermore, our study reveals that these failures manifest easily: they are deterministic, and they can be triggered by isolating a single node, without any interaction with the system’s clients. We dissected the implemented fault tolerance techniques in eight popular systems. We identified four principled approaches for building a fault tolerance mechanism for partial partitions and identified the shortcomings of the current approaches. The currently implemented fault tolerance techniques are either specific to a particular protocol or implementation, or may lead to a complete cluster shutdown despite the availability of alternative network paths between the nodes. Finally, we present NIFTY, a generic communication layer that leverages the capabilities of modern software-defined networking to monitor and recover the connectivity of the cluster in case of partial network partitions. NIFTY is transparent to the application running on top of it. We built NiftyDB, a database system atop NIFTY. NiftyDB implements a set of optimizations that reduce the network overhead and operation latency in case of partial network partitioning. Our analysis and evaluation show that the proposed approach can effectively mask partial network partitioning faults without incurring additional overheads.
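The distinction drawn in this abstract, between a partial partition and a complete one, can be sketched as a small connectivity check. This is an illustration under stated assumptions and not NIFTY's mechanism: the functions `working_links`, `find_path`, and `classify`, and the rule "some direct links are broken but every node is still reachable over the remaining links", are hypothetical simplifications of the abstract's definition.

```python
from collections import deque


def working_links(nodes, broken_links):
    """Adjacency lists over the links that still work (hypothetical helper)."""
    broken = {frozenset(link) for link in broken_links}
    return {
        n: [m for m in nodes if m != n and frozenset((n, m)) not in broken]
        for n in nodes
    }


def find_path(adj, src, dst):
    """Breadth-first search for a path from src to dst; None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def classify(nodes, broken_links):
    """Return 'healthy', 'partial', or 'complete' for a fully meshed cluster."""
    if not broken_links:
        return "healthy"
    adj = working_links(nodes, broken_links)
    reachable = all(find_path(adj, nodes[0], n) is not None for n in nodes)
    # Assumed rule: broken links with full reachability mean a partial partition.
    return "partial" if reachable else "complete"


if __name__ == "__main__":
    cluster = ["a", "b", "c"]
    cut = [("a", "b")]  # a and b cannot talk directly, but both reach c
    print(classify(cluster, cut))
    print(find_path(working_links(cluster, cut), "a", "b"))
```

In the example, nodes `a` and `b` lose their direct link while both can still reach `c`, so an alternative path through `c` exists. That is the situation in which, according to the abstract, shutting down the whole cluster is unnecessary.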

University of Waterloo's Institutional Repository