901 research outputs found

Computing in the RAIN: a reliable array of independent nodes

Author: Bohossian Vasken
Bruck Jehoshua
Fan Chenggong C.
LeMahieu Paul S.
Riedel Marc D.
Xu Lihao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
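
The abstract above mentions data storage schemes based on computationally efficient error-control codes but does not spell them out. As a rough illustration of the general idea only, the sketch below uses a single XOR parity block, which lets a cluster rebuild the data of any one lost node. This is a deliberately simplified stand-in, not the codes described in the paper, and the function names (`xor_blocks`, `encode`, `recover`) are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one parity block (one block per node)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block held by a single failed node from the survivors."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


# Three data nodes and one parity node; node 1 fails and is rebuilt.
stored = encode([b"node", b"data", b"rain"])
rebuilt = recover(stored, lost_index=1)
```

A single parity block tolerates only one failure; tolerating several simultaneous node failures, as the abstract claims for RAIN, needs a stronger code than this sketch.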

CiteSeerX

Caltech Authors

Keeping Authorities "Honest or Bust" with Decentralized Witness Cosigning

Author: Ford Bryan
Gailly Nicolas
Gasser Linus
Jovanovic Philipp
Khoffi Ismail
Syta Ewa
Tamas Iulia
Visher Dylan
Wolinsky David Isaac
Publication date: 30/05/2016

The secret keys of critical network authorities - such as time, name, certificate, and software update services - represent high-value targets for hackers, criminals, and spy agencies wishing to use these keys secretly to compromise other hosts. To protect authorities and their clients proactively from undetected exploits and misuse, we introduce CoSi, a scalable witness cosigning protocol ensuring that every authoritative statement is validated and publicly logged by a diverse group of witnesses before any client will accept it. A statement S collectively signed by W witnesses assures clients that S has been seen, and not immediately found erroneous, by those W observers. Even if S is compromised in a fashion not readily detectable by the witnesses, CoSi still guarantees S's exposure to public scrutiny, forcing secrecy-minded attackers to risk that the compromise will soon be detected by one of the W witnesses. Because clients can verify collective signatures efficiently without communication, CoSi protects clients' privacy, and offers the first transparency mechanism effective against persistent man-in-the-middle attackers who control a victim's Internet access, the authority's secret key, and several witnesses' secret keys. CoSi builds on existing cryptographic multisignature methods, scaling them to support thousands of witnesses via signature aggregation over efficient communication trees. A working prototype demonstrates CoSi in the context of timestamping and logging authorities, enabling groups of over 8,000 distributed witnesses to cosign authoritative statements in under two seconds. Comment: 20 pages, 7 figures

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

UCL Discovery

Perspectives on the CAP Theorem

Author: Nancy A. Lynch
Seth Gilbert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2012

Almost twelve years ago, in 2000, Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance. This trade-off, which has become known as the CAP Theorem, has been widely discussed ever since. In this paper, we review the CAP Theorem and situate it within the broader context of distributed computing theory. We then discuss the practical implications of the CAP Theorem, and explore some general techniques for coping with the inherent trade-offs that it implies.

CiteSeerX

DSpace@MIT

Crossref

Genuinely Distributed Byzantine Machine Learning

Author: Abadi Martín
Abraham Ittai
Alistarh Dan
Biggio Battista
Castro Miguel
Chen Lingjiao
Chilimbi Trishul M
El Mhamdi El Mahdi
Hecht-Nielsen Robert
Hsieh Kevin
Li Mu
Rajput Shashank
Xie Cong
Publication date: 02/06/2020

Machine Learning (ML) solutions are nowadays distributed, according to the so-called server/worker architecture. One server holds the model parameters while several workers train the model. Clearly, such an architecture is prone to various types of component failures, which can all be encompassed within the spectrum of Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers. Yet all require trusting a central parameter server. We initiate in this paper the study of the "general" Byzantine-resilient distributed machine learning problem where no individual component is trusted. We show that this problem can be solved in an asynchronous system, despite the presence of $\frac{1}{3}$ Byzantine parameter servers and $\frac{1}{3}$ Byzantine workers (which is optimal). We present a new algorithm, ByzSGD, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, Scatter/Gather, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, Distributed Median Contraction (DMC), leverages the geometric properties of the median in high dimensional spaces to bring parameters within the correct servers back close to each other, ensuring learning convergence. The third, Minimum-Diameter Averaging (MDA), is a statistically-robust gradient aggregation rule whose goal is to tolerate Byzantine workers. MDA requires a loose bound on the variance of non-Byzantine gradient estimates, compared to existing alternatives (e.g., Krum). Interestingly, ByzSGD ensures Byzantine resilience without adding communication rounds (on a normal path), compared to vanilla non-Byzantine alternatives. ByzSGD requires, however, a larger number of messages which, we show, can be reduced if we assume synchrony. Comment: This is a merge of arXiv:1905.03853 and arXiv:1911.07537; arXiv:1911.07537 will be retracted
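
The abstract names Minimum-Diameter Averaging (MDA) without giving its procedure. A minimal sketch of one common reading follows: among the n submitted gradients, pick the subset of size n - f with the smallest diameter (largest pairwise distance) and average it. The brute-force subset search, the Euclidean distance, and the function names are assumptions made for illustration, not details taken from the paper.

```python
import math
from itertools import combinations


def diameter(vectors):
    """Largest pairwise Euclidean distance within a collection of vectors."""
    return max((math.dist(a, b) for a, b in combinations(vectors, 2)), default=0.0)


def minimum_diameter_average(gradients, f):
    """Average the (n - f)-sized subset of gradients with the smallest diameter.

    f is the assumed upper bound on the number of Byzantine workers.
    """
    n = len(gradients)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    # Exhaustive search over subsets: exponential in general, fine for a toy input.
    best = min(combinations(gradients, n - f), key=diameter)
    dim = len(best[0])
    return [sum(g[i] for g in best) / len(best) for i in range(dim)]


honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
byzantine = [[100.0, -100.0]]
result = minimum_diameter_average(honest + byzantine, f=1)
```

In this toy input the outlier gradient is excluded because any subset containing it has a much larger diameter, so the result is the mean of the three honest gradients.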

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Making Byzantine Consensus Live

Author: Bravo Manuel
Chockler Gregory
Gotsman Alexey
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Distributed Computing (DISC 2020)
Publication date: 01/01/2020

Partially synchronous Byzantine consensus protocols typically structure their execution into a sequence of views, each with a designated leader process. The key to guaranteeing liveness in these protocols is to ensure that all correct processes eventually overlap in a view with a correct leader for long enough to reach a decision. We propose a simple view synchronizer abstraction that encapsulates the corresponding functionality for Byzantine consensus protocols, thus simplifying their design. We present a formal specification of a view synchronizer and its implementation under partial synchrony, which runs in bounded space despite tolerating message loss during asynchronous periods. We show that our synchronizer specification is strong enough to guarantee liveness for single-shot versions of several well-known Byzantine consensus protocols, including HotStuff, Tendermint, PBFT and SBFT. We furthermore give precise latency bounds for these protocols when using our synchronizer. By factoring out the functionality of view synchronization we are able to specify and analyze the protocols in a uniform framework, which allows comparing them and highlights trade-offs.

Dagstuhl Research Online Publication Server

Generating Fast Indulgent Algorithms

Author: Alistarh Dan
Gilbert Seth
Guerraoui Rachid
Travers Corentin
Publication date: 18/06/2018

Synchronous distributed algorithms are easier to design and prove correct than algorithms that tolerate asynchrony. Yet, in the real world, networks experience asynchrony and other timing anomalies. In this paper, we address the question of how to efficiently transform an algorithm that relies on synchronous timing into an algorithm that tolerates asynchronous executions. We introduce a transformation technique from synchronous algorithms to indulgent algorithms (Guerraoui, in PODC, pp. 289-297, 2000), which induces only a constant overhead in terms of time complexity in well-behaved executions. Our technique is based on a new abstraction we call an asynchrony detector, which the participating processes implement collectively. The resulting transformation works for the class of colorless distributed tasks, including consensus and set agreement. Interestingly, we also show that our technique is relevant for colored tasks, by applying it to the renaming problem, to obtain the first indulgent renaming algorithm.

RERO DOC Digital Library