
    The impossibility of boosting distributed service resilience

    We study f-resilient services, which are guaranteed to operate as long as no more than f of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using f-resilient atomic objects and reliable registers can solve the consensus problem in the presence of f + 1 undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set consensus problem is solvable for 2n processes and 2n − 1 failures (i.e., wait-free) using n-process consensus services resilient to n − 1 failures (wait-free). Our proof is short and self-contained. We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services. Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, f + 1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors. As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.
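The 2-set consensus example in the abstract has a very short construction: split the 2n processes into two groups of n and let each group decide through its own n-process consensus service, so at most two distinct values are ever decided. A minimal single-threaded sketch of that idea (all names are ours; failures are not simulated):

```python
class ConsensusService:
    """Toy n-process consensus object: the first proposed value wins."""
    def __init__(self):
        self._decided = None

    def propose(self, value):
        if self._decided is None:
            self._decided = value
        return self._decided


def two_set_consensus(proposals):
    """proposals: one value per process (2n entries). Returns all decisions."""
    n = len(proposals) // 2
    left, right = ConsensusService(), ConsensusService()
    decisions = []
    for i, value in enumerate(proposals):
        service = left if i < n else right  # process i uses its group's service
        decisions.append(service.propose(value))
    return decisions


decisions = two_set_consensus([10, 20, 30, 40])  # 2n = 4 processes
assert len(set(decisions)) <= 2  # at most two distinct decided values
```

Each group's service is used by only n processes, so its n − 1 resilience suffices within the group, while the system as a whole tolerates any 2n − 1 failures.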

    ACE: Abstract Consensus Encapsulation for Liveness Boosting of State Machine Replication

    With the emergence of attack-prone cross-organization systems, providing asynchronous state machine replication (SMR) solutions is no longer a theoretical concern. This paper presents ACE, a framework for the design of such fault tolerant systems. Leveraging a known paradigm for randomized consensus solutions, ACE wraps existing practical solutions and real-life systems, boosting their liveness under adversarial conditions and, at the same time, promoting load balancing and fairness. Boosting is achieved without modifying the overall design or the engineering of these solutions. ACE is aimed at boosting the prevailing approach for practical fault tolerance. This approach, often named partial synchrony, is based on a leader-based paradigm: a good leader makes progress and a bad leader does no harm. The partial synchrony approach focuses on safety and forgoes liveness under targeted and dynamic attacks. Specifically, an attacker might block specific leaders, e.g., through a denial of service, to prevent progress. ACE provides boosting by running waves of parallel leaders and selecting a winning leader only retroactively, achieving boosting at a linear communication cost increase. ACE is agnostic to the fault model, inheriting its failure model from the wrapped solution's assumptions. As our evaluation shows, an asynchronous Byzantine fault tolerance (BFT) replication system built with ACE around an existing partially synchronous BFT protocol demonstrates reasonable slow-down compared with the base BFT protocol during faultless synchronous scenarios, yet exhibits significant speedup while the system is under attack.
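The wave mechanism can be pictured with a toy, crash-only simulation: several leaders run in parallel each wave, and the winner is picked only retroactively among those that made progress, so blocking any single predetermined leader cannot stall a wave. All names here are hypothetical; real ACE wraps full protocol instances, not single calls:

```python
import random


def run_wave(leaders, blocked, rng):
    """Each leader tries to produce a proposal; blocked leaders produce none.
    The winner is chosen only after the wave, among leaders that progressed."""
    proposals = {l: f"block-from-{l}" for l in leaders if l not in blocked}
    if not proposals:
        return None  # adversary blocked every leader this wave
    winner = rng.choice(sorted(proposals))  # retroactive, unpredictable choice
    return proposals[winner]


rng = random.Random(7)
leaders = ["p1", "p2", "p3", "p4"]
# An adversary blocks one targeted leader per wave, yet every wave commits:
for wave in range(5):
    blocked = {rng.choice(leaders)}
    assert run_wave(leaders, blocked, rng) is not None
```

Because the winner is unknown until after the wave, an attacker cannot pick the one leader whose denial of service would block progress.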

    Failure detectors as type boosters

    The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to "boost" the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100-109, 1995) that a certain failure detector, denoted Ωn, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ωn was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ωn is necessary to boost the power of T from n to n+1. Then we go a step further and show that Ωn is also the weakest to boost the power of (n+1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra et al., J. ACM 43(4):685-722, 1996). As a corollary, we show that Ωt is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power
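As a concrete instance of consensus power, a test-and-set object together with registers solves consensus for exactly two processes (its consensus power is 2). A hedged sketch of the standard two-process construction, with names of our own choosing: the winner of the test-and-set decides its own proposal, and the loser adopts the winner's announced value.

```python
import threading


class TestAndSet:
    """Toy linearizable test-and-set object (lock only models atomicity)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._flag = False

    def test_and_set(self):
        with self._lock:
            old = self._flag
            self._flag = True
            return old


def two_process_consensus():
    tas = TestAndSet()
    register = [None, None]   # one single-writer register per process
    decisions = [None, None]

    def process(i, proposal):
        register[i] = proposal        # announce the proposal first
        if not tas.test_and_set():    # winner: first to flip the flag
            decisions[i] = proposal
        else:                         # loser: adopt the winner's value
            decisions[i] = register[1 - i]

    threads = [threading.Thread(target=process, args=(i, v))
               for i, v in enumerate(["a", "b"])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


d = two_process_consensus()
assert d[0] == d[1] and d[0] in ("a", "b")  # agreement and validity
```

The loser's read is safe because the winner wrote its register before winning the test-and-set; for three or more processes no such protocol exists, which is exactly what cons(T) captures.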

    Efficient Counting with Optimal Resilience

    In the synchronous c-counting problem, we are given a synchronous system of n nodes, where up to f of the nodes may be Byzantine, that is, have arbitrary faulty behaviour. The task is to have all of the correct nodes count modulo c in unison in a self-stabilising manner: regardless of the initial state of the system and the faulty nodes' behaviour, eventually rounds are consistently labelled by a counter modulo c at all correct nodes. We provide a deterministic solution with resilience f < n/3 that stabilises in O(f) rounds, in which every correct node broadcasts O(log² f) bits per round. We build and improve on a recent result offering stabilisation time O(f) and communication complexity O(log² f / log log f) but with sub-optimal resilience f = n^(1−o(1)) (PODC 2015). Our new algorithm has optimal resilience, asymptotically optimal stabilisation time, and low communication complexity. Finally, we modify the algorithm to guarantee that after stabilisation very little communication occurs. In particular, for optimal resilience and polynomial counter size c = n^O(1), the algorithm broadcasts only O(1) bits per node every Θ(n) rounds without affecting the other properties of the algorithm; communication-wise this is asymptotically optimal.
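The task itself (not the Byzantine-tolerant algorithm of the paper) can be illustrated with a fault-free toy: nodes start with arbitrary counters, adopt a common value in one exchange, and from then on label rounds modulo c in unison. Everything below is our illustration, not the paper's protocol:

```python
def stabilise(counters, c, rounds):
    """Fault-free toy: nodes exchange counters, adopt the minimum, and then
    all increment modulo c together, so every round is labelled in unison."""
    history = []
    for _ in range(rounds):
        agreed = min(counters)                 # one exchange: adopt a common value
        counters = [(agreed + 1) % c] * len(counters)
        history.append(counters[:])
    return history


# arbitrary initial state, counter modulo c = 4:
hist = stabilise([5, 0, 3, 7], c=4, rounds=3)
assert all(len(set(round_vals)) == 1 for round_vals in hist)  # unison each round
```

The whole difficulty of the paper lies in achieving this agreement despite f < n/3 Byzantine nodes and arbitrary initial states, with only O(log² f) broadcast bits per round.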

    Synchronization using failure detectors

    Many important synchronization problems in distributed computing are impossible to solve (in a fault-tolerant manner) in purely asynchronous systems, where message transmission delays and relative processor speeds are unbounded. It is then natural to seek the minimal synchrony assumptions that are sufficient to solve a given synchronization problem. A convenient way to describe synchrony assumptions is the failure detector abstraction. In this thesis, we determine the weakest failure detectors for several fundamental problems in distributed computing: solving fault-tolerant mutual exclusion, solving non-blocking atomic commit, and boosting the synchronization power of atomic objects. We conclude the thesis with a perspective on the very definition of failure detectors.

    Distributed eventual leader election in the crash-recovery and general omission failure models.

    Distributed applications are present in many aspects of everyday life. Banking, healthcare and transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate to achieve a common goal. When building such systems, designers have to cope with several issues, such as differing synchrony assumptions and the occurrence of failures. Distributed systems must ensure that the delivered service is trustworthy. Agreement problems form a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most agreement problems can be considered particular instances of the Consensus problem, and hence can be solved by reduction to consensus. However, a fundamental impossibility result, known as FLP, states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is to use unreliable failure detectors. A failure detector encapsulates the synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega provides an eventual leader election mechanism.
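The eventual-leader behaviour of Omega can be sketched with a toy rule: each process trusts the smallest-id process it currently believes to be alive. Early views may disagree or be wrong, but once crashes stop and the views of correct processes stabilise, all of them output the same leader. Names and the round model below are our illustration, not a real implementation:

```python
def elect(alive_ids):
    """Classic Omega rule of thumb: trust the smallest id believed alive."""
    return min(alive_ids)


# Rounds of (possibly inaccurate) alive-sets as seen by one process;
# process 1 crashes after round 0 and views eventually become accurate:
views = [{1, 2, 3}, {2, 3}, {2, 3}, {2, 3}]
leaders = [elect(v) for v in views]
assert leaders[-1] == leaders[-2] == 2  # leader is eventually stable
```

Omega only guarantees this *eventual* stability; before stabilisation, different processes may transiently trust different leaders, which is exactly why it is weak enough to be implementable under mild synchrony assumptions yet strong enough for consensus.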

    Education for innovation and entrepreneurship in the food system: the Erasmus+ BoostEdu approach and results

    Innovation and entrepreneurship are key factors in providing added value for food systems. Based on the findings of the Erasmus+ Strategic Partnership BoostEdu, the objective of this paper is to address three knowledge gaps: 1) identify the needs for innovation and entrepreneurship (I&E) in the food sector; 2) understand the best way to organize learning; 3) provide flexibility in turbulent times. BoostEdu aimed to provide a platform for continuing education within I&E for food professionals and was carried out through co-creation workshops and the development of an e-learning course. The results of the project, in particular during the Covid-19 pandemic, highlighted the need for flexible access to modules that are complementary to other sources and based on a mix of theoretical concepts and practical experiences. The main lessons learned concern the need for co-creation and co-learning processes to identify suitable practices for the use of innovative digital technologies.

    Automatic Machine Learning for Insurance: H2O Experiment

    Final thesis of the Master's in Actuarial and Financial Sciences, Faculty of Economics and Business, Universitat de Barcelona, academic year 2020-2021. Supervisor: Dr. Salvador Torra Porras. This thesis provides an introduction to machine learning (ML), shows the implications ML has for the insurance sector, and pays special attention to the H2O ensemble modelling approach for the binary classification task of insurance claim fraud detection. The aim of this thesis is to study the potential of H2O Automatic ML and to compare its results with traditional algorithms such as the linear perceptron, logistic regression, the multilayer perceptron, support vector machines and decision trees. Using the H2O web interface or R programming, the most efficient ML algorithms are obtained with little effort and provide better modelling metrics than traditional methods.