1,004 research outputs found

    Modeling and validating the performance of atomic broadcast algorithms in high-latency networks

    Get PDF
    The performance of consensus and atomic broadcast algorithms using failure detectors is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. In this paper, we model the performance of three consensus and atomic broadcast algorithms using failure detectors in the oft-neglected setting of wide area networks and validate this model by experimentally evaluating the algorithms in several different setups

    06371 Abstracts Collection -- From Security to Dependability

    Get PDF
    From 10.09.06 to 15.09.06, the Dagstuhl Seminar 06371 ``From Security to Dependability\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    Factors that Impact Blockchain Scalability

    Get PDF

    Atomic broadcast:a fault-tolerant token based algorithm and performance evaluations

    Get PDF
    Within only a couple of generations, the so-called digital revolution has taken the world by storm: today, almost all human beings interact, directly or indirectly, at some point in their life, with a computer system. Computers are present on our desks, computer systems control the antilock braking system and the stability control in cars, they collect usage statistics in elevators in order to anticipate maintenance and repair operations. Computer systems also operate critical systems, such as nuclear power plants, airplane control systems or space rockets. Furthermore, computer systems are not only omnipresent, but also increasingly networked. As the use of computer systems has increased dramatically over the past decades, the needs and expectations associated with these systems have also increased. In particular, one of the critical points of a system is its availability (the fraction of the time during which the system provides a service to the users): the costs and negative publicity of a system outage (be it a commercial web site or a stock exchange for example) are often considerable. Fault tolerance is one of the approaches to designing a highly-available system: a fault tolerant system is designed in such a way that the failure of one of the components of the system does not compromise the functionality of the system as a whole. Replication is one of the common fault tolerance techniques. Instead of having a single machine (a replica) providing a service, the system is composed of several replicas running the service and connected through a network. If one of the replicas fails, the service is still provided by the remaining replicas. The replication technique is interesting as it can be achieved by using software running on commodity hardware, thus avoiding the high cost of special purpose hardware. Replication, although intuitive to understand, is complex to implement in practice, as the replicas have to interact in order to ensure the consistency of the system as a whole. Group communication simplifies the replication problem, by hiding issues such as the communication between the replicas, the crashes of one or several replicas and the synchronization of the replicas. In this thesis, we start by comparing two replication techniques – group communication and quorum systems – and identifying in which case either technique should be used. Atomic broadcast (a group communication primitive at the heart of this work) allows replicas to broadcast messages to each other and then deliver them in the same total order, even if replicas broadcast messages quasi simultaneously. Atomic broadcast is especially useful for replication: since all replicas deliver messages in the same order, their state is kept consistent. After the comparison between the replication techniques, we present an atomic broadcast algorithm designed to perform well when the system is heavily loaded and that allows to quickly detect crashed replicas (by minimizing the consequences of wrongly suspecting a non-crashed replica). The presentation of the algorithm includes simulation results comparing the performance of the new algorithm to previously proposed atomic broadcast algorithms. The second part of the thesis focuses on the experimental performance evaluation of the new algorithm in several settings. We start by comparing four atomic broadcast algorithms in a local area network. We then compare three of the four algorithms in a wide area network, with sites in Switzerland, Japan and France, and where the round trip time between the sites varies between 4 and 300 ms. Finally, we evaluate the impact of the size of the system (the number if replicas) on the performance of the algorithms

    Dynamic re-optimization techniques for stream processing engines and object stores

    Get PDF
    Large scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems on multi-tenant, cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.^ In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple-routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline-bottlenecks. We present novel heuristics to optimize overlays for group communication operations in the streaming model.^ In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock-clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures

    Rethinking Consistency Management in Real-time Collaborative Editing Systems

    Get PDF
    Networked computer systems offer much to support collaborative editing of shared documents among users. Increasing concurrent access to shared documents by allowing multiple users to contribute to and/or track changes to these shared documents is the goal of real-time collaborative editing systems (RTCES); yet concurrent access is either limited in existing systems that employ exclusive locking or concurrency control algorithms such as operational transformation (OT) may be employed to enable concurrent access. Unfortunately, such OT based schemes are costly with respect to communication and computation. Further, existing systems are often specialized in their functionality and require users to adopt new, unfamiliar software to enable collaboration. This research discusses our work in improving consistency management in RTCES. We have developed a set of deadlock-free multi-granular dynamic locking algorithms and data structures that maximize concurrent access to shared documents while minimizing communication cost. These algorithms provide a high level of service for concurrent access to the shared document and integrate merge-based or OT-based consistency maintenance policies locally among a subset of the users within a subsection of the document – thus reducing the communication costs in maintaining consistency. Additionally, we have developed client-server and P2P implementations of our hierarchical document management algorithms. Simulations results indicate that our approach achieves significant communication and computation cost savings. We have also developed a hierarchical reduction algorithm that can minimize the space required of RTCES, and this algorithm may be pipelined through our document tree. Further, we have developed an architecture that allows for a heterogeneous set of client editing software to connect with a heterogeneous set of server document repositories via Web services. This architecture supports our algorithms and does not require client or server technologies to be modified – thus it is able to accommodate existing, favored editing and repository tools. Finally, we have developed a prototype benchmark system of our architecture that is responsive to users’ actions and minimizes communication costs

    Evaluating the performance of distributed agreement algorithms:tools, methodology and case studies

    Get PDF
    Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society come to depend on distributed systems formed of networked computers, thus making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem as the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. A lot of algorithms to solve agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to help performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure free runs only. In our view, the limited understanding of performance aspects, in both failure free scenarios and scenarios with failure handling, is an obstacle for adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in the industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis has major contributions in three domains: new tools, methodology and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics take account for resource contention, both on the network and the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted. Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors which provide inconsistent information about failures, and a group membership service which provides consistent information about failures, respectively. We analyze the performance tradeoffs of these two techniques, by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: the one uses a centralized and the other a decentralized communication schema. Our results show that the performance tradeoffs are highly affected by a number of characteristics of the environment, like the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Famous theoretical results state that a lot of important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how these results are relevant for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detects crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server
    • …
    corecore