
    Contention-aware metrics: analysis of distributed algorithms

    Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algorithms that account for network contention as well as CPU contention. We then illustrate the use of these metrics by comparing four Atomic Broadcast algorithms, and show that our metrics allow for a deeper understanding of performance issues than conventional metrics.
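    To make the idea concrete (this is a rough illustration, not the paper's actual metric definitions), the sketch below models a one-to-all broadcast in which every send and receive occupies the host CPU and every transmission occupies a single shared network resource; the parameter lambda_cpu, the per-message CPU cost relative to the network cost, is an assumption introduced here for illustration.

```python
# Minimal sketch of a contention-aware cost model for a one-to-all broadcast.
# Assumptions (not from the paper): one shared network resource transmits one
# message at a time; sending or receiving a message occupies the host CPU for
# `lambda_cpu` time units; transmitting a message occupies the network for 1 unit.

def broadcast_latency(n: int, lambda_cpu: float) -> float:
    """Time until the last of the n-1 receivers has processed the broadcast message."""
    cpu_free = [0.0] * n      # time at which each host's CPU becomes free
    net_free = 0.0            # time at which the shared network becomes free
    latest_delivery = 0.0

    for dest in range(1, n):
        # Sender (process 0) spends lambda_cpu preparing the message.
        send_done = cpu_free[0] + lambda_cpu
        cpu_free[0] = send_done
        # The message then competes for the single network resource.
        tx_done = max(send_done, net_free) + 1.0
        net_free = tx_done
        # The receiver spends lambda_cpu processing (delivering) the message.
        recv_done = max(tx_done, cpu_free[dest]) + lambda_cpu
        cpu_free[dest] = recv_done
        latest_delivery = max(latest_delivery, recv_done)

    return latest_delivery

# Example: with 4 processes, network contention dominates when lambda_cpu is
# small, and CPU contention on the sender dominates when it is large.
print(broadcast_latency(4, lambda_cpu=0.5))
print(broadcast_latency(4, lambda_cpu=2.0))
```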

    Multicast Services for Multimedia Collaborative Applications

    This work aims at providing multicast services for multimedia collaborative applications over large inter-networks such as the Internet. Multimedia collaborative applications typically have a small group size, slow group membership dynamics, and awareness of participants' identities and locations. Moreover, they usually consist of several components, such as audio, video, shared whiteboard, and single-user application sharing engines, that collectively help make the collaboration session successful. Each of these components places its own demands on the communication layer, which may differ from one component to another. This dissertation identifies the overall characteristics of multimedia collaborative applications and their individual components. It also determines the service requirements of the various components from the communication layer. Based on the analysis done in the thesis, new multicast service techniques that are better suited to multimedia collaborative applications are introduced. In particular, the focus is on multicast address management and connection control, routing, congestion and flow control, and error control. First, we investigate multicast address management and connection control and provide a new address management technique based on address space partitioning. Second, we study the problem of multicast routing and introduce a new approach that fits the real-time nature of multimedia applications. Third, we explore the problem of congestion and flow control and introduce a new mechanism that takes into consideration the heterogeneity within the network and within the processing capabilities of the end systems. Last, we address the problem of error control and present a solution that supports different levels of error control for the different components within the collaboration session. We present analytic as well as simulation studies to evaluate our work, which show that our techniques outperform previous ones.
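    As a rough illustration of the address space partitioning idea (the block size, component names and splitting rule below are assumptions, not the dissertation's actual scheme), the sketch splits one multicast address block into per-component partitions so that each component can allocate group addresses without consulting a central allocator.

```python
# Minimal sketch of multicast address management by address space partitioning.
# Assumptions (illustrative only): the session owns a contiguous block of IPv4
# multicast addresses and splits it evenly among its components, so each
# component allocates group addresses from its own partition independently.
import ipaddress

def partition_addresses(base_block: str, components: list[str]) -> dict[str, list[str]]:
    """Divide a multicast address block into equal partitions, one per component."""
    network = ipaddress.ip_network(base_block)
    addresses = list(network)                         # all addresses in the block
    per_component = len(addresses) // len(components)
    partitions = {}
    for i, name in enumerate(components):
        chunk = addresses[i * per_component:(i + 1) * per_component]
        partitions[name] = [str(a) for a in chunk]
    return partitions

# Hypothetical session block and component names.
parts = partition_addresses("239.10.0.0/28",
                            ["audio", "video", "whiteboard", "app_sharing"])
for name, addrs in parts.items():
    print(name, addrs[0], "-", addrs[-1])
```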

    Evaluating the performance of distributed agreement algorithms: tools, methodology and case studies

    Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society have come to depend on distributed systems formed of networked computers, so making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem because the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. Many algorithms to solve agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to help performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure-free runs only. In our view, the limited understanding of performance aspects, in both failure-free scenarios and scenarios with failure handling, is an obstacle to adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis makes major contributions in three domains: new tools, methodology and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool, for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and for performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics take into account resource contention, both on the network and on the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted.
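    The design principle of reusing one implementation for both simulation and real-network measurements can be sketched roughly as follows; the Transport interface and class names are invented for this illustration and are not the framework's actual API.

```python
# Rough sketch of running one algorithm implementation either in simulation or
# over a real network. Names and interfaces are invented for illustration; they
# are not the actual API of the thesis' framework.
import socket
from abc import ABC, abstractmethod

class Transport(ABC):
    """The only communication interface the algorithm code is allowed to use."""
    @abstractmethod
    def send(self, dest: str, payload: bytes) -> None: ...

class SimulatedTransport(Transport):
    """Delivers messages in-memory with a fixed modelled delay (for simulation)."""
    def __init__(self, delay: float = 0.001):
        self.delay = delay
        self.clock = 0.0
        self.log: list[tuple[float, str, bytes]] = []
    def send(self, dest: str, payload: bytes) -> None:
        self.clock += self.delay                 # advance simulated time
        self.log.append((self.clock, dest, payload))

class UdpTransport(Transport):
    """Sends the same messages over UDP for measurements on a real network."""
    def __init__(self, port: int = 9999):
        self.port = port
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    def send(self, dest: str, payload: bytes) -> None:
        self.sock.sendto(payload, (dest, self.port))

def broadcast(transport: Transport, peers: list[str], payload: bytes) -> None:
    """Algorithm code written once against Transport, reusable in both settings."""
    for peer in peers:
        transport.send(peer, payload)

# In simulation the peers are just labels; on a real network they would be hostnames.
sim = SimulatedTransport()
broadcast(sim, ["p1", "p2", "p3"], b"m1")
print(sim.log)
```
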
    Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors, which provide inconsistent information about failures, and a group membership service, which provides consistent information about failures. We analyze the performance tradeoffs of these two techniques by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: one uses a centralized and the other a decentralized communication scheme. Our results show that the performance tradeoffs are highly affected by a number of characteristics of the environment, such as the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Well-known theoretical results state that many important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how relevant these results are for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detect crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server.
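    As a concrete example of the kind of unreliable failure detector compared in the first case study, the following sketch implements a simple heartbeat scheme in which a process is suspected once no heartbeat has arrived within a timeout and the suspicion is withdrawn when a heartbeat arrives again; the timeout value and data layout are assumptions made for illustration.

```python
# Minimal sketch of a heartbeat-based unreliable failure detector: suspicions
# may be wrong (a slow process looks crashed) and are revised when a heartbeat
# arrives again. Timeout value and data layout are assumptions for illustration.
import time

class HeartbeatFailureDetector:
    def __init__(self, processes: list[str], timeout: float = 0.5):
        self.timeout = timeout
        now = time.monotonic()
        self.last_heartbeat = {p: now for p in processes}

    def heartbeat(self, process: str) -> None:
        """Record a heartbeat from `process` (this also clears a false suspicion)."""
        self.last_heartbeat[process] = time.monotonic()

    def suspected(self) -> set[str]:
        """Processes from which no heartbeat arrived within the timeout."""
        now = time.monotonic()
        return {p for p, t in self.last_heartbeat.items()
                if now - t > self.timeout}

# Example: p2 stops sending heartbeats and is eventually (possibly wrongly) suspected.
fd = HeartbeatFailureDetector(["p1", "p2", "p3"], timeout=0.1)
fd.heartbeat("p1"); fd.heartbeat("p3")
time.sleep(0.15)
fd.heartbeat("p1"); fd.heartbeat("p3")
print(fd.suspected())   # {'p2'}
```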