5 research outputs found

    Contention-aware metrics: analysis of distributed algorithms

    Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algorithms that account for network contention as well as CPU contention. We then illustrate the use of these metrics by comparing four Atomic Broadcast algorithms, and show that our metrics allow for a deeper understanding of performance issues than conventional metrics.
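
    To give a rough feel for what a contention-aware metric measures, the sketch below charges each message a CPU cost on the sending and receiving hosts plus a cost on a single shared network resource, and reports the makespan over all resources. The cost model, names, and parameter values are illustrative assumptions, not the paper's exact definitions.

    # Hedged sketch of a contention-aware cost model (illustrative assumptions,
    # not the paper's exact metric). Each message occupies the sender's CPU,
    # then the single shared network, then the receiver's CPU; the metric is
    # the makespan over all of these resources.

    def contention_aware_makespan(messages, n_hosts, cpu_cost=1.0, net_cost=1.0):
        """messages: list of (sender, receiver) pairs in the order they are issued."""
        cpu_free = [0.0] * n_hosts   # earliest time each host's CPU is free
        net_free = 0.0               # earliest time the shared network is free
        for sender, receiver in messages:
            send_end = cpu_free[sender] + cpu_cost                # sender CPU busy
            cpu_free[sender] = send_end
            net_end = max(net_free, send_end) + net_cost          # network busy
            net_free = net_end
            cpu_free[receiver] = max(cpu_free[receiver], net_end) + cpu_cost  # receiver CPU busy
        return max(max(cpu_free), net_free)

    # Example: a broadcast from host 0 to hosts 1..3, modelled as three unicasts.
    print(contention_aware_makespan([(0, 1), (0, 2), (0, 3)], n_hosts=4))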

    A comparative study of three token ring protocols for real time communications

    No full text

    General schedulability bound analysis and its applications in real-time systems

    A real-time system is a computing, communication, or information system with deadline requirements. To meet these deadline requirements, most systems use a mechanism known as the schedulability test, which determines whether each of the admitted tasks can meet its deadline. A new task will not be admitted unless it passes the schedulability test. Schedulability tests can be either direct or indirect. The utilization-based schedulability test is the most common approach, in which a task can be admitted only if the total system utilization is lower than a pre-derived bound. While the utilization-bound-based schedulability test is simple and effective, it is often difficult to derive the bound. Due to its analytical complexity, utilization bound results are usually obtained on a case-by-case basis. In this dissertation, we develop a general framework that allows effective derivation of schedulability bounds for different workload patterns and schedulers. We introduce an analytical model that is capable of describing a wide range of tasks' and schedulers' behaviors. We propose a new definition of utilization, called workload rate. While similar to utilization, workload rate enables flexible representation of different scheduling and workload scenarios and leads to uniform proofs of schedulability bounds. We introduce two types of workload constraint functions, s-shaped and r-shaped, for flexible and accurate characterization of task workloads. We derive parameterized schedulability bounds for arbitrary static-priority schedulers, weighted round-robin schedulers, and timed token ring schedulers. Existing utilization bounds for these schedulers are obtained from the closed-form formula by direct assignment of proper parameters. Some of these results are applied to a cluster computing environment. The results developed in this dissertation will help future schedulability bound analysis by supplying a unified modeling framework and will ease the implementation of practical real-time systems by providing a set of ready-to-use bound results.
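
    As a concrete instance of the utilization-bound style of admission test described above, the sketch below uses the classic Liu and Layland bound for rate-monotonic scheduling of periodic tasks; this specific bound is only an illustrative example and is not the general framework developed in the dissertation.

    # Illustrative utilization-based admission test (a minimal sketch, not the
    # dissertation's framework). A new periodic task is admitted only if total
    # utilization stays under the Liu-Layland bound n*(2^(1/n) - 1) for
    # rate-monotonic scheduling.

    def liu_layland_bound(n):
        """Utilization bound for n periodic tasks under rate-monotonic scheduling."""
        return n * (2 ** (1.0 / n) - 1)

    def admit(tasks, new_task):
        """tasks, new_task: (execution_time, period) pairs; True if the new task passes."""
        candidate = tasks + [new_task]
        utilization = sum(c / t for c, t in candidate)
        return utilization <= liu_layland_bound(len(candidate))

    # Example: two tasks already admitted; a third with C=2, T=10 is tested.
    current = [(1, 4), (1, 5)]             # utilization 0.25 + 0.20 = 0.45
    print(admit(current, (2, 10)))         # 0.65 <= ~0.78 for n=3 -> True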

    Agreement-related problems: from semi-passive replication to totally ordered broadcast

    Agreement problems constitute a fundamental class of problems in the context of distributed systems. All agreement problems follow a common pattern: all processes must agree on some common decision, the nature of which depends on the specific problem. This dissertation mainly focuses on three important agreement problems: Replication, Total Order Broadcast, and Consensus. Replication is a common means to introduce redundancy in a system, in order to improve its availability. A replicated server is a server that is composed of multiple copies so that, if one copy fails, the other copies can still provide the service. Each copy of the server is called a replica. The replicas must all evolve in a manner that is consistent with the other replicas. Hence, updating the replicated server requires that every replica agrees on the set of modifications to carry over. There are two principal replication schemes to ensure this consistency: active replication and passive replication. In Total Order Broadcast, processes broadcast messages to all processes; however, all messages must be delivered in the same order. Also, if one process delivers a message m, then all correct processes must eventually deliver m. The problem of Consensus gives an abstraction to most other agreement problems. All processes initiate a Consensus by proposing a value. Then, all processes must eventually decide the same value v, which must be one of the proposed values. These agreement problems are closely related to each other. For instance, Chandra and Toueg [CT96] show that Total Order Broadcast and Consensus are equivalent problems. In addition, Lamport [Lam78] and Schneider [Sch90] show that active replication needs Total Order Broadcast. As a result, active replication is also closely related to the Consensus problem. The first contribution of this dissertation is the definition of the semi-passive replication technique. Semi-passive replication is a passive replication scheme based on a variant of Consensus (called Lazy Consensus and also defined here). From a conceptual point of view, the result is important as it helps to clarify the relation between passive replication and the Consensus problem. In practice, it makes it possible to design systems that react more quickly to failures. The problem of Total Order Broadcast is well known in the field of distributed systems and algorithms; in fact, more than fifty algorithms have been published on the problem so far. Although these algorithms are quite similar, it is difficult to compare them as they often differ with respect to their actual properties, assumptions, and objectives. The second main contribution of this dissertation is to define five classes of total order broadcast algorithms, and to relate existing algorithms to those classes. The third contribution of this dissertation is to compare the expected performance of the various classes of total order broadcast algorithms. To achieve this goal, we define a set of metrics to predict the performance of distributed algorithms.
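
    To make the ordering requirement of Total Order Broadcast concrete, the sketch below shows a fixed-sequencer scheme, one well-known way (among the several classes such a study covers) of obtaining a total order; the class names and structure are illustrative assumptions, not the dissertation's algorithms.

    # Minimal fixed-sequencer sketch of Total Order Broadcast (illustrative only).
    # A single sequencer assigns global sequence numbers; each replica buffers
    # messages and delivers them strictly in sequence order, so all correct
    # replicas deliver the same messages in the same order.

    class Sequencer:
        def __init__(self):
            self.next_seq = 0

        def assign(self, msg):
            seq, self.next_seq = self.next_seq, self.next_seq + 1
            return seq, msg

    class Replica:
        def __init__(self):
            self.expected = 0      # next sequence number to deliver
            self.pending = {}      # out-of-order messages, keyed by sequence number
            self.delivered = []    # delivery order observed by the application

        def receive(self, seq, msg):
            self.pending[seq] = msg
            while self.expected in self.pending:      # deliver in gap-free order
                self.delivered.append(self.pending.pop(self.expected))
                self.expected += 1

    # Example: messages reach replicas in different orders, yet delivery order agrees.
    seqr = Sequencer()
    ordered = [seqr.assign(m) for m in ("a", "b", "c")]
    r1, r2 = Replica(), Replica()
    for item in ordered:
        r1.receive(*item)
    for item in reversed(ordered):     # second replica receives them out of order
        r2.receive(*item)
    print(r1.delivered == r2.delivered == ["a", "b", "c"])   # True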

    Evaluating the performance of distributed agreement algorithms: tools, methodology and case studies

    Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society come to depend on distributed systems formed of networked computers, so making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem because the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. Many algorithms that solve agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to support performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure-free runs only. In our view, the limited understanding of performance aspects, in both failure-free scenarios and scenarios with failure handling, is an obstacle to adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis makes major contributions in three domains: new tools, methodology, and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool, for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and for performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics account for resource contention, both on the network and on the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted.
    Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads, and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors, which provide inconsistent information about failures, and a group membership service, which provides consistent information about failures. We analyze the performance tradeoffs of these two techniques by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: one uses a centralized communication scheme and the other a decentralized one. Our results show that the performance tradeoffs are strongly affected by a number of characteristics of the environment, such as the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Well-known theoretical results state that many important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how relevant these results are for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detect crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server.
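
    As a rough illustration of the first of the two failure-handling mechanisms discussed above, the sketch below implements a heartbeat-style unreliable failure detector; the class name, timeout value, and API are illustrative assumptions, and a real detector (as well as the group membership alternative) is considerably more involved.

    # Hedged sketch of a heartbeat-based unreliable failure detector (illustrative).
    # A process is suspected if no heartbeat has been seen within `timeout` seconds;
    # suspicions may be wrong, which is precisely why the detector is "unreliable".

    import time

    class HeartbeatFailureDetector:
        def __init__(self, processes, timeout=2.0):
            self.timeout = timeout
            self.last_heard = {p: time.monotonic() for p in processes}

        def heartbeat(self, process):
            """Record that a heartbeat was received from `process`."""
            self.last_heard[process] = time.monotonic()

        def suspected(self):
            """Return the set of processes currently suspected to have crashed."""
            now = time.monotonic()
            return {p for p, t in self.last_heard.items() if now - t > self.timeout}

    # Example: with no heartbeats from p2, it eventually becomes suspected.
    fd = HeartbeatFailureDetector({"p1", "p2"}, timeout=0.1)
    fd.heartbeat("p1")
    time.sleep(0.2)
    fd.heartbeat("p1")
    print(fd.suspected())   # {'p2'}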