26 research outputs found

    End-To-End Latency of a Fault-Tolerant CORBA Infrastructure

    Get PDF
    This paper presents an evaluation of the end-to-end latency of a fault-tolerant CORBA infrastructure that we have implemented. The fault-tolerant infrastructure replicates the server applications using active, passive and semi-active replication, and maintains strong replica consistency of the server replicas. By analyses and by measurements of the running fault-tolerant infrastructure, we characterize the end-to-end latency under fault-free conditions. The main determining factor of the run-time performance of the fault-tolerant infrastructure is the Totem group communication protocol, which contributes to the end-to-end latency primarily in two ways: the delay in sending messages and the processing cost of the rotating token. To reduce the delay in sending messages for passive and semi-active replication, the position of the primary server replica on the Totem ring, the token rotation time, the processing time at the client, and the processing time at the server must be considered. For active replication, the presence of duplicate messages adversely affects the performance. However, if an effective sending-side duplicate suppression mechanism is implemented, active replication is more advantageous than both passive and semi-active replication because of the automatic selection of the most favorable position of the server replica that sends the first non-duplicate reply

    Scalability, Throughput Stability and Efficient Buffering in ReliableMulticast Protocols

    Full text link
    This study investigates the issues of scalability, throughput stability and efficient buffering in reliable multicast protocols. The focus is on a new class of scalable reliable multicast protoco, PBcast that is based on an epidemic loss recovery mechanism. The protocol offers scalability, throughput stability and a bimodal delivery guarantee as the key features. A theoretical analysis study for the protocol is already available

    Agreement-related problems:from semi-passive replication to totally ordered broadcast

    Get PDF
    Agreement problems constitute a fundamental class of problems in the context of distributed systems. All agreement problems follow a common pattern: all processes must agree on some common decision, the nature of which depends on the specific problem. This dissertation mainly focuses on three important agreements problems: Replication, Total Order Broadcast, and Consensus. Replication is a common means to introduce redundancy in a system, in order to improve its availability. A replicated server is a server that is composed of multiple copies so that, if one copy fails, the other copies can still provide the service. Each copy of the server is called a replica. The replicas must all evolve in manner that is consistent with the other replicas. Hence, updating the replicated server requires that every replica agrees on the set of modifications to carry over. There are two principal replication schemes to ensure this consistency: active replication and passive replication. In Total Order Broadcast, processes broadcast messages to all processes. However, all messages must be delivered in the same order. Also, if one process delivers a message m, then all correct processes must eventually deliver m. The problem of Consensus gives an abstraction to most other agreement problems. All processes initiate a Consensus by proposing a value. Then, all processes must eventually decide the same value v that must be one of the proposed values. These agreement problems are closely related to each other. For instance, Chandra and Toueg [CT96] show that Total Order Broadcast and Consensus are equivalent problems. In addition, Lamport [Lam78] and Schneider [Sch90] show that active replication needs Total Order Broadcast. As a result, active replication is also closely related to the Consensus problem. The first contribution of this dissertation is the definition of the semi-passive replication technique. Semi-passive replication is a passive replication scheme based on a variant of Consensus (called Lazy Consensus and also defined here). From a conceptual point of view, the result is important as it helps to clarify the relation between passive replication and the Consensus problem. In practice, this makes it possible to design systems that react more quickly to failures. The problem of Total Order Broadcast is well-known in the field of distributed systems and algorithms. In fact, there have been already more than fifty algorithms published on the problem so far. Although quite similar, it is difficult to compare these algorithms as they often differ with respect to their actual properties, assumptions, and objectives. The second main contribution of this dissertation is to define five classes of total order broadcast algorithms, and to relate existing algorithms to those classes. The third contribution of this dissertation is to compare the expected performance of the various classes of total order broadcast algorithms. To achieve this goal, we define a set of metrics to predict the performance of distributed algorithms

    Evaluating the performance of distributed agreement algorithms:tools, methodology and case studies

    Get PDF
    Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society come to depend on distributed systems formed of networked computers, thus making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem as the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. A lot of algorithms to solve agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to help performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure free runs only. In our view, the limited understanding of performance aspects, in both failure free scenarios and scenarios with failure handling, is an obstacle for adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in the industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis has major contributions in three domains: new tools, methodology and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics take account for resource contention, both on the network and the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted. Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors which provide inconsistent information about failures, and a group membership service which provides consistent information about failures, respectively. We analyze the performance tradeoffs of these two techniques, by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: the one uses a centralized and the other a decentralized communication schema. Our results show that the performance tradeoffs are highly affected by a number of characteristics of the environment, like the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Famous theoretical results state that a lot of important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how these results are relevant for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detects crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server

    Design and implementation of a QoS-Supportive system for reliable multicast

    Get PDF
    As the Internet is increasingly being used by business companies to offer and procure services, providers of networked system services are expected to assure customers of specific Quality of Service (QoS) they could offer. This leads to scenarios where users prefer to negotiate required QoS guarantees prior to accepting a service, and service providers assess their ability to provide the customer with the requested QoS on the basis of existing resource availability. A system to be deployed in such scenarios should, in addition to providing the services, (i) monitor resource availability, (ii) be able to assess whether or not requested QoS can be met, and (iii) adapt to QoS perturbations (e.g., node failures) which undermine any assumptions made on continued resource availability. This thesis focuses on building such a QoS-Supportive system for reliably multicasting messages within a group of crash-prone nodes connected by loss-prone networks. System design involves developing a Reliable Multicast protocol and analytically estimating the multicast performance in terms of protocol parameters. It considers two cases regarding message size: small messages that fit into a single packet and large ones that need to be fragmented into multiple packets. Analytical estimations are obtained through stochastic modelling and approximation, and their accuracy is demonstrated using simulations. They allow the affordability of the requested QoS to be numerically assessed for a given set of performance metrics of the underlying network, and also indicate the values to be used for the protocol parameters if the affordable QoS is to be achieved. System implementation takes a modular approach and the major sub-systems built include: the QoS negotiation component, the network monitoring component and the reliable multicast protocol component. Two prototypes have been built. The first one is built as a middleware system in itself to the extent of testing our ideas over a group of geographically distant nodes using PlanetLab. The second prototype is developed as a part of the JGroups Reliable Communication Toolkit and provides, besides an example of scenario directly benefitting of such technology, an example integration of our subsystem into an already-existing system.EThOS - Electronic Theses Online ServiceTAPAS EU-IST-2001-34069 Project : EPSR (Engineering and Physical Sciences Research Council)GBUnited Kingdo

    New Challenges in Quality of Services Control Architectures in Next Generation Networks

    Get PDF
    A mesura que Internet i les xarxes IP s'han anat integrant dins la societat i les corporacions, han anat creixent les expectatives de nous serveis convergents així com les expectatives de qualitat en les comunicacions. Les Next Generation Networks (NGN) donen resposta a les noves necessitats i representen el nou paradigma d'Internet a partir de la convergència IP. Un dels aspectes menys desenvolupats de les NGN és el control de la Qualitat del Servei (QoS), especialment crític en les comunicacions multimèdia a través de xarxes heterogènies i/o de diferents operadors. A més a més, les NGN incorporen nativament el protocol IPv6 que, malgrat les deficiències i esgotament d'adreces IPv4, encara no ha tingut l'impuls definitiu.Aquesta tesi està enfocada des d'un punt de vista pràctic. Així doncs, per tal de poder fer recerca sobre xarxes de proves (o testbeds) que suportin IPv6 amb garanties de funcionament, es fa un estudi en profunditat del protocol IPv6, del seu grau d'implementació i dels tests de conformància i interoperabilitat existents que avaluen la qualitat d'aquestes implementacions. A continuació s'avalua la qualitat de cinc sistemes operatius que suporten IPv6 mitjançant un test de conformància i s'implementa el testbed IPv6 bàsic, a partir del qual es farà la recerca, amb la implementació que ofereix més garanties.El QoS Broker és l'aportació principal d'aquesta tesi: un marc integrat que inclou un sistema automatitzat per gestionar el control de la QoS a través de sistemes multi-domini/multi-operador seguint les recomanacions de les NGN. El sistema automatitza els mecanismes associats a la configuració de la QoS dins d'un mateix domini (sistema autònom) mitjançant la gestió basada en polítiques de QoS i automatitza la negociació dinàmica de QoS entre QoS Brokers de diferents dominis, de forma que permet garantir QoS extrem-extrem sense fissures. Aquesta arquitectura es valida sobre un testbed de proves multi-domini que utilitza el mecanisme DiffServ de QoS i suporta IPv6.L'arquitectura definida en les NGN permet gestionar la QoS tant a nivell 3 (IP) com a nivell 2 (Ethernet, WiFi, etc.) de forma que permet gestionar també xarxes PLC. Aquesta tesi proposa una aproximació teòrica per aplicar aquesta arquitectura de control, mitjançant un QoS Broker, a les noves xarxes PLC que s'estan acabant d'estandarditzar, i discuteix les possibilitats d'aplicació sobre les futures xarxes de comunicació de les Smart Grids.Finalment, s'integra en el QoS Broker un mòdul per gestionar l'enginyeria del tràfic optimitzant els dominis mitjançant tècniques de intel·ligència artificial. La validació en simulacions i sobre un testbed amb routers Cisco demostra que els algorismes genètics híbrids són una opció eficaç en aquest camp.En general, les observacions i avenços assolits en aquesta tesi contribueixen a augmentar la comprensió del funcionament de la QoS en les NGN i a preparar aquests sistemes per afrontar problemes del món real de gran complexitat.A medida que Internet y las redes IP se han ido integrando dentro de la sociedad y las corporaciones, han ido creciendo las expectativas de nuevos servicios convergentes así como las expectativas de calidad en las comunicaciones. Las Next Generation Networks (NGN) dan respuesta a las nuevas necesidades y representan el nuevo paradigma de Internet a partir de la convergencia IP. Uno de los aspectos menos desarrollados de las NGN es el control de la Calidad del Servicio (QoS), especialmente crítico en las comunicaciones multimedia a través de redes heterogéneas y/o de diferentes operadores. Además, las NGN incorporan nativamente el protocolo IPv6 que, a pesar de las deficiencias y agotamiento de direcciones IPv4, aún no ha tenido el impulso definitivo.Esta tesis está enfocada desde un punto de vista práctico. Así pues, con tal de poder hacer investigación sobre redes de prueba (o testbeds) que suporten IPv6 con garantías de funcionamiento, se hace un estudio en profundidad del protocolo IPv6, de su grado de implementación y de los tests de conformancia e interoperabilidad existentes que evalúan la calidad de estas implementaciones. A continuación se evalua la calidad de cinco sistemas operativos que soportan IPv6 mediante un test de conformancia y se implementa el testbed IPv6 básico, a partir del cual se realizará la investigación, con la implementación que ofrece más garantías.El QoS Broker es la aportación principal de esta tesis: un marco integrado que incluye un sistema automatitzado para gestionar el control de la QoS a través de sistemas multi-dominio/multi-operador siguiendo las recomendaciones de las NGN. El sistema automatiza los mecanismos asociados a la configuración de la QoS dentro de un mismo dominio (sistema autónomo) mediante la gestión basada en políticas de QoS y automatiza la negociación dinámica de QoS entre QoS brokers de diferentes dominios, de forma que permite garantizar QoS extremo-extremo sin fisuras. Esta arquitectura se valida sobre un testbed de pruebas multi-dominio que utiliza el mecanismo DiffServ de QoS y soporta IPv6. La arquitectura definida en las NGN permite gestionar la QoS tanto a nivel 3 (IP) o como a nivel 2 (Ethernet, WiFi, etc.) de forma que permite gestionar también redes PLC. Esta tesis propone una aproximación teórica para aplicar esta arquitectura de control, mediante un QoS Broker, a las noves redes PLC que se están acabando de estandardizar, y discute las posibilidades de aplicación sobre las futuras redes de comunicación de las Smart Grids.Finalmente, se integra en el QoS Broker un módulo para gestionar la ingeniería del tráfico optimizando los dominios mediante técnicas de inteligencia artificial. La validación en simulaciones y sobre un testbed con routers Cisco demuestra que los algoritmos genéticos híbridos son una opción eficaz en este campo.En general, las observaciones y avances i avances alcanzados en esta tesis contribuyen a augmentar la comprensión del funcionamiento de la QoS en las NGN y en preparar estos sistemas para afrontar problemas del mundo real de gran complejidad.The steady growth of Internet along with the IP networks and their integration into society and corporations has brought with it increased expectations of new converged services as well as greater demands on quality in communications. The Next Generation Networks (NGNs) respond to these new needs and represent the new Internet paradigm from the IP convergence. One of the least developed aspects in the NGNs is the Quality of Service (QoS) control, which is especially critical in the multimedia communication through heterogeneous networks and/or different operators. Furthermore, the NGNs natively incorporate the IPv6 protocol which, despite its shortcomings and the depletion of IPv4 addresses has not been boosted yet.This thesis has been developed with a practical focus. Therefore, with the aim of carrying out research over testbeds supporting the IPv6 with performance guarantees, an in-depth study of the IPv6 protocol development has been conducted and its degree of implementation and the existing conformance and interoperability tests that evaluate these implementations have been studied. Next, the quality of five implementations has been evaluated through a conformance test and the basic IPv6 testbed has been implemented, from which the research will be carried out. The QoS Broker is the main contribution to this thesis: an integrated framework including an automated system for QoS control management through multi-domain/multi-operator systems according to NGN recommendations. The system automates the mechanisms associated to the QoS configuration inside the same domain (autonomous system) through policy-based management and automates the QoS dynamic negotiation between peer QoS Brokers belonging to different domains, so it allows the guarantee of seamless end-to-end QoS. This architecture is validated over a multi-domain testbed which uses the QoS DiffServ mechanism and supports IPv6.The architecture defined in the NGN allows QoS management at level 3 (IP) as well as at level 2 (e.g. Ethernet, WiFi) so it also facilitates the management of PLC networks. Through the use of a QoS Broker, this thesis proposes a theoretical approach for applying this control architecture to the newly standardized PLC networks, and discusses the possibilities of applying it over the future communication networks of the Smart Grids.Finally, a module for managing traffic engineering which optimizes the network domains through artificial intelligence techniques is integrated in the QoS Broker. The validations by simulations and over a Cisco router testbed demonstrate that hybrid genetic algorithms are an effective option in this area.Overall, the advances and key insights provided in this thesis help advance our understanding of QoS functioning in the NGNs and prepare these systems to face increasingly complex problems, which abound in current industrial and scientific applications

    19th SC@RUG 2022 proceedings 2021-2022

    Get PDF
    corecore