6 research outputs found

    A Design and Implementation of Cluster Heartbeat Network for Efficient Fault Detection

    To achieve fault tolerance in a server cluster, fault detection capability is a primary prerequisite. Efficient fault detection is prompt, correct and complete. This paper revisits the technique called Reactive Failure Detection (RFD), which dynamically predicts the heartbeat delay from a cluster node. We also identify the requirements for deploying RFD on actual servers. We propose a new cluster heartbeat network with concurrency that uses push and pull interaction during live monitoring and when determining a node’s status. The prototype of the new model is tested on a platform running multiple independent web applications and analyzed for the correctness of its implementation and design.

    A QoS-configurable failure detection service for internet applications

    Unreliable failure detectors are a basic building block of reliable distributed systems. Failure detectors are used to monitor the processes of any application and provide process state information. This work presents an Internet Failure Detector Service (IFDS) for processes running on the Internet across multiple autonomous systems. The failure detection service is adaptive and can be easily integrated into applications that require configurable QoS guarantees. The service is based on monitors that provide global process state information through an SNMP MIB. Monitors on different networks communicate across the Internet using Web Services. The system was implemented and evaluated for monitored processes running both on a single LAN and on PlanetLab. Experimental results are presented, showing the performance of the detector and, in particular, the advantages of using the self-tuning strategies to address the requirements of multiple concurrent applications running in a dynamic environment.

    Revised reference model

    This document contains an update of the HIDENETS Reference Model, whose preliminary version was introduced in D1.1. The Reference Model contains the overall approach to the development and assessment of end-to-end resilience solutions. As such, it presents a framework which, owing to its level of abstraction, is not restricted to the HIDENETS car-to-car and car-to-infrastructure applications and use cases. The document starts with a condensed summary of the dependability terminology used, then presents the network architecture, which comprises the ad hoc and infrastructure domains, the definition of the main networking elements, and the software architecture of the mobile nodes. The concept of architectural hybridization and its inclusion in HIDENETS-like dependability solutions is described next. A set of communication-level and middleware-level services is then described; these follow the architectural hybridization concept and are motivated by the dependability and resilience challenges raised by HIDENETS-like scenarios. Besides architectural solutions, the reference model addresses the assessment of dependability solutions in HIDENETS-like scenarios using quantitative evaluations, realized by a combination of top-down and bottom-up modelling, as well as verification via test scenarios. To allow for fault prevention in the software development phase of HIDENETS-like applications, generic UML-based modelling approaches focused on dependability-related aspects are described. The HIDENETS reference model provides the framework in which the detailed solutions of the HIDENETS project are being developed, while at the same time facilitating the same task for non-vehicular scenarios and applications.

    Support for dependable and adaptive distributed systems and applications

    Doctoral thesis, Informatics (Computer Engineering), Universidade de Lisboa, Faculdade de Ciências, 2011.
    Distributed applications executing in uncertain environments, like the Internet, need to make timing/synchrony assumptions (for instance, about the maximum message transmission delay) in order to make progress. In adaptive systems these temporal bounds should be computed at runtime, using probabilistic or specifically designed ad hoc approaches, typically with the objective of improving application performance. From a dependability perspective, however, the concern is to secure some properties on which the application can rely. This thesis addresses the problem of supporting adaptive systems and applications in stochastic environments from a dependability perspective: maintaining the correctness of system properties after adaptation. The idea behind dependable adaptation is to ensure that the assumed bounds for fundamental variables (e.g., network delays) are secured with a known and constant probability. Assume that, during its lifetime, a system alternates between periods in which its temporal behavior is well characterized (stable phases) and transition periods in which network conditions vary (transient phases). The proposed approach is then based on the following: if the environment is generically characterized in analytical terms and it is possible to detect the alternation of these stable and transient phases, then it is possible to adapt applications effectively and dependably. Based on this idea, the thesis introduces Adaptare, a framework for supporting dependable adaptation in stochastic environments. An extensive evaluation of Adaptare is provided, assessing the correctness and effectiveness of the implemented mechanisms. The results indicate that the proposed strategies and methodologies are indeed effective in supporting dependable adaptation of distributed systems and applications. Finally, the applicability of Adaptare is evaluated in the context of two fundamental problems in distributed systems: consensus and failure detection. The thesis proposes solutions for these problems based on modular architectures in which Adaptare is used as a middleware for dependable adaptation of assumed timeouts.
    Funding: Fundação para a Ciência e a Tecnologia (FCT).