28 research outputs found

    Failure Detectors for Wireless Sensor-Actuator Systems

    Wireless sensor-actuator systems (WSAS) offer exciting opportunities for emerging applications by facilitating fine-grained monitoring and control, and dense instrumentation. The large scale of such systems increases the need for them to tolerate and cope with failures in a localized and decentralized manner. We present abstractions for detecting node failures and link failures caused by topology changes in a WSAS. These abstractions were designed and implemented as a set of reusable components in nesC under TinyOS. We present results from experiments on an 80-node testbed that demonstrate the performance and viability of the abstractions. In the future, these abstractions can be extended to detect and cope with larger classes of failures in WSAS.

    Solution to Atomic Commitment Problem Based on Heartbeat Failure Detector

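    The heartbeat failure detector (HB) can be used to solve the quiescent reliable communication problem in asynchronous message-passing systems with process and link failures. Unlike traditional failure detectors, which typically rely on timeouts, HB uses counters instead. The Atomic Commitment (AC) problem requires all processes participating in a transaction to reach a common outcome: either all output commit or all output abort. We consider the non-blocking atomic commitment problem commonly used in practice, which requires that correct processes can still decide on a common outcome even if some processes fail. However, R. Guerraoui has proved that, when failures occur, non-blocking atomic commitment cannot be solved with unreliable failure detectors. The reason is that the non-triviality property of non-blocking atomic commitment requires processes to have accurate information about failures, which an unreliable failure detector clearly cannot provide, because ...
    Degree: Master of Engineering. Department and major: School of Software, Computer Application Technology. Student ID: 20034000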

    The Failure Detector Abstraction

    This paper surveys the failure detector concept through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions.

    The Weakest Failure Detector for Solving Wait-Free, Eventually Bounded-Fair Dining Philosophers

    This dissertation explores the necessary and sufficient conditions to solve a variant of the dining philosophers problem. This dining variant is defined by three properties: wait-freedom, eventual weak exclusion, and eventual bounded fairness. Wait-freedom guarantees that every correct hungry process eventually enters its critical section, regardless of process crashes. Eventual weak exclusion guarantees that every execution has an infinite suffix during which no two live neighbors execute overlapping critical sections. Eventual bounded fairness guarantees that there exists a fairness bound k such that every execution has an infinite suffix during which no correct hungry process is overtaken more than k times by any neighbor. This dining variant (WF-EBF dining for short) is important for synchronization tasks where eventual safety (i.e., eventual weak exclusion) is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, it is known that wait-free dining is unsolvable in asynchronous message-passing systems subject to crash faults. To circumvent this impossibility result, it is necessary to assume the existence of bounds on timing properties, such as relative process speeds and message delivery time. As such, it is of interest to characterize the necessary and sufficient timing assumptions to solve WF-EBF dining. We focus on implicit timing assumptions, which can be encapsulated by failure detectors. Failure detectors can be viewed as distributed oracles that can be queried for potentially unreliable information about crash faults. The weakest detector D for WF-EBF dining means that D is both necessary and sufficient. Necessity means that every failure detector that solves WF-EBF dining is at least as strong as D. Sufficiency means that there exists at least one algorithm that solves WF-EBF dining using D. 
As such, our research goal is to characterize the weakest failure detector to solve WF-EBF dining. We prove that the eventually perfect failure detector ◇P is the weakest failure detector for solving WF-EBF dining. ◇P eventually suspects crashed processes permanently, but may make mistakes by wrongfully suspecting correct processes finitely many times during any execution. As such, ◇P eventually stops suspecting correct processes.

    Fault-tolerant computing with unreliable channels

    We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels

    Specification of Replication Techniques, Semi-Passive Replication, and Lazy Consensus

    This paper makes three main contributions: a hierarchy of specifications for replication techniques, semi-passive replication, and Lazy Consensus. Based on the definition of the Generic Replication problem, we define two families of replication techniques: replication with parsimonious processing (e.g., passive replication), and replication with redundant processing (e.g., active replication). This helps relate replication techniques to each other. We define a novel replication technique with parsimonious processing, called semi-passive replication, for which we also give an algorithm. The most significant aspect of semi-passive replication is that it requires a weaker system model than existing techniques of the same family. We define a variant of the Consensus problem, called Lazy Consensus, upon which our semi-passive replication algorithm is based. The main difference between Consensus and Lazy Consensus is a property of laziness, which requires that initial values are computed only when they are actually needed.

    Gestion de groupe partitionnable dans les réseaux mobiles spontanés (Partitionable Group Membership in Mobile Ad Hoc Networks)

    In Mobile Ad hoc NETworks (MANETs), partitionable group membership is a basic service for building partition-tolerant distributed applications. None of the existing specifications satisfies the following two antagonistic requirements: 1) it must be strong enough to simplify the design of partition-tolerant distributed applications in partitionable systems; 2) it must be weak enough to be implementable.
In this thesis, we propose a solution to partitionable group membership in very dynamic network environments such as MANETs. To this end, we proceed in three steps. First, we develop a dynamic distributed system model that characterises stability in MANETs. Then, we adapt the Paxos approach, based on Synod consensus, to partitionable systems. This adaptation results in a specification of abortable consensus AC, which is built on top of an eventual α partition-participants detector PPD and an eventual register per partition RPP. PPD guarantees liveness in a partition even if the partition is not completely stable, whereas RPP ensures safety in the same partition. Finally, partitionable group membership is solved by transforming it into a sequence of abortable consensus instances AC. Each of the modules PPD, RPP, AC, and partitionable group membership is implemented and proved. In addition, we analyse the performance of PPD by simulation.

    The Failure Detector Abstraction

    A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions