573 research outputs found

    Run-time fault detection in monitor based concurrent programming

    Software Development and Management Lab., Dept. of Computing. Refereed conference paper, 2001-2002 (Academic research: refereed). Version of Record.

    Algorithms For Extracting Timeliness Graphs

    We consider asynchronous message-passing systems in which some links are timely and processes may crash. Each run defines a timeliness graph among correct processes: (p, q) is an edge of the timeliness graph if the link from p to q is timely (that is, there is a bound on communication delays from p to q). The main goal of this paper is to approximate this timeliness graph by graphs having some properties (such as being trees, rings, ...). Given a family S of graphs, for runs in which the timeliness graph contains at least one graph in S, an extraction algorithm must make each correct process converge to the same graph in S that is, in a precise sense, an approximation of the timeliness graph of the run. For example, if the timeliness graph contains a ring, then using an extraction algorithm, all correct processes eventually converge to the same ring, and in this ring all nodes will be correct processes and all links will be timely. We first present a general extraction algorithm and then a more specific extraction algorithm that is communication efficient (i.e., eventually all the messages of the extraction algorithm use only links of the extracted graph).
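
The ring example in the abstract above can be illustrated with a small sketch. This is not the paper's extraction algorithm: it assumes each process already holds the same set of timely links, assumes the ring must span all listed processes, and uses a brute-force search that is only practical for a handful of processes. The names `extract_ring`, `processes`, and `timely_links` are hypothetical. The point is only that a deterministic selection rule makes every process pick the same ring from the same graph.

```python
from itertools import permutations


def extract_ring(processes, timely_links):
    """Return the lexicographically smallest directed ring over all
    processes whose every hop is a timely link, or None if none exists.

    processes: iterable of comparable process ids (assumed correct).
    timely_links: set of (sender, receiver) pairs assumed to be timely.
    """
    procs = sorted(processes)
    if not procs:
        return None
    first, rest = procs[0], procs[1:]
    # permutations() of a sorted sequence is generated in lexicographic
    # order, so the first ring that passes the check is the smallest one.
    for order in permutations(rest):
        ring = (first,) + order
        hops = zip(ring, ring[1:] + (first,))  # includes the closing hop
        if all(hop in timely_links for hop in hops):
            return ring
    return None


links = {("p", "r"), ("r", "q"), ("q", "p"), ("p", "q")}
ring = extract_ring({"p", "q", "r"}, links)
```

Because the rule depends only on the graph and not on which process evaluates it, two processes with identical views return identical rings. A real extraction algorithm must also cope with views that differ and only eventually stabilize, which this sketch does not attempt.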

    A Prescription for Partial Synchrony

    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. 
(1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors

    FAULT-TOLERANT DISTRIBUTED CHANNEL ALLOCATION ALGORITHMS FOR CELLULAR NETWORKS

    In cellular networks, channels should be allocated efficiently to support communication between mobile hosts. In addition, in cellular networks, base stations may fail. Therefore, designing a fault-tolerant channel allocation algorithm is important. That is, the algorithm should tolerate failures of base stations. Many existing algorithms are neither fault-tolerant nor efficient in allocating channels. We propose channel allocation algorithms which are both fault-tolerant and efficient. In the proposed algorithms, to borrow a channel, a base station (or a cell) does not need to get channel usage information from all its interference neighbors. This makes the algorithms fault-tolerant, i.e., the algorithms can tolerate base station failures, and perform well in the presence of these failures. Channel pre-allocation has an effect on the performance of a channel allocation algorithm. This effect has not been studied quantitatively. We propose an adaptive channel allocation algorithm to study this effect. The algorithm allows a subset of channels to be pre-allocated to cells. Performance evaluation indicates that a channel allocation algorithm benefits from pre-allocating all channels to cells. Channel selection strategy also influences the performance of a channel allocation algorithm. Given a set of channels to borrow, how a cell chooses a channel to borrow is called the channel selection problem. When choosing a channel to borrow, many algorithms proposed in the literature do not take into account the interference caused by borrowing the channel to the cells which have the channel allocated to them.
However, such interference should be considered; reducing such interference helps increase the reuse of the same channel, and hence improves channel utilization. We propose a channel selection algorithm taking such interference into account. Most channel allocation algorithms proposed in the literature are for traditional cellular networks with static base stations, where the neighborhood relationship among the base stations is fixed. Such algorithms are not applicable to cellular networks with mobile base stations. We propose a channel allocation algorithm for cellular networks with mobile base stations. The proposed algorithm is both fault-tolerant and reuses channels efficiently.
KEYWORDS: distributed channel allocation, resource planning, fault-tolerance, cellular networks, 3-cell cluster model

    EOS: A project to investigate the design and construction of real-time distributed embedded operating systems

    The EOS project is investigating the design and construction of a family of real-time distributed embedded operating systems for reliable, distributed aerospace applications. Using the real-time programming techniques developed in co-operation with NASA in earlier research, the project staff is building a kernel for a multiple processor networked system. The first six months of the grant included a study of scheduling in an object-oriented system, the design philosophy of the kernel, and the architectural overview of the operating system. In this report, the operating system and kernel concepts are described. An environment for the experiments has been built and several of the key concepts of the system have been prototyped. The kernel and operating system are intended to support future experimental studies in multiprocessing, load-balancing, routing, software fault-tolerance, distributed database design, and real-time processing.

    A cluster based communication architecture for distributed applications in mobile ad hoc networks

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2006. Includes bibliographical references (leaves: 63-69). Text in English; Abstract: Turkish and English. x, 85 leaves. In this thesis, we aim to design and implement three protocols on a hierarchical architecture to solve the balanced clustering, backbone formation and distributed mutual exclusion problems for mobile ad hoc networks (MANETs). Our first goal is to cluster the MANET into balanced partitions. Clustering is a widely used approach to ease implementation of various problems such as routing and resource management in MANETs. We propose the Merging Clustering Algorithm (MCA) for clustering in MANETs that merges clusters to form higher level of clusters by increasing their levels. Secondly, we aim to construct a directed ring topology across clusterheads which were selected by MCA. Lastly, we implement the distributed mutual exclusion algorithm based on Ricart-Agrawala algorithm for MANETs (Mobile RA). Each cluster is represented by a coordinator node on the ring which implements distributed mutual exclusion algorithm on behalf of any member in the cluster it represents. We show the operations of the algorithms, analyze their time and message complexities and provide results in the simulation environment of ns2
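
For readers unfamiliar with the Ricart-Agrawala algorithm that the thesis above builds on, the following is a minimal sketch of its classic reply-or-defer rule only. It is not the thesis's Mobile RA variant: there is no ring, no cluster coordinator, and no message transport, and the class and method names (`RANode`, `request`, `on_request`, `release`) are hypothetical. Requests are ordered by a (logical clock, node id) pair, and a node defers its reply while it holds the critical section or has an older pending request of its own.

```python
class RANode:
    """One participant in classic Ricart-Agrawala mutual exclusion.

    Illustrative only: message delivery and reply counting are left to
    the caller.
    """

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0              # Lamport logical clock
        self.requesting = False
        self.in_cs = False
        self.request_stamp = None   # (clock, node_id) of our pending request
        self.deferred = []          # senders whose replies we are holding back

    def request(self):
        """Start a request; the returned stamp is sent to every other node."""
        self.clock += 1
        self.requesting = True
        self.request_stamp = (self.clock, self.node_id)
        return self.request_stamp

    def on_request(self, stamp):
        """Handle a remote request. Return True to reply now, False to defer."""
        sender_clock, sender = stamp
        self.clock = max(self.clock, sender_clock) + 1
        # Tuples compare lexicographically, so ties on the clock value are
        # broken by node id and the order on requests is total.
        if self.in_cs or (self.requesting and self.request_stamp < stamp):
            self.deferred.append(sender)
            return False
        return True

    def release(self):
        """Leave the critical section; return the senders now owed a reply."""
        self.in_cs = False
        self.requesting = False
        owed, self.deferred = self.deferred, []
        return owed


a, b = RANode("a"), RANode("b")
stamp_a, stamp_b = a.request(), b.request()
a_replies_now = a.on_request(stamp_b)   # a has priority, so it defers b
b_replies_now = b.on_request(stamp_a)   # b yields and replies at once
```

In the full algorithm a node enters its critical section once it has collected a reply from every other node; that bookkeeping is omitted here to keep the priority rule visible.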

    The Weakest Failure Detector for Solving Wait-Free, Eventually Bounded-Fair Dining Philosophers

    This dissertation explores the necessary and sufficient conditions to solve a variant of the dining philosophers problem. This dining variant is defined by three properties: wait-freedom, eventual weak exclusion, and eventual bounded fairness. Wait-freedom guarantees that every correct hungry process eventually enters its critical section, regardless of process crashes. Eventual weak exclusion guarantees that every execution has an infinite suffix during which no two live neighbors execute overlapping critical sections. Eventual bounded fairness guarantees that there exists a fairness bound k such that every execution has an infinite suffix during which no correct hungry process is overtaken more than k times by any neighbor. This dining variant (WF-EBF dining for short) is important for synchronization tasks where eventual safety (i.e., eventual weak exclusion) is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, it is known that wait-free dining is unsolvable in asynchronous message-passing systems subject to crash faults. To circumvent this impossibility result, it is necessary to assume the existence of bounds on timing properties, such as relative process speeds and message delivery time. As such, it is of interest to characterize the necessary and sufficient timing assumptions to solve WF-EBF dining. We focus on implicit timing assumptions, which can be encapsulated by failure detectors. Failure detectors can be viewed as distributed oracles that can be queried for potentially unreliable information about crash faults. The weakest detector D for WF-EBF dining means that D is both necessary and sufficient. Necessity means that every failure detector that solves WF-EBF dining is at least as strong as D. Sufficiency means that there exists at least one algorithm that solves WF-EBF dining using D. 
As such, our research goal is to characterize the weakest failure detector to solve WF-EBF dining. We prove that the eventually perfect failure detector ♢P is the weakest failure detector for solving WF-EBF dining. ♢P eventually suspects crashed processes permanently, but may make mistakes by wrongfully suspecting correct processes finitely many times during any execution. As such, ♢P eventually stops suspecting correct processes
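
The behavior attributed to the eventually perfect failure detector above (finitely many wrongful suspicions, permanent suspicion of crashed processes) is commonly obtained in partially synchronous settings with heartbeats and a timeout that grows after each mistake. The sketch below shows that well-known technique, not anything taken from the dissertation. The class name, method names, the doubling rule, and the explicit `now` argument are illustrative assumptions; time is passed in by the caller so the example stays deterministic.

```python
class EventuallyPerfectDetector:
    """Heartbeat-based suspicion list with a per-peer adaptive timeout."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; undo a wrongful suspicion if there was one."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # The peer was alive after all: trust it again and wait longer
            # next time, so each peer is wrongly suspected only finitely
            # often once message delays are bounded.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


d = EventuallyPerfectDetector(["q"], initial_timeout=1.0)
d.on_heartbeat("q", now=0.5)
early = d.check(now=1.0)       # silent for 0.5, within the timeout
late = d.check(now=2.0)        # silent for 1.5, so q is suspected
d.on_heartbeat("q", now=2.1)   # q was alive: trust it, timeout becomes 2.0
after = d.check(now=3.0)       # silent for 0.9, within the new timeout
```

A crashed peer sends no further heartbeats, so once suspected it stays suspected; a correct peer whose heartbeats arrive within some unknown bound stops being suspected once the timeout has grown past that bound.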

    Fairness Properties of the Trusting Failure Detector

    In 1985 it was shown by Fischer et al. that consensus, a fundamental problem in distributed computing, was impossible in asynchronous distributed systems in the presence of even just one process failure. This result prompted a search for alternative system models that were capable of solving such problems and culminated in the development of two helpful constructs: partially synchronous system models and failure detectors. Partially synchronous system models seek to solve the problem of identifying process crashes by constraining the real-time behavior of the underlying system. In the resulting models, crashed processes can be detected indirectly through the use of timeouts. Failure detectors, on the other hand, address process crashes by directly providing (potentially inaccurate) information on failures. As a result, failure detectors were viewed as abstractions of real-time information. Pike et al. proposed a different perspective on failure detectors: that they abstract fairness properties. Fairness in a system imposes bounds on the relative frequencies of communication and execution between processes in a system, and it was shown that four frequently-used failure detectors from the Chandra-Toueg hierarchy (P, ♢P, S, ♢S) encapsulate these fairness properties. This discovery suggests that failure detectors may be better understood as abstractions of fairness rather than of real-time properties, and demonstrates that results can be carried over between systems augmented with failure detectors and partially synchronous system models. In this thesis, we discuss an extension of the Pike et al. result to the trusting failure detector. The trusting failure detector is the weakest failure detector for solving fault-tolerant mutual exclusion, a fundamental primitive for distributed computing.