104 research outputs found

    (h,k)-Arbiters for h-out-of-k mutual exclusion problem

    h-Out-of-k mutual exclusion is a generalization of the 1-mutual exclusion problem, where there are k units of shared resources and each process requests h (1⩽h⩽k) units at the same time. Though the k-arbiter has been shown to be a quorum-based solution to this problem, quorums in a k-arbiter are much larger than those in the 1-coterie for 1-mutual exclusion. Thus, an algorithm based on a k-arbiter needs many messages. This paper introduces the new notion that each request uses different quorums depending on the number of units it requests. Based on this notion, the paper defines two (h,k)-arbiters for h-out-of-k mutual exclusion: a uniform (h,k)-arbiter and a (k+1)-cube (h,k)-arbiter. The quorums in each (h,k)-arbiter are no larger than the ones in the corresponding k-arbiter; consequently, it is more efficient to use (h,k)-arbiters than k-arbiters. A uniform (h,k)-arbiter is a generalization of the majority coterie for 1-mutual exclusion. A (k+1)-cube (h,k)-arbiter is a generalization of the square grid coterie for 1-mutual exclusion.

    k-coteries for tolerating network 2-Partition

    Network partition, which makes it impossible for some pairs of processes to communicate with each other, is one of the most serious network failures. Although the notion of k-coterie was introduced to design a k-mutual exclusion algorithm robust against network failures, the number of processes allowed to simultaneously access the critical section may fatally decrease once a network partition occurs. This paper discusses how to construct a k-coterie such that the k-mutual exclusion algorithm adopting it is robust against network 2-partition. To this end, we introduce the notion of complemental k-coterie, and show that complemental k-coteries meet our purpose. We then give methods for constructing complemental k-coteries, and show a necessary and sufficient condition for a k-coterie to be complemental.
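
As a rough illustration of what a k-coterie is, the sketch below checks the three properties commonly used to define one: minimality, the intersection property (no k+1 pairwise disjoint quorums), and the non-intersection property (any fewer than k pairwise disjoint quorums can be joined by another quorum disjoint from all of them). The function names are hypothetical, and the check covers only the general k-coterie definition, not the complemental condition studied in the paper.

```python
from itertools import combinations

def pairwise_disjoint(quorums):
    """True if no two quorums in the collection share a process."""
    return all(a.isdisjoint(b) for a, b in combinations(quorums, 2))

def is_k_coterie(quorums, k):
    """Brute-force check of the k-coterie properties (illustrative only)."""
    qs = [frozenset(q) for q in quorums]
    if not qs or any(len(q) == 0 for q in qs):
        return False
    # Minimality: no quorum is a proper subset of another.
    if any(a < b for a in qs for b in qs):
        return False
    # Intersection property: there are no k+1 pairwise disjoint quorums.
    if any(pairwise_disjoint(c) for c in combinations(qs, k + 1)):
        return False
    # Non-intersection property: any h < k pairwise disjoint quorums
    # leave room for one more quorum disjoint from all of them.
    for h in range(1, k):
        for chosen in combinations(qs, h):
            if pairwise_disjoint(chosen):
                used = frozenset().union(*chosen)
                if not any(q.isdisjoint(used) for q in qs):
                    return False
    return True

# All 2-element subsets of 4 processes: two disjoint quorums exist, never three.
pairs = [set(c) for c in combinations(range(4), 2)]
two_ok = is_k_coterie(pairs, 2)
one_ok = is_k_coterie(pairs, 1)
```

Here `two_ok` is true and `one_ok` is false, because quorums such as {0,1} and {2,3} are disjoint and therefore cannot both belong to an ordinary (1-)coterie.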

    Fault-tolerant multiple-entry critical section using failure detectors (Section critique à entrées multiples tolérante aux fautes et utilisant des détecteurs de défaillances)

    In this paper we present a new fault-tolerant algorithm for K-mutual exclusion. This permission-based algorithm is an extension of Raymond's algorithm [Ray89]. It tolerates n − 1 faults and remains efficient despite failures. The algorithm relies on an unreliable failure detector. A performance evaluation shows the effectiveness of our approach in the presence of faults.

    Coterie Join Operation and Tree Structured k-Coteries

    The coterie join operation proposed by Neilsen and Mizuno produces, from a k-coterie and a coterie, a new k-coterie. For the coterie join operation, this paper first shows 1) a necessary and sufficient condition to produce a nondominated k-coterie (more accurately, a nondominated k-semicoterie satisfying the Nonintersection Property) and 2) a sufficient condition to produce a k-coterie with higher availability. By recursively applying the coterie join operation in such a way that the above conditions hold, we define nondominated k-coteries, called tree structured k-coteries, the availabilities of which are thus expected to be very high. This paper then proposes a new k-mutual exclusion algorithm that effectively uses a tree structured k-coterie, by extending Agrawal and El Abbadi's tree algorithm. The number of messages necessary for k processes obeying the algorithm to simultaneously enter the critical section is approximately bounded by k log (n / k) in the best case, where n is the number of processes in the system.

    Quorum Based Conflict Resolution Algorithms In Distributed Systems

    Mutual exclusion is one of the most fundamental issues in the study of distributed systems. The problem arises when two or more processes compete to use a mutually exclusive resource concurrently, i.e., a resource that can be used by at most one process at a time. Synchronization mechanisms adopting quorum systems are an important class of distributed algorithms, since they gracefully tolerate process and communication failures that may lead to network partitioning. A coterie-based algorithm is a typical quorum-based algorithm for mutual exclusion: a process can use the resource only if it obtains permissions from all processes in any quorum of a coterie, and since every pair of quorums intersects and each process issues only one permission at a time, mutual exclusion is guaranteed. Many quorum systems have been defined by relaxing the properties of the coterie. Each of them is designed to solve a corresponding problem, e.g., the k-coterie for k-mutual exclusion, the local coterie for generalized mutual exclusion, the (h, k)-arbiter for the h-out-of-k resource allocation problem, etc. Therefore, designing an algorithm for a distributed conflict resolution problem amounts to defining a new quorum system that can be applied to that problem. Since most distributed conflict resolution problems are defined by relaxing the safety property of mutual exclusion, understanding how to relax the safety property and its quorum system is important for studying any kind of conflict resolution problem in distributed systems.
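
The permission rule described above can be sketched in a few lines. This is a single-process simulation, not a distributed implementation: the class `QuorumLock` and its methods are hypothetical names, message passing, queuing of requests, and deadlock handling are all omitted, and each arbiter simply hands out at most one permission at a time.

```python
class QuorumLock:
    """Toy model of coterie-based mutual exclusion (illustrative only)."""

    def __init__(self, coterie):
        self.coterie = [frozenset(q) for q in coterie]
        self.granted = {}  # arbiter id -> requester currently holding its permission

    def try_enter(self, requester):
        """Return a quorum whose members all granted permission, or None."""
        for quorum in self.coterie:
            if all(member not in self.granted for member in quorum):
                for member in quorum:
                    self.granted[member] = requester
                return quorum
        return None

    def leave(self, requester):
        """Release every permission held by the requester."""
        self.granted = {m: r for m, r in self.granted.items() if r != requester}

# Majority coterie over three arbiters: any two quorums share an arbiter.
lock = QuorumLock([{0, 1}, {1, 2}, {0, 2}])
first = lock.try_enter("p1")    # succeeds and holds one full quorum
second = lock.try_enter("p2")   # fails: every other quorum shares a busy arbiter
lock.leave("p1")
third = lock.try_enter("p2")    # succeeds once p1 has released its permissions
```

Because every pair of quorums intersects, the second request cannot collect a full quorum while the first is held, which is the safety argument given in the abstract.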

    Replicated Data and Partition Failures

    In a distributed database system, data is often replicated to improve performance and availability. By storing copies of shared data on processors where it is frequently accessed, the need for expensive, remote read accesses is decreased. By storing copies of critical data on processors with independent failure modes, the probability that at least one copy of the data will be accessible increases. In theory, data replication makes it possible to provide arbitrarily high data availability. In practice, realizing the benefits of data replication is difficult since the correctness of data must be maintained. One important aspect of correctness with replicated data is mutual consistency: all copies of the same logical data-item must agree on exactly one current value for the data-item. Furthermore, this value should make sense in terms of the transactions executed on copies of the data-item. When communication fails between sites containing copies of the same logical data-item, mutual consistency between copies becomes complicated to ensure. The most disruptive of these communication failures are partition failures, which fragment the network into isolated subnetworks called partitions. Unless partition failures are detected and recognized by all affected processors, independent and uncoordinated updates may be applied to different copies of the data, thereby compromising the correctness of data. Consider, for example, an Airline Reservation System implemented by a distributed database which splits into two partitions when the communication network fails. If, at the time of the failure, all the nodes have one seat remaining for PAN AM 537, reservations could be made in both partitions. This would violate correctness: who should get the last seat? There should not be more seats reserved for a flight than physically exist on the plane. (Some airlines do not implement this constraint and allow overbookings.) 
The design of a replicated data management algorithm tolerating partition failures (or partition processing strategy) is a notoriously hard problem. Typically, the cause or extent of a partition failure cannot be discerned by the processors themselves. At best, a processor may be able to identify the other processors in its partition; but, for the processors outside of its partition, it will not be able to distinguish between the case where those processors are simply isolated from it and the case where those processors are down. In addition, slow responses can cause the network to appear partitioned even when it is not, further complicating the design of a fault-tolerant algorithm
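
One well-known way to avoid the double booking in the airline example is a majority rule: only a partition that contains a strict majority of the replica sites may apply updates. The sketch below illustrates that rule under simplifying assumptions (every site has one vote, and each site knows which sites are in its partition); the function name is hypothetical, and this is not necessarily the strategy the work itself proposes.

```python
def may_accept_updates(partition, all_sites):
    """True if the partition holds a strict majority of all replica sites."""
    reachable = set(partition) & set(all_sites)
    return 2 * len(reachable) > len(set(all_sites))

sites = {"A", "B", "C", "D", "E"}
left_ok = may_accept_updates({"A", "B", "C"}, sites)   # 3 of 5 sites
right_ok = may_accept_updates({"D", "E"}, sites)       # 2 of 5 sites
```

At most one partition can hold a strict majority, so at most one side can sell the last seat. The cost is availability: with an even split (for example 2 and 2 out of 4 sites), neither side may update.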

    System support for object replication in distributed systems

    Distributed systems are composed of a collection of cooperating but failure prone system components. The number of components in such systems is often large and, despite low probabilities of any particular component failing, the likelihood that there will be at least a small number of failures within the system at a given time is high. Therefore, distributed systems must be able to withstand partial failures. By being resilient to partial failures, a distributed system becomes more able to offer a dependable service and therefore more useful. Replication is a well known technique used to mask partial failures and increase reliability in distributed computer systems. However, replication management requires sophisticated distributed control algorithms, and is therefore a labour intensive and error prone task. Furthermore, replication is in most cases employed due to applications' non-functional requirements for reliability, as dependability is generally an orthogonal issue to the problem domain of the application. If system level support for replication is provided, the application developer can devote more effort to application specific issues. Distributed systems are inherently more complex than centralised systems. Encapsulation and abstraction of components and services can be of paramount importance in managing their complexity. The use of object oriented techniques and languages, providing support for encapsulation and abstraction, has made development of distributed systems more manageable. In systems where applications are being developed using object-oriented techniques, system support mechanisms must recognise this, and provide support for the object-oriented approach. The architecture presented exploits object-oriented techniques to improve transparency and to reduce the application programmer involvement required to use the replication mechanisms. 
This dissertation describes an approach to implementing system support for object replication, which is distinct from other approaches such as replicated objects in that objects are not specially designed for replication. Additionally, object replication, in contrast to data replication, is a function-shipping approach and deals with the replication of both operations and data. Object replication is complicated by objects' encapsulation of local state and the arbitrary interaction patterns that may exist among objects. Although fully transparent object replication has not been achieved, my thesis is that partial system support for replication of program-level objects is practicable and assists the development of certain classes of reliable distributed applications. I demonstrate the usefulness of this approach by describing a prototype implementation and showing how it supports the development of an example toy application. To increase their flexibility, the system support mechanisms described are tailorable. The approach adopted in this work is to provide partial support for object replication, relying on some assistance from the application developer to supply application dependent functionality within particular collators for dealing with processing of results from object replicas. Care is taken to make the programming model as simple and concise as possible

    Distributed consensus in wireless network

    Connected autonomous systems, which are powered by the synergistic integration of the Internet of Things (IoT), Artificial Intelligence (AI), and 5G technologies, predominantly rely on a central node for making mission-critical decisions. This reliance poses a significant challenge: the condition and capability of the central node largely determine the reliability and effectiveness of decision-making. Maintaining such a centralized system, especially in large-scale wireless networks, can be prohibitively expensive and runs into scalability challenges. In light of these limitations, there is a compelling need for innovative methods to address the increasing demands on reliability and latency, especially in mission-critical networks where cooperative decision-making is paramount. One promising avenue lies in distributed consensus protocols, a mechanism intrinsic to distributed computing systems. These protocols offer enhanced robustness, ensuring continued functionality and responsiveness in decision-making even in the face of node or communication failures. This thesis builds on the idea of leveraging distributed consensus to bolster the reliability of mission-critical decision-making within wireless networks. It examines the performance characteristics of wireless distributed consensus, analyzing and subsequently optimizing them with a specific focus on reliability and latency. The research begins with a fundamental model of consensus reliability in Raft, a crash fault tolerant protocol. A novel metric termed ReliabilityGain is introduced to analyze the performance of distributed consensus in wireless networks. This concept elucidates the linear correlation between the reliability of consensus-driven decision-making and the reliability of communication link transmission. 
An intriguing discovery made in my study is the inherent trade-off between the latency of achieving consensus and its reliability. These two variables pull in opposite directions, which raises further performance optimization issues. The performance of crash fault tolerant and Byzantine fault tolerant protocols is scrutinized and compared with the original centralized consensus. This exploration becomes particularly pertinent when communication failures occur in wireless distributed consensus. The analytical results are juxtaposed with performance metrics derived from a centralized consensus mechanism. This comparative analysis illuminates the relative merits and demerits of these consensus strategies, evaluated from the dual perspectives of overall consensus reliability and communication latency. Building on the insights gained from the detailed analysis of the Raft and Hotstuff BFT protocols, my thesis further explores optimization strategies for wireless distributed consensus. A central facet of this exploration is the introduction of a tailored communication resource allocation scheme. This scheme, aimed at maximizing the performance of consensus mechanisms, dynamically assesses the network conditions and allocates communication resources such as transmit power and bandwidth, so that even in varied and unpredictable network conditions, consensus can be achieved with minimized latency and maximized reliability. The research also introduces an adaptive protocol for distributed consensus in wireless networks. The strength of this adaptive protocol lies in its ability to autonomously construct a consensus-enabled network even if node failures or communication disruptions occur, so that the network's decision-making process remains uninterrupted and efficient. 
The sharding mechanism, which is regarded as an effective solution to scalability issues in distributed systems, not only aids in managing vast networks more efficiently but also ensures that a disruption in one shard cannot compromise the functionality of the entire network. Therefore, this thesis presents a reliability and security analysis of sharding implemented in wireless distributed systems. In essence, these intertwined strategies of communication resource allocation, adaptability, and sharding together form the core of my contributions to enhancing the performance of wireless distributed consensus.
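
To make the link between per-link reliability and consensus reliability concrete, the sketch below computes the probability that a Raft-style leader collects a majority in one round. It is a deliberately simplified model, not the thesis's ReliabilityGain metric: it assumes the leader itself never fails, that each follower's request and acknowledgement round trip succeeds independently with the same probability `p`, and that no retransmissions occur. The function name is hypothetical.

```python
from math import comb

def commit_probability(n, p):
    """Probability that a leader in an n-node cluster reaches a majority.

    The leader counts as one vote, so it needs n // 2 acknowledgements
    from its n - 1 followers; each arrives independently with probability p.
    """
    followers = n - 1
    needed = n // 2
    return sum(
        comb(followers, i) * p**i * (1 - p) ** (followers - i)
        for i in range(needed, followers + 1)
    )

three_nodes = commit_probability(3, 0.9)  # needs at least 1 of 2 followers
five_nodes = commit_probability(5, 0.9)   # needs at least 2 of 4 followers
```

Under these assumptions, adding nodes raises the chance of reaching a majority at a fixed link reliability, while waiting for more acknowledgements generally takes longer, which is one way to picture the reliability and latency trade-off mentioned above.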

    Distributed services for mobile ad hoc networks

    A mobile ad hoc network consists of nodes that communicate only through a wireless medium and can move arbitrarily. The key feature of a mobile ad hoc network is the mobility of the nodes. Because of this mobility, communication links form and disappear as nodes come into and go out of each other's communication range. Mobile ad hoc networks are particularly useful in situations like disaster recovery and search, military operations, etc. Research on mobile ad hoc networks has drawn a huge amount of attention recently. The main challenges for mobile ad hoc networks are scarce resources and frequent mobility. Most of the research work has focused on the MAC and routing layers. In this work, we focus on distributed services for mobile ad hoc networks. These services provide fundamental functions for developing various applications for mobile ad hoc networks. In particular, we focus on the clock synchronization, connected dominating set, and k-mutual exclusion problems in mobile ad hoc networks.