2,106 research outputs found

    Causal synchrony in the design of distributed programs

    Get PDF
    The outcome of any computation is determined by the order of the events in the computation and the state of the component variables of the computation at those events. The level of knowledge that can be obtained about event order and process state influences protocol design and operation. In a centralized system, the presence of a physical clock makes it easy to determine event order. It is a more difficult task in a distributed system because there is normally no global time. Hence, there is no common time reference to be used for ordering events. as a consequence, distributed protocols are often designed without explicit reference to event order. Instead they are based on some approximation of global state. Because global state is also difficult to identify in a distributed system, the resulting protocols are not as efficient or clear as they could be.;We subscribe to Lamport\u27s proposition that the relevant temporal ordering of any two events is determined by their causal relationship and that knowledge of the causal order can be a powerful tool in protocol design. Mattern\u27s vector time can be used to identify the causal order, thereby providing the common frame of reference needed to order events in a distributed computation. In this dissertation we present a consistent methodology for analysis and design of distributed protocols that is based on the causal order and vector time. Using it we can specify conditions which must be met for a protocol to be correct, we can define the axiomatic protocol specifications, and we can structure reasoning about the correctness of the specified protocol. Employing causality as a unifying concept clarifies protocol specifications and correctness arguments because it enables them to be defined purely in terms of local states and local events.;We have successfully applied this methodology to the problems of distributed termination detection, distributed deadlock detection and resolution, and optimistic recovery. In all cases, the causally synchronous protocols we have presented are efficient and demonstrably correct

    Self-stabilizing deadlock algorithms in distributed systems

    Full text link
    A self-stabilizing system is a network of processors, which, when started from an arbitrary (and possibly illegal) initial state, always returns to a legal state in a finite number of steps. Self-stabilization is an evolving paradigm in fault-tolerant computing. This research will be the first time self-stabilization is used in the areas of deadlock detection and prevention. Traditional deadlock detection algorithms have a process initiate a probe. If that probe travels around the system and is received by the initiator, there is a cycle in the system, and deadlock is detected. In order to prevent deadlocks, algorithms usually rank nodes in order to determine if an added edge will create a deadlock in the system. In a self-stabilizing system, perturbances are automatically dealt with. For the deadlock model, the perturbances in the system are requests and releases of resources. So, the self-stabilizing deadlock detection algorithm will automatically detect a deadlock when a request causes a cycle in the wait-for graph. The self-stabilizing prevention algorithm prevents deadlocks in a similar manner. The self-stabilizing algorithms do not have to be initiated by any process because the requests and releases create a perturbance which is dealt with automatically

    Design of deadlock detection and prevention algorithms in distributed systems

    Full text link
    A distributed system consists of a collection of processes which communicate with each other by exchanging messages to achieve a common goal. One of the key problems in distributed systems is the possibility of deadlock. Processes are said to be deadlocked when some processes are blocked on resource requests that can never be satisfied unless drastic systems action is taken. Two distributed deadlock detection algorithms handling multiple outstanding requests is proposed and are proven to be correct: it detects all cycles and does not detect false deadlocks. The algorithms are based on the concept of chasing the edge of the waitfor graph (probe-based). Simulation results show that the proposed algorithm performs very well compared to some existing algorithms. A deadlock prevention algorithm based on the notion of coloring the nodes of the waitfor graph is also proposed. Rollback is quite less compared to some existing algorithms

    UPC-CHECK: A scalable tool for detecting run-time errors in Unified Parallel C

    Get PDF
    Unied Parallel C (UPC) is a language used to write parallel programs for shared and distributed memory parallel computers. UPC-CHECK is a scalable tool developed to automatically detect argument errors in UPC functions and deadlocks in UPC programs at run-time and issue high quality error messages to help programmers quickly x those errors. The tool is easy to use and involves merely replacing the compiler command with upc-check. The tool uses a novel distributed algorithm for detecting argument and deadlock errors in collective operations. The run-time complexity of the algorithm has been proven to be O(1). The algorithm has been extended to detect deadlocks created involving locks with a run-time complexity of O(T), where T is the number of threads waiting to acquire a lock. Error messages issued by UPC-CHECK were evaluated using the UPC RTED test suite for argument errors in UPC functions and deadlocks. Results of these tests show that the error messages issued by UPC-CHECK for these tests are excellent. The scalability of all the algorithms used was demonstrated using performance-evaluation test programs and the UPC NAS Parallel Benchmarks

    Global state predicates in rough real-time

    Get PDF
    Distributed systems are characterized by the fact that the constituent processes have neither common memory nor a common system clock. These processes communicate solely via message passing. While providing a number of benefits such as increased reliability, increased computational power, and geographic dispersion, this architecture significantly complicates many of the tasks of software development and verification, including evaluation of the program state. In the case of distributed systems, the program state is comprised of the local states of the constituent processes, as well as the state of the channels between processes, and is called the global state.;With no common system clock, many distributed system protocols rely on the global ordering of local process events imposed by the message passing that occurs between processes. This leads to a partial global ordering of local process events, which can then be used to determine which process states could (or could not) have occurred simultaneously.;Traditional predicate evaluation protocols evaluate predicates on the global state of a distributed computation using consistent global states. This evaluation is complicated by the fact that the event ordering imposed by message passing is only partial. A complete history of the global states that occurred during an execution cannot always be constructed. This introduces inefficiency into predicate detection protocols and prohibits detection of certain predicates.;This dissertation explores the use of this rough global time base for global state predicate evaluation within distributed systems. By structuring the evaluation on the assumption that a global time base exists, we can develop simple and efficient protocols for both stable and unstable predicate evaluation. Further, we can evaluate certain predicates which are not easily evaluated using consistent global states. We demonstrate these advantages by developing protocols for detection of distributed termination, distributed deadlock detection, and detection of certain unstable predicates as they occur. as the global time base is rough, we can only detect unstable predicates which remain true for a sufficient duration. We additionally develop several formalizations which assist the protocol developer in dealing with the fact that the global time base is not perfect. We demonstrate the application of these formalizations within the protocols that we develop

    A unified concurrency control algorithm for distributed database systems

    Get PDF
    We present a unified concurrency-control algorithm for distributed database systems in which each transaction may choose its own concurrency control protocol. Specifically, they integrate two-phase locking, timestamp ordering, and precedence agreement into one unified concurrency-control scheme. They show the correctness of the scheme and study the problem of selecting the best protocol for each transaction to optimize system performance.published_or_final_versio
    corecore