
    A simple proof of a time-space trade-off for sorting with linear comparisons

    It is shown how to extend the techniques originally used to prove a lower bound of Ω(n^2) for the product of the time and space consumed for sorting in branching programs with elementary comparisons, to the case of linear branching programs, where linear functions on the n input elements can be computed in unit time.

    Finding Optimal Quorum Assignments for Distributed Databases

    Replication has been studied as a method of increasing the availability of a data item in a distributed database subject to component failures and consequent partitioning. The potential for partitioning requires that a protocol be employed which guarantees that any access to a data item is aware of the most recent update to that data item. By minimizing the number of access requests denied due to this constraint, we maximize availability. In the event that all access requests are reads, placing one copy of the data item at each site clearly leads to maximum availability. The other extreme, in which all access requests are write requests or are treated as such, has been studied extensively in the literature. In this paper we investigate the performance of systems with both read and write requests. We describe a distributed on-line algorithm for determining the optimal parameters, or optimal quorum assignments, for a commonly studied protocol, the quorum consensus protocol [9]. We also show how to incorporate these optimization techniques into a dynamic quorum reassignment protocol. In addition, we demonstrate via simulation both the value of this algorithm and the effect of various read-write ratios on availability. This simulation, on 101 sites and up to 5050 links (fully connected), demonstrates that the techniques described here can greatly increase data availability, and that the best quorum assignments are frequently realized at the extreme values of the quorum parameters.
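
    As a concrete illustration of how a quorum assignment trades read availability against write availability, the sketch below evaluates one assignment under independent site failures only (no link failures or partitioning, unlike the simulation above). The quorum constraints r + w > n and 2w > n, the per-site reliability, and the read fraction are assumptions chosen for this example, not parameters from the paper.

```python
from math import comb

def quorum_availability(n, r, w, p, read_frac):
    """Expected fraction of requests served under one quorum assignment.

    n         -- number of copies, one per site
    r, w      -- read and write quorum sizes (assumes r + w > n and 2*w > n)
    p         -- probability that an individual site is operational
    read_frac -- fraction of requests that are reads
    """
    def at_least(k):
        # P(at least k of the n independent sites are up)
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    return read_frac * at_least(r) + (1 - read_frac) * at_least(w)

# Example: 5 copies, read quorum 2, write quorum 4, 95%-reliable sites, 80% reads.
print(quorum_availability(5, 2, 4, 0.95, 0.8))
```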

    Availability Issues in Data Replication in Distributed Databases

    Replication of data at more than one site in a distributed database has been reported to increase the availability of data in systems where sites and links are subject to failure. We have shown, in results summarized in this paper, that in many interesting cases the advantage is slight. A well-placed single copy is available to transactions almost as much of the time as is correctly replicated data, no matter how ingeniously it is managed. We explain these findings in terms of the behavior of the partitions that form in networks where components fail. We also show that known and rather simple protocols for the maintenance of multiple copies are essentially the best possible by comparing them against an unrealizable protocol that knows the future. We complete our study of these questions by reporting that, while computing the availability of data is #P-complete, there is nonetheless a tight analytical bound on how much replication can improve over a well-located single copy. We close with some observations regarding system design motivated by this work.

    Effects of Replication on the Duration of Failure in Distributed Databases

    Replicating data objects has been suggested as a means of increasing the performance of a distributed database system in a network subject to link and site failures. Since a network may partition as a consequence of such failures, a data object may become unavailable from a given site for some period of time. In this paper we study the duration of failure, which we define as the length of time, once the object becomes unavailable from a particular site, that the object remains unavailable. We show that, for networks composed of highly reliable components, replication does not substantially reduce the duration of failure. We model a network as a collection of sites and links, each failing and recovering independently according to a Poisson process. Using this model, we demonstrate via simulation that the duration of failure incurred using a non-replicated data object is nearly as short as that incurred using a replicated object and a replication control protocol, including an unrealizable protocol which is optimal with respect to availability. We then examine analytically a simplified system in which the sites, but not the links, are subject to failure. We prove that if each site operates with probability p, then the optimal replication protocol, Available Copies [5,26], reduces the duration of failure by at most a factor of (1-p)/(1+p). Lastly, we present bounds for general systems, those in which both the sites and the communications between the sites may fail. We prove, for example, that if sites are 95% reliable and communications failures are sufficiently short (links either infallible or satisfying a condition specified in the paper), then replication can improve the duration of failure by at most 2.7% of that experienced using a single copy. These results show that replication has only a small effect on the duration of failure in present-day partitionable networks composed of realistically reliable components.
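
    Evaluating the site-only bound stated above directly shows how little room replication has to shorten outages. The snippet below simply tabulates (1-p)/(1+p) for a few site reliabilities; the chosen values of p are illustrative, not from the paper.

```python
# Evaluate the site-only bound (1 - p) / (1 + p) for a few site reliabilities p.
for p in (0.90, 0.95, 0.99):
    print(f"p = {p:.2f}: replication shortens the duration of failure "
          f"by at most {(1 - p) / (1 + p):.1%}")
```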

    Complexity of Network Reliability and Optimal Database Placement Problems

    A fundamental problem of distributed database design in an existing network where components can fail is finding an optimal location at which to place the database in a centralized system, or copies of each data item in a decentralized or replicated system. In this paper it is proved for the first time exactly how hard this placement problem is under the measure of data availability. Specifically, we show that the optimal placement problem for availability is #P-complete, a measure of intractability at least as severe as NP-completeness. Given the anticipated computational difficulty of finding an exact solution, we go on to describe an effective, practical method for approximating the optimal copy placement. To obtain these results, we model the environment in which a distributed database operates by a probabilistic graph, which is a set of fully reliable vertices representing sites and a set of edges representing communication links, each operational with a rational probability. We prove that finding the optimal copy placement in a probabilistic graph is #P-complete by giving a sequence of reductions from #Satisfiability. We generalize this result to networks in which each site and each link has an independent, rational operational probability, and to networks in which all the sites or all the links have a fixed, uniform operational probability.
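
    Because exact availability is #P-complete, approximation is the natural recourse. The sketch below is a plain Monte Carlo baseline, not the paper's approximation method: it reads availability as the expected fraction of (fully reliable) sites that can reach the copy over operational links, and the path topology, link reliability, and trial count are assumptions made for the example.

```python
import random
from collections import deque

def reachable(n, up_edges, source):
    """Set of vertices reachable from `source` using only operational links."""
    adj = [[] for _ in range(n)]
    for u, v in up_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def estimate_availability(n, edges, copy_site, link_prob, trials=2000, seed=0):
    """Monte Carlo estimate of the expected fraction of sites able to reach the copy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        up = [e for e in edges if rng.random() < link_prob]
        total += len(reachable(n, up, copy_site)) / n
    return total / trials

# Example: a 5-site path with 90%-reliable links; score every candidate placement.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print([round(estimate_availability(5, path, s, 0.9), 3) for s in range(5)])
```

    On this small path the middle site scores highest, matching the intuition that a well-placed single copy sits centrally in the network.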

    Connected Components in O(lg^(3/2) |V|) Parallel Time for the CREW PRAM

    Computing the connected components of an undirected graph G = (V,E) on |V| = n vertices and |E| = m edges is a fundamental computational problem. The best known parallel algorithm for the CREW PRAM model runs in O(lg^2 n) time using n^2/lg^2 n processors [CLC82, HCS79]. For the CRCW PRAM model, in which concurrent writing is permitted, the best known algorithm runs in O(lg n) time using almost (n+m)/lg n processors [SV82, CV86, AS87]. Unfortunately, simulating this algorithm on the weaker CREW model increases its running time to O(lg^2 n) [CDR86, KR90, Vis83]. We present here an efficient and simple algorithm that runs in O(lg^(3/2) n) time using n+m CREW processors.
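
    For orientation only, here is a minimal sequential rendering of the min-label propagation idea that underlies many parallel connectivity algorithms; it is not the paper's CREW algorithm (in the worst case it needs far more rounds than O(lg^(3/2) n)). Each pass over the edge list stands in for one synchronous parallel round.

```python
def connected_components(n, edges):
    """Min-label propagation: every vertex ends up labelled with the
    smallest vertex id in its component."""
    label = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            low = min(label[u], label[v])
            if label[u] != low or label[v] != low:
                label[u] = label[v] = low
                changed = True
    return label

# Two components, {0,1,2} and {3,4}: prints [0, 0, 0, 3, 3]
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```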

    Optimal Parallel and Sequential Algorithms for the Vertex Updating Problem of a Minimum Spanning Tree

    We present a set of rules that can be used to give optimal solutions to the vertex updating problem for a minimum spanning tree: update a given MST when a new vertex z is introduced, along with weighted edges connecting z to the vertices of the graph. These rules lead to simple parallel algorithms that run in O(lg n) parallel time using n/lg n processors on an EREW PRAM. They can also be used to derive simple linear-time sequential algorithms for the same problem. Furthermore, we show how our solution can be used to solve the multiple-vertex updating problem.
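
    One simple sequential baseline, resting on the standard fact that the updated MST uses only edges of the old MST plus the new vertex's incident edges, is to run any MST algorithm on that O(n)-edge graph. The sketch below does this with Kruskal and union-find, which costs O(n lg n) because of the sort; the paper's rules avoid the sort and reach the linear-time and parallel bounds quoted above. The vertex numbering and example weights are illustrative.

```python
def update_mst(n, mst_edges, new_vertex_edges):
    """Recompute the MST after adding a new vertex z = n.

    mst_edges        -- edges (u, v, w) of the current MST on vertices 0..n-1
    new_vertex_edges -- edges (u, w) connecting the new vertex to vertex u
    """
    z = n
    candidates = [(w, u, v) for u, v, w in mst_edges]
    candidates += [(w, u, z) for u, w in new_vertex_edges]
    candidates.sort()

    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    new_mst = []
    for w, u, v in candidates:              # Kruskal on the O(n) candidate edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            new_mst.append((u, v, w))
    return new_mst

# Old MST: path 0-1-2 with weights 4 and 5; new vertex 3 attaches cheaply to 0 and 2.
print(update_mst(3, [(0, 1, 4), (1, 2, 5)], [(0, 1), (2, 2)]))
```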

    A Bound on Data Availability when Networks Partition

    Many consistency and replication control schemes that increase data availability in distributed systems exist, and the search for improvements continues, though there has been no good nontrivial upper bound demonstrating how much improvement is possible. We present a new upper bound on data availability under replication for general networks. In addition, we describe a new technique that yields near-optimal levels of data availability with respect to this bound.

    A Tight Upper Bound on the Benefits of Replication and Consistency Control Protocols

    We present an upper bound on the performance provided by a protocol guaranteeing mutually exclusive access to a replicated resource in a network subject to component failure and subsequent partitioning. The bound is presented in terms of the performance of a single resource in the same network. The bound is tight and is the first such bound known to us. Since mutual exclusion is one of the requirements for maintaining the consistency of a database object, this bound provides an upper limit on the availability provided by any database consistency control protocol, including those employing dynamic data relocation and replication. We show that if a single copy provides availability A, for 0 ≤ A ≤ 1, then no scheme can achieve availability greater than sqrt(A) in the same network. We show this bound to be the best possible for any network with availability greater than 0.25. Although, as we proved, the problem of calculating A is #P-complete, we describe a method for approximating the optimal location for a single copy which adjusts dynamically to current network characteristics. This bound is most useful for high availabilities, which tend to be obtainable with modern networks and their constituent components.
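
    To make the bound concrete with illustrative numbers (not taken from the paper): if the best single copy achieves A = 0.81, no replication or consistency control scheme can exceed sqrt(0.81) = 0.90 in the same network, a gain of at most nine percentage points. The snippet below just tabulates that ceiling for a few values of A.

```python
from math import sqrt

# Upper bound on what any replication or consistency control scheme can
# achieve, given the availability A of a well-placed single copy.
for a in (0.81, 0.90, 0.95, 0.99):
    print(f"A = {a:.2f}  ->  bound sqrt(A) = {sqrt(a):.4f}  (max gain {sqrt(a) - a:.4f})")
```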

    Effects of Replication on Data Availability

    In this paper we examine the effects of replication on the availability of data in a large network. This analysis differs from previous analyses in that it compares the performance of a dynamic consistency control protocol not only to that of other consistency control protocols, but also to the performance of non-replication and to an upper bound on data availability. It also differs in that we gather extensive simulation data on large networks subject to partitions at realistically high component reliabilities. We examine the dynamic consistency protocol presented by Jajodia and Mutchler [9, 12] and by Long and Paris [18], along with two proposed enhancements to this protocol [10, 11]. We study networks of 101 sites and up to 5050 links (fully connected) in which all components, although highly reliable, are subject to failure. We demonstrate the importance in this realistic environment of an oft-neglected parameter of the system model, the ratio of transaction submissions to component failures. We also show the impact of the number of copies on both the protocol performance and the potential of replication as measured by the upper bound. Our simulations show that the majority of current copies protocol performs optimally for topologies that yield availabilities of at least 65%. On the other hand, the availability provided by non-replication is inferior to that of the majority of current copies protocol by at most 5.9 percentage points for these same topologies. At this point of maximum difference, the primary copy protocol yields availability 59.1% and the majority of current copies protocol yields availability 65.0%. We discuss the characteristics of the model limiting the performance of replication.