
    HQ Replication: Properties and Optimizations

    There are currently two approaches to providing Byzantine-fault-tolerant state machine replication: a replica-based approach, e.g., BFT, that uses communication between replicas to agree on a proposed ordering of requests, and a quorum-based approach, such as Q/U, in which clients contact replicas directly to optimistically execute operations. Both approaches have shortcomings: the quadratic cost of inter-replica communication is unnecessary when there is no contention, and Q/U requires a large number of replicas and performs poorly under contention. We present HQ, a hybrid Byzantine-fault-tolerant state machine replication protocol that overcomes these problems. HQ employs a lightweight quorum-based protocol when there is no contention, but uses BFT to resolve contention when it arises. Furthermore, HQ uses only 3f+1 replicas to tolerate f faults, providing optimal resilience to node failures. We implemented a prototype of HQ, and we compare its performance to BFT and Q/U analytically and experimentally. Additionally, in this work we use a new implementation of BFT designed to scale as the number of faults increases. Our results show that both HQ and our new implementation of BFT scale as f increases; additionally, our hybrid approach of using BFT to handle contention works well.
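    The 3f+1 figure in the abstract above can be checked with a little arithmetic. The sketch below is only an illustration, not code from the paper: the function names are hypothetical, and the quorum size of 2f+1 is an assumption that the abstract does not state (it is the usual quorum size for a system with 3f+1 replicas).

```python
# Illustrative arithmetic only; names are hypothetical, not from the HQ paper.

def replicas_needed(f: int) -> int:
    """Replica count stated in the abstract: 3f+1 replicas tolerate f faults."""
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """ASSUMPTION (not in the abstract): quorums contain 2f+1 replicas."""
    return 2 * f + 1


def min_quorum_overlap(f: int) -> int:
    """Smallest possible intersection of two quorums among 3f+1 replicas."""
    return 2 * quorum_size(f) - replicas_needed(f)


for f in range(1, 4):
    n, q, overlap = replicas_needed(f), quorum_size(f), min_quorum_overlap(f)
    # overlap is f+1, so any two quorums share at least one non-faulty replica
    print(f"f={f}: replicas={n}, quorum={q}, min overlap={overlap}")
```

    Under that assumption, any two quorums share at least f+1 replicas, so at least one replica in the intersection is non-faulty even when f replicas misbehave.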

    Thresher: An efficient storage manager for copy-on-write snapshots

    A new generation of storage systems exploits decreasing storage costs to allow applications to take snapshots of past states and retain them for long durations. Over time, current snapshot techniques can produce large volumes of snapshots. Indiscriminately keeping all snapshots accessible is impractical, even if raw disk storage is cheap, because administering such large-volume storage is expensive over a long duration. Moreover, not all snapshots are equally valuable. Thresher is a new snapshot storage management system, based on novel copy-on-write snapshot techniques, that is the first to provide applications the ability to discriminate among snapshots efficiently. Valuable snapshots can remain accessible or stored with faster access while less valuable snapshots are discarded or moved off-line. Measurements of the Thresher prototype indicate that the new techniques are efficient and scalable, imposing minimal (4%) performance penalty on expected common workloads.

    Time travel in the virtualized past: Cheap fares and first class seats

    “Time travel” in the storage system means accessing past storage system states. Legacy application programs could run transparently over the past states if the past states were virtualized in a form that makes them look like the current state. There are many levels in the storage system at which past state virtualization could occur. How do we choose? We think that past state virtualization should occur at a high storage system buffer manager level, such as a database buffer manager. Everything above this level can run legacy programs. The system below can manage the mechanisms needed to implement the virtualization. This approach can be applied to any kind of storage system, ranging from traditional databases and file systems to the new generation of specialized storage managers such as Bigtable.

    Hybrid Caching for Scalable Object Systems (Think Globally, Act Locally)

    Object-based client caching allows clients to keep more frequently accessed objects while discarding colder objects that reside on the same page. However, when these objects are modified and sent to the server, it may need to read the corresponding page from disk to install the update. These installation reads..

    Opportunistic Log: Efficient Installation Reads in a Reliable Storage Server

    In a distributed storage system, client caches managed on the basis of small-granularity objects can provide better memory utilization than page-based caches. However, object servers, unlike page servers, must perform additional disk reads. These installation reads are required to install modified objects onto their corresponding disk pages. The opportunistic log is a new technique that significantly reduces the cost of installation reads. It defers the installation reads, removing them from the modification commit path, and manages a large pool of pending installation reads that can be scheduled efficiently. Using simulations, we show that the opportunistic log substantially enhances the I/O performance of reliable storage servers. An object server without the opportunistic log requires much better client caching to outperform a page server. With an opportunistic log, only a small client cache improvement suffices. Our results imply that efficient scheduling of installation reads ca..
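    The idea of deferring installation reads, as the abstract above describes it, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's design: the class and method names are hypothetical, the "disk" is an in-memory dictionary, and ascending page order stands in for whatever disk scheduling policy the real system uses.

```python
from collections import defaultdict


class OpportunisticLogSketch:
    """Toy model: commits never read pages; installation reads happen later in batches."""

    def __init__(self, disk):
        self.disk = disk                    # page_id -> {object_id: value}; stand-in for disk pages
        self.pending = defaultdict(dict)    # page_id -> modified objects awaiting installation
        self.installation_reads = 0         # counts simulated page reads

    def commit(self, page_id, object_id, value):
        # The commit path only records the modification; no page is read here.
        self.pending[page_id][object_id] = value

    def install_batch(self, max_pages):
        # ASSUMPTION: ascending page order approximates an efficient disk schedule.
        chosen = sorted(self.pending)[:max_pages]
        for page_id in chosen:
            page = self.disk.setdefault(page_id, {})   # one simulated installation read
            self.installation_reads += 1
            page.update(self.pending.pop(page_id))     # install every pending object for this page
        return chosen


log = OpportunisticLogSketch(disk={})
log.commit(7, "a", 1)
log.commit(7, "b", 2)   # same page as "a": both share one installation read later
log.commit(3, "c", 3)
installed = log.install_batch(max_pages=10)
print(installed, log.installation_reads)
```

    In this toy model, three committed modifications cost two page reads rather than three, because the two objects on page 7 are installed together, and none of those reads happens on the commit path.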