
    Blazes: Coordination Analysis for Distributed Programs

    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they degrade performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.
    Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label
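
    To make the analysis concrete, here is a minimal sketch in Python of the style of annotation-driven check Blazes performs. It is not the Blazes implementation: the Component/Edge classes, the confluent/ordered annotations, and the two-component example are illustrative assumptions; the real system uses a richer label lattice and synthesizes the coordination code itself.

```python
# Illustrative sketch of a Blazes-style coordination check, not the
# actual Blazes implementation. Components are annotated as confluent
# (output insensitive to input arrival order) or order-sensitive; any
# stream feeding an order-sensitive component without a deterministic
# order is flagged as a location that needs coordination.
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    confluent: bool      # True: arrival order cannot change the output

@dataclass
class Edge:
    src: Component
    dst: Component
    ordered: bool        # True: the stream already arrives in a fixed order

def locations_requiring_coordination(edges):
    """Edges where ordering/sealing coordination must be inserted."""
    return [e for e in edges if not e.dst.confluent and not e.ordered]

# Hypothetical two-component topology: an order-sensitive aggregator
# fed by an unordered stream gets flagged.
splitter = Component("splitter", confluent=True)
ranker = Component("ranker", confluent=False)
for e in locations_requiring_coordination([Edge(splitter, ranker, ordered=False)]):
    print(f"coordinate stream {e.src.name} -> {e.dst.name}")
```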

    Incremental Consistency Guarantees for Replicated Objects

    Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary---fast, possibly inconsistent---result, as well as a final---consistent---result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.
    Comment: 16 total pages, 12 figures. OSDI'16 (to appear).
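
    The following is a minimal sketch, in Python, of what an incremental-consistency handle in the spirit of Correctables could look like: the caller registers a callback and receives the fast, possibly inconsistent preliminary result first, then the final, consistent one. The class name echoes the paper, but the API below is an assumption, not the published interface, and it ignores the corner case where the strong result overtakes the weak one.

```python
# A minimal sketch of an incremental-consistency handle; the API
# (on_update, read) is an assumption, not the paper's interface.
import threading

class Correctable:
    """Delivers a preliminary result first, then the final result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []
        self._updates = []   # (value, is_final), in arrival order

    def on_update(self, cb):
        # Register a callback; replay anything already delivered.
        with self._lock:
            self._callbacks.append(cb)
            pending = list(self._updates)
        for value, final in pending:
            cb(value, final)

    def _deliver(self, value, final):
        with self._lock:
            self._updates.append((value, final))
            cbs = list(self._callbacks)
        for cb in cbs:
            cb(value, final)

def read(key, weak_read, strong_read):
    """Fire a weak and a strong read in parallel; refine incrementally."""
    c = Correctable()
    threading.Thread(target=lambda: c._deliver(weak_read(key), False)).start()
    threading.Thread(target=lambda: c._deliver(strong_read(key), True)).start()
    return c

# Usage: act on the fast preliminary value, confirm on the final one.
#   read("ad-42", weak, strong).on_update(
#       lambda v, final: commit(v) if final else speculate(v))
```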

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically according to the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads.
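
    As background for the quorum machinery CATS extends, here is a toy majority-quorum register in Python. This is classic quorum replication, not the paper's consistent-quorum protocol, which additionally requires every quorum member to agree on the current replica-group view; a fully linearizable read would also write the value back (as in ABD) before returning.

```python
# Toy majority-quorum register: background only, not the CATS
# consistent-quorum algorithm itself.
import random

class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def majority(replicas):
    # Any two majorities of n replicas intersect in at least one node.
    return random.sample(replicas, len(replicas) // 2 + 1)

def write(replicas, value):
    # Phase 1: learn the highest version held by some majority.
    v = max(r.version for r in majority(replicas))
    # Phase 2: install the value at version v+1 on a majority.
    for r in majority(replicas):
        if v + 1 > r.version:
            r.version, r.value = v + 1, value

def read(replicas):
    # The freshest completed write is visible in every majority sampled,
    # because overlapping majorities share at least one replica.
    newest = max(majority(replicas), key=lambda r: r.version)
    return newest.value

replicas = [Replica() for _ in range(5)]
write(replicas, "x=1")
assert read(replicas) == "x=1"
```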

    Laying a Solid Foundation: Strategies for Effective Program Replication

    With limited funds available for social investment, policymakers and philanthropists are naturally interested in supporting programs with the greatest chance of effectiveness and the ability to benefit the largest number of people. When a program rises to the fore with strong, proven results, it makes sense to ask whether that success can be reproduced in new settings.

    Program replication is premised on the understanding that many social problems are common across diverse communities -- and that it is far more cost-effective to systematically replicate an effective solution to these problems than to continually reinvent the wheel. When done well, replication of strong social programs has the potential to make a positive difference not just for individual participants, but indeed for entire communities, cities and the nation as a whole.

    Yet despite general agreement among policymakers and philanthropists about the value of replication, successful efforts to bring social programs to scale have been limited, and rarely is replication advanced through systematic public policy initiatives. More often, replication is the result of a particular social entrepreneur's tireless ambition, ability to raise funds and marketing savvy. The failure to spread social program successes more widely and methodically results from a lack of knowledge about the science and practice of replication and from the limited development of systems -- at local, state or federal levels -- to support replication.

    Fortunately, there seems to be growing awareness of the need to invest in such systems. For example, the 2009 Serve America Act included authorization for a new Social Innovation Fund that would "strengthen the infrastructure to identify, invest in, replicate and expand" proven initiatives. The Obama administration recently requested that Congress appropriate $50 million to this fund, with a focus on "find(ing) the most effective programs out there and then provid(ing) the capital needed to replicate their success in communities around the country."

    But more than financial capital is required to ensure that when a program is replicated, it will continue to achieve strong results. Over the past 15 years, Public/Private Ventures (P/PV) has taken a deliberate approach to advancing the science and practice of program replication. Through our work with a wide range of funders and initiatives, including the well-regarded Nurse-Family Partnership, which has now spread to more than 350 communities nationwide, we have accumulated compelling evidence about specific strategies that can help ensure a successful replication.

    We have come to understand that programs approach replication at different stages in their development -- from fledgling individual efforts that have quickly blossomed and attracted a good deal of interest and support to more mature programs that have slowly expanded their reach and refined their approach over many years. There are rarer cases in which programs have rigorous research in hand proving their effectiveness, multiple sites in successful operation and willing funders prepared to support large-scale replication.

    Regardless of where a promising program may be in its development, our experience points to a number of important lessons and insights about the replication process, which can inform hard decisions about whether, when and how to expand a program's reach and total impact.

    In the interest of expanding programs that work, funders sometimes neglect the structures and processes that must be in place to support successful replication. These structures should be seen as the "connective tissue" between a program that seeks to expand and the provision of funding for that program's broad replication.

    This report represents a synthesis of P/PV's 30 years of designing, testing and replicating a variety of social programs and explains the key structures that should be in place before wide-scale replication is considered. It is designed to serve as a guide for policymakers, practitioners and philanthropists interested in a systematic approach to successful replication.

    Scalable Persistent Storage for Erlang

    The many-core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable as persistent storage at our target scale, but Dynamo-like NoSQL database management systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish scientifically, for the first time, that the scalability limit of Riak on the Kalkyl cluster is 60 nodes, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular.
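
    A hedged sketch, in Python, of the measurement loop implied by this kind of study: hold per-node load fixed, grow the cluster, and report the first size at which aggregate throughput stops improving (the "knee", here 60 nodes for Riak 1.1.1). The `run_workload` function is a hypothetical stand-in for a real load generator such as Basho Bench.

```python
# Locate a scalability knee empirically. `run_workload` is hypothetical:
# it deploys the store on `nodes` hosts, applies a fixed per-node load,
# and returns aggregate throughput in ops/sec.
def find_scalability_knee(node_counts, run_workload, min_gain=0.05):
    """Return the first cluster size at which adding nodes stops helping."""
    prev = None
    for n in sorted(node_counts):
        throughput = run_workload(nodes=n)    # aggregate ops/sec at size n
        if prev is not None and throughput < prev * (1 + min_gain):
            return n                          # throughput plateaued: the knee
        prev = throughput
    return None                               # scaled across the whole range

# e.g. find_scalability_knee(range(10, 101, 10), run_workload)
```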