Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they impose undesirable performance costs unless used judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under fault, which
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language. Comment: Updated to include additional materials from the original technical
report: derivation rules, output stream label
Incremental Consistency Guarantees for Replicated Objects
Programming with replicated objects is difficult. Developers must face the
fundamental trade-off between consistency and performance head on, while
struggling with the complexity of distributed storage stacks. We introduce
Correctables, a novel abstraction that hides most of this complexity, allowing
developers to focus on the task of balancing consistency and performance. To
aid developers with this task, Correctables provide incremental consistency
guarantees, which capture successive refinements on the result of an ongoing
operation on a replicated object. In short, applications receive both a
preliminary---fast, possibly inconsistent---result, as well as a
final---consistent---result that arrives later.
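The preliminary-then-final pattern can be sketched as a toy handle in Python. This is an illustrative stand-in, not the paper's actual Correctables API: the class name and the `set_preliminary`/`set_final`/`speculate`/`final` methods are hypothetical.

```python
import threading

class Correctable:
    """Toy incremental-consistency handle: exposes a fast,
    possibly inconsistent preliminary value, then a final
    consistent value once replication completes."""
    def __init__(self):
        self._done = threading.Event()
        self._preliminary = None
        self._result = None

    def set_preliminary(self, value):
        # Fast result, e.g. from the closest replica (may be stale).
        self._preliminary = value

    def set_final(self, value):
        # Consistent result, e.g. after a quorum round finishes.
        self._result = value
        self._done.set()

    def speculate(self):
        # Return the preliminary value without waiting.
        return self._preliminary

    def final(self, timeout=None):
        # Block until the consistent result is available.
        self._done.wait(timeout)
        return self._result

# A client can start work on the speculative value, then confirm it:
c = Correctable()
c.set_preliminary("ad-v1")   # arrives quickly
c.set_final("ad-v2")         # arrives after replication settles
print(c.speculate(), c.final())  # -> ad-v1 ad-v2
```

The point of the abstraction is exactly this split: the application begins speculative work on `speculate()` and reconciles (or discards it) when `final()` returns.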
We show how to leverage incremental consistency guarantees by speculating on
preliminary values, trading throughput and bandwidth for improved latency. We
experiment with two popular storage systems (Cassandra and ZooKeeper) and three
applications: a Twissandra-based microblogging service, an ad serving system,
and a ticket selling system. Our evaluation on the Amazon EC2 platform with
YCSB workloads A, B, and C shows that we can reduce the latency of strongly
consistent operations by up to 40% (from 100ms to 60ms) at little cost (10%
bandwidth increase, 6% throughput drop) in the ad system. Even if the
preliminary result is frequently inconsistent (25% of accesses), incremental
consistency incurs a bandwidth overhead of only 27%. Comment: 16 total pages, 12 figures. OSDI'16 (to appear)
CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing
storage services, but fall short of guaranteeing linearizable consistency
in partially synchronous, lossy, partitionable, and dynamic networks, when data
is distributed and replicated automatically by the principle of consistent hashing.
This paper introduces consistent quorums as a solution for achieving atomic
consistency. We present the design and implementation of CATS, a distributed
key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is
scalable, elastic, and self-organizing; key properties for modern cloud storage
middleware. Our system shows that consistency can be achieved with practical
performance and modest throughput overhead (5%) for read-intensive workloads.
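The linearizability guarantee above rests on quorum intersection: any write majority and any read majority share at least one replica. A minimal generic sketch of that idea follows; it illustrates plain majority quorums, not CATS's consistent quorums, which additionally track replication-group membership under churn. All names here are illustrative.

```python
# Majority-quorum read/write over N replicas (toy, single-process).
N = 3
QUORUM = N // 2 + 1  # any two majorities overlap in >= 1 replica

replicas = [{"ts": 0, "val": None} for _ in range(N)]

def write(ts, val):
    # Install a timestamped value on one majority of replicas.
    for r in replicas[:QUORUM]:
        if ts > r["ts"]:
            r["ts"], r["val"] = ts, val

def read():
    # Contact another majority; by intersection it must include at
    # least one replica holding the latest write. Return the value
    # with the highest timestamp seen.
    acked = replicas[-QUORUM:]
    freshest = max(acked, key=lambda r: r["ts"])
    return freshest["val"]

write(1, "x")
print(read())  # -> x: the overlapping replica carries the new value
```

Here the write touches replicas 0 and 1 while the read contacts replicas 1 and 2; replica 1 is the guaranteed overlap that makes the write visible.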
Laying a Solid Foundation: Strategies for Effective Program Replication
With limited funds available for social investment, policymakers and philanthropists are naturally interested in supporting programs with the greatest chance of effectiveness and the ability to benefit the largest number of people. When a program rises to the fore with strong, proven results, it makes sense to ask whether that success can be reproduced in new settings.
Program replication is premised on the understanding that many social problems are common across diverse communities -- and that it is far more cost-effective to systematically replicate an effective solution to these problems than to continually reinvent the wheel. When done well, replication of strong social programs has the potential to make a positive difference not just for individual participants, but indeed for entire communities, cities and the nation as a whole.
Yet despite general agreement among policymakers and philanthropists about the value of replication, successful efforts to bring social programs to scale have been limited, and rarely is replication advanced through systematic public policy initiatives. More often, replication is the result of a particular social entrepreneur's tireless ambition, ability to raise funds and marketing savvy. The failure to spread social program successes more widely and methodically results from a lack of knowledge about the science and practice of replication and from the limited development of systems -- at local, state or federal levels -- to support replication.
Fortunately, there seems to be growing awareness of the need to invest in such systems. For example, the 2009 Serve America Act included authorization for a new Social Innovation Fund that would "strengthen the infrastructure to identify, invest in, replicate and expand" proven initiatives. The Obama administration recently requested that Congress appropriate $50 million to this fund, with a focus on "find(ing) the most effective programs out there and then provid(ing) the capital needed to replicate their success in communities around the country."
But more than financial capital is required to ensure that when a program is replicated, it will continue to achieve strong results. Over the past 15 years, Public/Private Ventures (P/PV) has taken a deliberate approach to advancing the science and practice of program replication. Through our work with a wide range of funders and initiatives, including the well-regarded Nurse-Family Partnership, which has now spread to more than 350 communities nationwide, we have accumulated compelling evidence about specific strategies that can help ensure a successful replication. We have come to understand that programs approach replication at different stages in their development -- from fledgling individual efforts that have quickly blossomed and attracted a good deal of interest and support to more mature programs that have slowly expanded their reach and refined their approach over many years. There are rarer cases in which programs have rigorous research in hand proving their effectiveness, multiple sites in successful operation and willing funders prepared to support large-scale replication.
Regardless of where a promising program may be in its development, our experience points to a number of important lessons and insights about the replication process, which can inform hard decisions about whether, when and how to expand a program's reach and total impact. In the interest of expanding programs that work, funders sometimes neglect the structures and processes that must be in place to support successful replication. These structures should be seen as the "connective tissue" between a program that seeks to expand and the provision of funding for that program's broad replication.
This report represents a synthesis of P/PV's 30 years of designing, testing and replicating a variety of social programs and explains the key structures that should be in place before wide-scale replication is considered. It is designed to serve as a guide for policymakers, practitioners and philanthropists interested in a systematic approach to successful replication.
Scalable Persistent Storage for Erlang
The many-core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable as persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We scientifically establish, for the first time, the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular.