
    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other. Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; these are key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads.
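
    To make the idea of consistent quorums concrete, the following minimal Python sketch shows a quorum read that is accepted only if every responding replica reports the same view of the key's replication group. The names (Replica, Response, consistent_quorum_read) and the in-memory store are illustrative assumptions, not the actual CATS API.

        # Hypothetical sketch of a consistent-quorum read; not the CATS implementation.
        from dataclasses import dataclass
        from typing import Optional, Tuple

        @dataclass
        class Response:
            value: str
            version: int
            group: Tuple[str, ...]   # responder's view of the key's replication group

        class Replica:
            def __init__(self, name, group, store):
                self.name, self.group, self.store = name, group, store

            def read(self, key) -> Response:
                value, version = self.store.get(key, ("", 0))
                return Response(value, version, self.group)

        def consistent_quorum_read(key, replicas, majority) -> Optional[str]:
            responses = [r.read(key) for r in replicas]
            if len(responses) < majority:
                return None                                # quorum unavailable
            if len({r.group for r in responses}) != 1:
                return None                                # group views diverge (churn): retry
            return max(responses, key=lambda r: r.version).value   # freshest value wins

        # Example: three replicas that agree on the group, so the read succeeds.
        group = ("a", "b", "c")
        replicas = [Replica(n, group, {"k": ("v1", 3)}) for n in group]
        print(consistent_quorum_read("k", replicas, majority=2))   # -> v1

    During churn, such as a node joining the ring, replicas may briefly disagree on group membership; rejecting such mixed quorums and retrying is what preserves linearizability in the sketch above.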

    Congress' Wicked Problem: Seeking Knowledge Inside the Information Tsunami

    The lack of shared expert knowledge capacity in the U.S. Congress has created a critical weakness in our democratic process. Along with bipartisan cooperation, many contemporary and urgent questions before our legislators require nuance, genuine deliberation, and expert judgment. Congress, however, lacks adequate means for this purpose and depends on outdated and in some cases antiquated systems of information referral, sorting, communicating, and convening. Congress is held in record low esteem by the public today. Its failings have been widely analyzed and a multitude of root causes have been identified. This paper does not put forward a simple recipe to fix these ailments, but argues that the absence of basic knowledge management in our legislature is a critical weakness. Congress struggles to make policy on complex issues while it equally lacks the wherewithal to effectively compete on substance in today's 24-hour news cycle. This paper points out that Congress is not so much venal and corrupt as it is incapacitated and obsolete. And, in its present state, it cannot serve the needs of American democracy in the 21st century. The audience for this paper is those working in the open government, civic technology, and transparency movements, as well as other foundations, think tanks, and academic entities. It is also for individuals inside and outside of government who want background on Congress' current institutional dilemmas, including its lack of expertise.

    Archiving the Relaxed Consistency Web

    The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with a high Internet penetration rate have established high-profile archiving initiatives to crawl and archive fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of relaxed-consistency web design on crawler-driven web archiving. Relaxed-consistency websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in web archives as historical records, such information will degrade overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed-consistency web archive may contain observable inconsistency, and that the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies. Comment: 10 pages, 6 figures, CIKM 201
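
    A toy reconstruction of the kind of simulation described above can illustrate how delayed propagation shows up in archived snapshots; the workload sizes and the propagation delay below are invented for illustration and are not the paper's parameters.

        # Hypothetical feed-following simulation: posts reach follower feeds with a
        # delay, a crawler snapshots the feed, and we count stale snapshots.
        import random

        random.seed(1)
        PROPAGATION_DELAY = 5.0                                        # time units
        posts = sorted(random.uniform(0, 1000) for _ in range(200))    # post times
        captures = sorted(random.uniform(0, 1000) for _ in range(50))  # crawl times

        def feed_at(t):
            """Posts visible in the follower feed at time t under delayed propagation."""
            return [p for p in posts if p + PROPAGATION_DELAY <= t]

        def ground_truth_at(t):
            """Posts that had actually been published by time t."""
            return [p for p in posts if p <= t]

        stale = sum(1 for t in captures if feed_at(t) != ground_truth_at(t))
        print(f"{stale}/{len(captures)} archived snapshots show an inconsistent feed")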

    DTC: A Dynamic Transaction Chopping Technique for Geo-Replicated Storage Services

    Replicating data across geo-distributed datacenters is usually necessary for large-scale cloud services to achieve high locality, durability, and availability. One of the major challenges in such geo-replicated data services lies in consistency maintenance, which usually suffers from long latency due to costly coordination across datacenters. Among other approaches, transaction chopping is an effective and efficient way to address this challenge. However, existing chopping is performed statically at programming time, which is inflexible and burdensome for developers. In this article, we propose Dynamic Transaction Chopping (DTC), a novel technique that performs transaction chopping and determines piecewise execution dynamically and automatically. DTC consists of two main parts: a dynamic chopper that divides transactions into pieces according to the data partition scheme, and a conflict detection algorithm that checks the safety of the dynamic chopping. Compared with existing techniques, DTC has several advantages: transparency to programmers, flexibility in conflict analysis, a high degree of piecewise execution, and adaptability to data partition schemes. A prototype of DTC is implemented to verify its correctness and evaluate its performance. The experimental results show that DTC achieves much better performance than similar existing work.
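
    The chopping step can be pictured with a short Python sketch: operations of a transaction are grouped into pieces according to which partition owns the key they touch, so each piece can execute within a single datacenter. The hash-based partition function and the (op, key) transaction format are assumptions for illustration, not DTC's actual interface.

        # Hypothetical illustration of dynamic chopping by data partition.
        from collections import OrderedDict

        def partition_of(key, num_partitions=3):
            # assumed partition scheme: hash partitioning over the key
            return hash(key) % num_partitions

        def chop(transaction):
            """Split a transaction, given as a list of (op, key) pairs, into pieces:
            one piece per partition touched, in order of first access."""
            pieces = OrderedDict()
            for op, key in transaction:
                pieces.setdefault(partition_of(key), []).append((op, key))
            return list(pieces.values())

        txn = [("read", "x"), ("write", "y"), ("write", "x")]
        for i, piece in enumerate(chop(txn)):
            print(f"piece {i}: {piece}")

    A real system would additionally run a conflict detection check, as described above, before allowing the pieces to commit independently.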

    A Tour of Gallifrey, a Language for Geodistributed Programming

    Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through _restrictions_ with _merge strategies_, _contingencies_ for conflicts arising from concurrency, and _branches_, a novel concurrency control construct inspired by version control, to contain provisional behavior.
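
    Gallifrey is its own language, but a rough Python analogue can hint at how the pieces fit together: a replicated set exposed under a restriction that only permits additions, set union as the merge strategy, and a branch that holds provisional updates until they are merged back. The class and method names below are invented for illustration and are not Gallifrey constructs.

        # Hypothetical analogue of restrictions, merge strategies, and branches.
        class AddOnlySet:
            """Restriction: only additions are allowed, so concurrent replicas can
            always be reconciled by a deterministic merge (set union)."""
            def __init__(self, items=()):
                self._items = set(items)

            def add(self, item):
                self._items.add(item)

            def merge(self, other):          # merge strategy: union of both replicas
                self._items |= other._items

            def snapshot(self):
                return frozenset(self._items)

        class Branch:
            """Provisional view of a replica; updates stay local until merged back."""
            def __init__(self, base):
                self.base = base
                self.local = AddOnlySet(base.snapshot())

            def commit(self):
                self.base.merge(self.local)

        shared = AddOnlySet({"a"})
        branch = Branch(shared)
        branch.local.add("b")              # provisional update, invisible to `shared`
        branch.commit()                    # merging reconciles the branch
        print(sorted(shared.snapshot()))   # -> ['a', 'b']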

    Multi-source data assimilation for physically based hydrological modeling of an experimental hillslope

    Data assimilation has recently been the focus of much attention for integrated surface–subsurface hydrological models, whereby joint assimilation of water table, soil moisture, and river discharge measurements with the ensemble Kalman filter (EnKF) has been extensively applied. Although the EnKF has been specifically developed to deal with nonlinear models, integrated hydrological models based on the Richards equation still represent a challenge, due to strong nonlinearities that may significantly affect the filter performance. Thus, more studies are needed to investigate the capabilities of the EnKF to correct the system state and identify parameters in cases where the unsaturated zone dynamics are dominant, as well as to quantify possible tradeoffs associated with assimilation of multi-source data. Here, the CATHY (CATchment HYdrology) model is applied to reproduce the hydrological dynamics observed in an experimental two-layered hillslope, equipped with tensiometers, water content reflectometer probes, and tipping bucket flow gages to monitor the hillslope response to a series of artificial rainfall events. Pressure head, soil moisture, and subsurface outflow are assimilated with the EnKF in a number of scenarios and the challenges and issues arising from the assimilation of multi-source data in this real-world test case are discussed. Our results demonstrate that the EnKF is able to effectively correct states and parameters even in a real application characterized by strong nonlinearities. However, multi-source data assimilation may lead to significant tradeoffs: the assimilation of additional variables can lead to degradation of model predictions for other variables that are otherwise well reproduced. Furthermore, we show that integrated observations such as outflow discharge cannot compensate for the lack of well-distributed data in heterogeneous hillslopes.
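
    The analysis step of the ensemble Kalman filter referred to above can be summarized in a few lines of numpy; the state dimensions, observation operator, and error levels below are synthetic placeholders rather than the CATHY hillslope configuration.

        # Minimal EnKF analysis step with perturbed observations (illustrative only).
        import numpy as np

        def enkf_update(ensemble, obs, obs_op, obs_err_std, rng):
            """ensemble: (n_state, n_members) forecast; obs: (n_obs,) observations;
            obs_op: (n_obs, n_state) linear observation operator."""
            n_obs, n_members = obs.size, ensemble.shape[1]
            # perturb observations so the analysis ensemble keeps the correct spread
            perturbed = obs[:, None] + obs_err_std * rng.standard_normal((n_obs, n_members))
            anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
            P = anomalies @ anomalies.T / (n_members - 1)                  # sample covariance
            R = (obs_err_std ** 2) * np.eye(n_obs)                         # observation error covariance
            K = P @ obs_op.T @ np.linalg.inv(obs_op @ P @ obs_op.T + R)    # Kalman gain
            return ensemble + K @ (perturbed - obs_op @ ensemble)          # analysis ensemble

        rng = np.random.default_rng(0)
        forecast = rng.standard_normal((4, 20))      # 4 state variables, 20 ensemble members
        H = np.array([[1.0, 0.0, 0.0, 0.0]])         # observe the first state variable
        analysis = enkf_update(forecast, np.array([0.5]), H, obs_err_std=0.1, rng=rng)
        print(analysis.mean(axis=1))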