Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing
storage services, but fall short of guaranteeing linearizable consistency
in partially synchronous, lossy, partitionable, and dynamic networks, when data
is distributed and replicated automatically by the principle of consistent hashing.
This paper introduces consistent quorums as a solution for achieving atomic
consistency. We present the design and implementation of CATS, a distributed
key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is
scalable, elastic, and self-organizing; key properties for modern cloud storage
middleware. Our system shows that consistency can be achieved with practical
performance and modest throughput overhead (5%) for read-intensive workloads.
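The majority-quorum intersection that underlies CATS's consistent quorums can be sketched in a few lines. This is a simplified illustration, not the CATS implementation: the `Replica`, `write`, and `read` names are assumptions, and real consistent quorums additionally track replica-group membership views under churn.

```python
# Minimal majority-quorum read/write sketch: any two majorities of the
# same replica group intersect, so a read quorum always sees the latest
# versioned write. Illustrative only; not the CATS API.
from dataclasses import dataclass

@dataclass
class Replica:
    value: object = None
    version: int = 0

def write(replicas, value):
    """Write to a majority, tagged with a version above any seen so far."""
    quorum = len(replicas) // 2 + 1
    version = max(r.version for r in replicas) + 1
    for r in replicas[:quorum]:  # a real system contacts any majority
        r.value, r.version = value, version

def read(replicas):
    """Read a majority; the highest version wins, because every read
    quorum overlaps every write quorum in at least one replica."""
    quorum = len(replicas) // 2 + 1
    freshest = max(replicas[:quorum], key=lambda r: r.version)
    return freshest.value

group = [Replica() for _ in range(5)]
write(group, "x=1")
print(read(group))  # the quorum read returns the written value
```

The consistent-quorums contribution of the paper is precisely what this sketch omits: keeping the notion of "majority" well defined while nodes join, leave, and fail under consistent hashing.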
Congress' Wicked Problem: Seeking Knowledge Inside the Information Tsunami
The lack of shared expert knowledge capacity in the U.S. Congress has created a critical weakness in our democratic process. Along with bipartisan cooperation, many contemporary and urgent questions before our legislators require nuance, genuine deliberation, and expert judgment. Congress, however, lacks adequate means for this purpose and depends on outdated and in some cases antiquated systems of information referral, sorting, communicating, and convening. Congress is held in record low esteem by the public today. Its failings have been widely analyzed and a multitude of root causes have been identified. This paper does not put forward a simple recipe to fix these ailments, but argues that the absence of basic knowledge management in our legislature is a critical weakness. Congress struggles to make policy on complex issues while it equally lacks the wherewithal to compete effectively on substance in today's 24-hour news cycle. This paper points out that Congress is not so much venal and corrupt as it is incapacitated and obsolete; in its present state, it cannot serve the needs of American democracy in the 21st century. The audience for this paper is those working in the open government, civic technology, and transparency movements, as well as foundations, think tanks, and academic entities. It is also for individuals inside and outside of government who want background on Congress' current institutional dilemmas, including its lack of expertise.
Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rate have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler-driven web archiving. Relaxed-consistency websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.
Comment: 10 pages, 6 figures, CIKM 201
DTC: A Dynamic Transaction Chopping Technique for Geo-Replicated Storage Services
Replicating data across geo-distributed datacenters is usually necessary for large-scale cloud services to achieve high locality, durability, and availability. One of the major challenges in such geo-replicated data services lies in consistency maintenance, which usually suffers from long latency due to costly coordination across datacenters. Among the available remedies, transaction chopping is an effective and efficient approach to this challenge. However, existing chopping is performed statically at programming time, which is inflexible and burdensome for developers. In this article, we propose Dynamic Transaction Chopping (DTC), a novel technique that chops transactions and determines piecewise execution dynamically and automatically. DTC consists of two main parts: a dynamic chopper that divides transactions into pieces according to the data partition scheme, and a conflict detection algorithm that checks the safety of the dynamic chopping. Compared with existing techniques, DTC has several advantages: transparency to programmers, flexibility in conflict analysis, a high degree of piecewise execution, and adaptability to data partition schemes. A prototype of DTC is implemented to verify its correctness and evaluate its performance. The experimental results show that DTC achieves much better performance than similar work.
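The core step of chopping by data partition can be illustrated with a small sketch. This is a hypothetical simplification, not the DTC implementation: the partition map, the operation tuples, and the `chop` function are assumptions for illustration, and DTC's actual contribution includes the conflict-detection check that this sketch omits.

```python
# Illustrative dynamic chopping: group a transaction's operations into
# pieces by the partition that owns each key, so each piece touches a
# single partition (e.g. one datacenter) and can run without cross-
# datacenter coordination. Partition map is a stand-in assumption.
PARTITION_OF = {"a": 0, "b": 0, "c": 1, "d": 2}  # key -> partition id

def chop(transaction):
    """Split a list of (op, key, value) tuples into per-partition pieces,
    preserving the original operation order within each piece."""
    pieces = {}
    for op in transaction:
        pid = PARTITION_OF[op[1]]
        pieces.setdefault(pid, []).append(op)
    return pieces

txn = [("write", "a", 1), ("write", "c", 2), ("read", "b", None)]
print(chop(txn))
# partition 0 receives the ops on "a" and "b"; partition 1 the op on "c"
```

Because the grouping is computed at run time from the current partition scheme, the same transaction re-chops automatically if data is repartitioned, which is the flexibility the abstract contrasts with static, programmer-specified chopping.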
A Tour of Gallifrey, a Language for Geodistributed Programming
Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through _restrictions_ with _merge strategies_, _contingencies_ for conflicts arising from concurrency, and _branches_, a novel concurrency control construct inspired by version control, to contain provisional behavior.
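Gallifrey's restrictions with merge strategies are a language-level feature; as a rough analogy only, a replica whose interface is restricted to grow-only operations can merge conflict-free by set union. The class below is an illustrative Python stand-in, not Gallifrey syntax or semantics.

```python
# Analogy for a restriction + merge strategy: if the only permitted
# mutation is add(), then replicas can always be merged by union,
# because union is commutative, associative, and idempotent.
class GrowOnlySet:
    def __init__(self):
        self.items = set()

    def add(self, x):
        """The restriction: adding is the only mutation exposed."""
        self.items.add(x)

    def merge(self, other):
        """The merge strategy: set union never loses an add."""
        self.items |= other.items

a, b = GrowOnlySet(), GrowOnlySet()
a.add(1)        # concurrent updates on two replicas...
b.add(2)
a.merge(b)      # ...reconcile deterministically
print(sorted(a.items))  # [1, 2]
```

Restricting the interface is what makes the merge safe; Gallifrey generalizes this idea so that different restrictions, with their own merge strategies, can apply to the same object at different times.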
Multi-source data assimilation for physically based hydrological modeling of an experimental hillslope
Data assimilation has recently been the focus of much attention
for integrated surface–subsurface hydrological models, whereby joint
assimilation of water table, soil moisture, and river discharge measurements
with the ensemble Kalman filter (EnKF) has been extensively applied. Although
the EnKF has been specifically developed to deal with nonlinear models,
integrated hydrological models based on the Richards equation still represent
a challenge, due to strong nonlinearities that may significantly affect the
filter performance. Thus, more studies are needed to investigate the
capabilities of the EnKF to correct the system state and identify parameters
in cases where the unsaturated zone dynamics are dominant, as well as to
quantify possible tradeoffs associated with assimilation of multi-source
data. Here, the CATHY (CATchment HYdrology) model is applied to reproduce the hydrological dynamics
observed in an experimental two-layered hillslope, equipped with
tensiometers, water content reflectometer probes, and tipping bucket flow
gages to monitor the hillslope response to a series of artificial rainfall
events. Pressure head, soil moisture, and subsurface outflow are assimilated
with the EnKF in a number of scenarios and the challenges and issues arising
from the assimilation of multi-source data in this real-world test case are
discussed. Our results demonstrate that the EnKF is able to effectively
correct states and parameters even in a real application characterized by
strong nonlinearities. However, multi-source data assimilation may lead to
significant tradeoffs: the assimilation of additional variables can lead to
degradation of model predictions for other variables that are otherwise well
reproduced. Furthermore, we show that integrated observations such as outflow
discharge cannot compensate for the lack of well-distributed data in
heterogeneous hillslopes.
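The EnKF analysis step referred to throughout the abstract can be sketched for a scalar state. This is a minimal illustration under simplifying assumptions (one state variable, perturbed observations), not the CATHY/EnKF configuration of the paper.

```python
# Minimal scalar EnKF analysis step: each ensemble member is nudged
# toward a perturbed observation by a gain built from the ensemble
# spread. Illustrative sketch only; a fixed seed keeps it deterministic.
import random

def enkf_update(ensemble, obs, obs_var, rng=random.Random(0)):
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)  # sample variance
    gain = var / (var + obs_var)  # scalar Kalman gain
    # Perturbing the observation per member preserves ensemble spread.
    return [x + gain * (obs + rng.gauss(0, obs_var ** 0.5) - x)
            for x in ensemble]

prior = [0.8, 1.0, 1.2, 0.9, 1.1]   # e.g. forecast pressure heads
posterior = enkf_update(prior, obs=1.5, obs_var=0.01)
print(posterior)  # members pulled toward the observation 1.5
```

The gain formula shows the tradeoff the abstract discusses: a very informative observation (small `obs_var`) pulls the state strongly toward it, which is exactly how assimilating one variable can degrade others that the model was already reproducing well.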