Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. Over time, this has created a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Partial replication with strong consistency
In response to the increasing expectations of their clients, cloud services exploit
geo-replication to provide fault-tolerance, availability and low latency when executing
requests. However, cloud platforms tend to adopt weak consistency semantics, in which
replicas may diverge in state independently. These systems offer good response times,
but at the cost of potential data inconsistencies that may affect the user
experience.
Some systems propose to adopt solutions with strong consistency, which are not as
efficient but simplify the development of correct applications by guaranteeing that all
replicas in the system maintain the same database state. Therefore, it is interesting to explore
a system that can offer strong consistency while minimizing its main disadvantage:
the performance impact of coordinating every replica in the system. A
possible solution to reduce the cost of replica coordination is to support partial replication.
Partially replicating a database allows for each server to only be responsible for a
subset of the data - a partition - which means that when updating the database only some
of the replicas have to be synchronized, improving response times.
In this dissertation, we propose an algorithm that implements a distributed replicated
database that offers strong consistency with support for partial replication. To achieve
strong consistency in a partially replicated setting, our algorithm builds in part on
Clock-SI [10], which implements a multi-versioned database with strong consistency
(snapshot isolation) and runs the Two-Phase Commit protocol to coordinate replicas
during updates. The algorithm is supported by
an architecture that simplifies distributing partitions among datacenters and efficiently
propagating operations across nodes in the same partition, thanks to the ChainPaxos [27]
algorithm.
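To make the coordination pattern concrete, here is a minimal Python sketch of a Two-Phase Commit coordinator that contacts only the partitions a transaction wrote, in the spirit of the Clock-SI-based design described above. All names (PartitionReplica, commit_transaction, and so on) are hypothetical, and the snapshot validation a real Clock-SI implementation performs is reduced here to a simple write-lock check.

```python
# Hypothetical sketch: 2PC across only the partitions a transaction wrote,
# under a Clock-SI-style multi-versioned store. Real Clock-SI also validates
# against the transaction's snapshot timestamp; here that check is reduced
# to a write-lock conflict test.
import time

class PartitionReplica:
    """One replica group responsible for a single partition of the data."""
    def __init__(self, name):
        self.name = name
        self.locks = set()

    def prepare(self, keys):
        # Phase 1: vote yes only if none of the keys is locked by a
        # concurrent transaction (write-write conflict check).
        if any(k in self.locks for k in keys):
            return False
        self.locks.update(keys)
        return True

    def commit(self, keys, commit_ts):
        # Phase 2: install new versions at commit_ts and release locks.
        self.locks.difference_update(keys)

    def abort(self, keys):
        self.locks.difference_update(keys)

def commit_transaction(writes_by_partition):
    """2PC over only the partitions the transaction actually touched."""
    prepared = []
    for replica, keys in writes_by_partition.items():
        if replica.prepare(keys):
            prepared.append((replica, keys))
        else:
            for r, ks in prepared:
                r.abort(ks)          # phase 2: abort everywhere
            return False
    commit_ts = time.monotonic_ns()  # Clock-SI derives this from local clocks
    for replica, keys in prepared:
        replica.commit(keys, commit_ts)
    return True
```

Because only the written partitions participate in the protocol, replicas holding other partitions are never contacted, which is where partial replication saves coordination cost.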
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments, starting with cloud computing, which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billing model, and ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and a limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from a centralized trusted authority. Even in the presence of malicious parties, permissionless blockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed datacenters controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. Permissionless blockchains, on the other hand, are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources.

In this dissertation, we propose new system designs and optimizations that address the scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains. The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and use-cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement for fully and partially geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT), an adaptive and elastic client-side cache that addresses server-side load imbalances that occur in large-scale distributed storage layers. For permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives that allow them to write smart contracts without needing to reason about the anomalies that can arise from concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems.
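As a rough illustration of the client-side caching idea behind CoT, the following Python sketch tracks the access frequency of more keys than it caches and serves only the hottest keys locally. The class name, eviction policy, and hotness heuristic are assumptions for exposition, not the paper's exact design.

```python
# Illustrative CoT-style client-side cache: track many keys cheaply
# (counts only), cache few (hot values only), so skewed workloads stop
# hammering the storage servers that own the hot keys.
from collections import OrderedDict

class CoTCache:
    def __init__(self, cache_size, tracker_size, backend_get):
        self.cache_size = cache_size       # number of values kept locally
        self.tracker_size = tracker_size   # tracker_size > cache_size
        self.backend_get = backend_get     # fetch from the storage layer
        self.tracker = OrderedDict()       # key -> hit count (LRU order)
        self.cache = {}                    # key -> value (hot keys only)

    def get(self, key):
        # Count the access; evict the least-recently-seen tracked key if full.
        self.tracker[key] = self.tracker.pop(key, 0) + 1
        if len(self.tracker) > self.tracker_size:
            self.tracker.popitem(last=False)
        if key in self.cache:
            return self.cache[key]         # hot key served locally
        value = self.backend_get(key)      # cold key hits the backend
        # Promote to the cache if the key looks hot enough.
        if self.tracker[key] >= self._hotness_threshold():
            if len(self.cache) >= self.cache_size:
                coldest = min(self.cache, key=lambda k: self.tracker.get(k, 0))
                del self.cache[coldest]
            self.cache[key] = value
        return value

    def _hotness_threshold(self):
        # Hedged heuristic: a key is "hot" if its count reaches the median
        # count of currently cached keys (1 if the cache is empty).
        counts = sorted(self.tracker.get(k, 0) for k in self.cache)
        return counts[len(counts) // 2] if counts else 1
```

The point of tracking more keys than are cached is that the front end can cheaply distinguish genuinely hot keys from one-off accesses before spending cache space on them.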
MDCC: Multi-Data Center Consistency
Replicating data across multiple data centers not only allows moving the data
closer to the user, thus reducing latency for applications, but also
increases availability in the event of a data center failure. Therefore, it
is not surprising that companies like Google, Yahoo, and Netflix already
replicate user data across geographically different regions.
However, replication across data centers is expensive. Inter-data center
network delays are in the hundreds of milliseconds and vary significantly.
Synchronous wide-area replication is therefore considered infeasible with
strong consistency, and current solutions either settle for asynchronous
replication, which risks losing data in the event of failures,
restrict consistency to small partitions, or give up consistency entirely. With
MDCC (Multi-Data Center Consistency), we describe the first optimistic commit
protocol that requires neither a master nor partitioning and is strongly
consistent at a cost similar to eventually consistent protocols. MDCC can
commit transactions in a single round-trip across data centers in the normal
operational case. We further propose a new programming model which empowers the
application developer to handle longer and unpredictable latencies caused by
inter-data center communication. Our evaluation using the TPC-W benchmark with
MDCC deployed across 5 geographically diverse data centers shows that MDCC is
able to achieve throughput and latency similar to eventually consistent quorum
protocols and that MDCC is able to sustain a data center outage without a
significant impact on response times while guaranteeing strong consistency.
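The single-round-trip commit can be pictured as a Fast Paxos-style fast round: the client proposes its updates to all data centers at once and commits if a fast quorum accepts without conflict. The sketch below is illustrative only; the Replica class, the conflict rule, and the fallback are simplifications, with the standard fast-quorum size assumed.

```python
# Hypothetical sketch of a one-round-trip optimistic commit in the style
# of Fast Paxos (not the MDCC implementation itself).
import math

class Replica:
    """Acceptor that accepts a transaction's update options unless they
    conflict with options already accepted for the same records."""
    def __init__(self):
        self.accepted = set()

    def accept(self, options):
        if self.accepted & options:   # conflict on some record
            return False
        self.accepted |= options
        return True

def fast_commit(replicas, txn_options):
    """Attempt to commit in one parallel round trip across data centers."""
    fast_quorum = math.ceil(3 * len(replicas) / 4)       # fast-quorum size
    acks = sum(r.accept(txn_options) for r in replicas)  # one round trip
    if acks >= fast_quorum:
        return "committed"   # common, conflict-free case
    return "conflict"        # fall back to a classic, coordinator-led round

# Example: five data centers, one record "x" being updated.
print(fast_commit([Replica() for _ in range(5)], {"x"}))  # -> committed
```

The quorum arithmetic is what lets a conflict-free transaction finish in a single wide-area round trip; only on conflict does a slower, coordinated round become necessary.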
Global-Scale Data Management with Strong Consistency Guarantees
Global-scale data management (GSDM) empowers systems by providing higher levels of fault-tolerance, read availability, and efficiency in utilizing cloud resources. This has led to the emergence of global-scale data management and event processing. However, the Wide-Area Network (WAN) latency separating datacenters is orders of magnitude larger than typical network latencies, which forces a reevaluation of many of the traditional design trade-offs of data management systems. Data management problems must therefore be revisited to account for the new design space.

In this dissertation, we propose theoretical foundations for understanding the limits imposed by WAN latency on GSDM, and practical systems and protocols that minimize the overhead caused by WAN latency. The presented work spans global-scale transaction processing, communication, analytics, and machine learning. In all these directions, the focus is on the trade-off between consistency and latency, where we ask: what is the best performance (often latency) we can achieve without compromising the consistency and integrity of data? For transaction processing, we propose a lower-bound formulation for the transaction latency imposed by WAN latency, together with a new paradigm for transaction processing (proactive coordination) that inspired our two proposed protocols, Message Futures and Helios, which achieve the lower-bound latency. We also propose a communication framework, called Chariots, to scale multi-datacenter communication. Chariots is carefully designed to scale communication while providing a consistent view of the communicated information. Finally, we explore challenges in global-scale analytics and machine learning. Specifically, we propose Ogre, a scalable system for global-scale heterogeneous transactional and analytics workloads, and COP, a system designed to speed up machine learning on globally generated data.
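To convey the flavor of such a WAN-imposed lower bound, here is an illustrative formulation in LaTeX; the exact statement and its proof are in the dissertation, so treat this form as an assumption for exposition:

```latex
% Illustrative form of a WAN-imposed latency lower bound (assumed here
% for exposition; see the dissertation for the precise statement).
% For any two datacenters A and B that must order conflicting
% transactions consistently, the commit latencies L_A and L_B satisfy
\[
  L_A + L_B \;\geq\; \mathrm{RTT}(A, B),
\]
% i.e., their sum is bounded below by the round-trip time between the
% datacenters, so at most one side can commit with near-zero latency.
```

Intuitively, if A commits conflicting transactions without hearing from B, then B must wait to hear from A (or vice versa), so the two sides cannot both commit faster than information can travel between them.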
DTC: A Dynamic Transaction Chopping Technique for Geo-Replicated Storage Services
Replicating data across geo-distributed datacenters is usually necessary for large-scale cloud services to achieve high locality, durability, and availability. One of the major challenges in such geo-replicated data services lies in consistency maintenance, which usually suffers from long latency due to costly coordination across datacenters. Among other approaches, transaction chopping is an effective and efficient way to address this challenge. However, existing chopping is performed statically, at programming time, which is inflexible and burdensome for developers. In this article, we propose Dynamic Transaction Chopping (DTC), a novel technique that does transaction chopping and determines piecewise execution in a dynamic and automatic way. DTC mainly consists of two parts: a dynamic chopper that divides transactions into pieces according to the data partition scheme, and a conflict detection algorithm that checks the safety of the dynamic chopping. Compared with existing techniques, DTC has several advantages: transparency to programmers, flexibility in conflict analysis, a high degree of piecewise execution, and adaptability to data partition schemes. A prototype of DTC is implemented to verify its correctness and evaluate its performance. The experimental results show that DTC achieves much better performance than similar work.
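A minimal Python sketch of the dynamic-chopping step follows: operations are grouped into per-partition pieces at runtime, given whatever partition scheme is in effect. The Op type and hash-based partitioner are hypothetical, and the accompanying conflict-detection check (an SC-cycle test in the transaction-chopping literature) is the other, harder half of DTC and is not shown.

```python
# Hypothetical sketch of runtime chopping: split a transaction into pieces
# by the partition each operation touches, so each piece runs at one site.
from collections import defaultdict, namedtuple

Op = namedtuple("Op", ["kind", "key"])   # e.g., Op("read", "x")

def chop(operations, partition_of):
    """Group a transaction's operations into per-partition pieces,
    preserving operation order within each piece."""
    pieces = defaultdict(list)
    for op in operations:
        pieces[partition_of(op.key)].append(op)
    return list(pieces.values())

# Example with a hash-based partitioner (an assumption for illustration):
ops = [Op("read", "x"), Op("write", "y"), Op("write", "x")]
print(chop(ops, partition_of=lambda key: hash(key) % 4))
```

Because the grouping happens at runtime, re-partitioning the data changes the chopping automatically, with no developer involvement; this is the adaptability the abstract claims.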
NCC: Natural Concurrency Control for Strictly Serializable Datastores by Avoiding the Timestamp-Inversion Pitfall
Strictly serializable datastores greatly simplify the development of correct
applications by providing strong consistency guarantees. However, existing
techniques pay unnecessary costs for naturally consistent transactions, which
arrive at servers in an order that is already strictly serializable. We find
these transactions are prevalent in datacenter workloads. We exploit this
natural arrival order by executing transaction requests with minimal costs
while optimistically assuming they are naturally consistent, and then leverage
a timestamp-based technique to efficiently verify if the execution is indeed
consistent. In the process of designing such a timestamp-based technique, we
identify a fundamental pitfall in relying on timestamps to provide strict
serializability, and name it the timestamp-inversion pitfall. We find
timestamp-inversion has affected several existing works.
We present Natural Concurrency Control (NCC), a new concurrency control
technique that guarantees strict serializability and ensures minimal costs --
i.e., one-round latency, lock-free, and non-blocking execution -- in the best
(and common) case by leveraging natural consistency. NCC is enabled by three
key components: non-blocking execution, decoupled response control, and
timestamp-based consistency check. NCC avoids timestamp-inversion with a new
technique, response timing control, and introduces two optimizations,
asynchrony-aware timestamps and smart retry, to reduce false aborts. Moreover,
NCC designs a specialized protocol for read-only transactions, which is the
first to achieve the optimal best-case performance while ensuring strict
serializability, without relying on synchronized clocks. Our evaluation shows
that NCC outperforms state-of-the-art solutions by an order of magnitude on
many workloads.
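The core optimistic pattern NCC builds on can be sketched in a few lines of Python: execute requests immediately in arrival order, then validate with a version check before responding. This is a deliberately simplified, hypothetical sketch; NCC's actual timestamp-based check, response timing control, and read-only protocol are substantially more refined.

```python
# Hypothetical sketch of "execute first, verify before responding".
class Server:
    def __init__(self):
        self.store = {}   # key -> (value, version)

    def execute_read(self, key):
        """Execute immediately, optimistically assuming requests arrived in
        an order that is already strictly serializable."""
        return self.store.get(key, (None, 0))

    def verify(self, reads):
        """Consistency check before responding: the optimistic execution was
        fine if every value read is still the latest version, i.e., no
        conflicting write intervened."""
        return all(self.store.get(key, (None, 0))[1] == version
                   for key, version in reads)

# Usage: read optimistically, then validate before replying to the client.
s = Server()
s.store["x"] = ("v1", 7)
value, version = s.execute_read("x")
assert s.verify([("x", version)])   # no intervening write: respond/commit
```

In the common case where requests really do arrive in a serializable order, the check always passes, so naturally consistent transactions pay only this cheap validation rather than locking or blocking.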
Invariant preservation in geo-replicated data stores
The Internet has enabled people from all around the globe to communicate with each
other in a matter of milliseconds. This possibility has a great impact on the way we work,
behave, and communicate, and the full extent of its possibilities is yet to be known. As we become more dependent on Internet services, it becomes more important to ensure that these systems operate correctly, with low latency and high availability, for millions of clients scattered all around the globe.
To be able to provide service to a large number of clients, and low access latency
for clients in different geographical locations, Internet services typically rely on geo-replicated storage systems. Replication comes with costs that may affect service quality.
When propagating updates between replicas, systems either give up consistency in favor of better availability and latency (weak consistency) or maintain consistency at the risk of becoming unavailable during network partitions (strong consistency).
In practice, many production systems rely on weakly consistent storage systems to
enhance user experience, overlooking the fact that applications can become incorrect under weaker consistency assumptions. In this thesis, we study how to exploit application
semantics to build correct applications without affecting the availability and latency of
operations.
We propose a new consistency model that breaks with the traditional view
that application consistency depends on coordinating the execution of operations
across replicas. We show that it is possible to execute most operations with low latency
and high availability while preserving application correctness. Our approach consists of specifying the fundamental properties that define the correctness of an application, i.e., its invariants, and then identifying and preventing concurrent executions that could make the database state inconsistent, i.e., violate some invariant. We explore different, complementary approaches to implementing this model.
The Indigo approach prevents conflicting operations from executing
concurrently by restricting, at each moment, the operations that each replica can execute, thereby maintaining application correctness.
The IPA approach does not preclude the execution of any operation, ensuring high
availability. To maintain application correctness, operations are modified to prevent
invariant violations during replica reconciliation; alternatively, when modifying operations yields unsatisfactory semantics, invariant violations can be corrected by executing compensations before a client can read an inconsistent state.
Our evaluation shows that both approaches can ensure low latency and high availability
for most operations in common Internet application workloads, with small execution
overhead compared to unmodified weak consistency systems, while enforcing application invariants as strong consistency systems do.
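A minimal sketch of the reservation idea underlying the Indigo approach, with hypothetical names and numbers: a numeric invariant such as stock >= 0 is preserved without per-operation coordination by splitting the total allowance among replicas up front.

```python
# Hypothetical reservation-style invariant preservation: each replica may
# apply decrements locally only while it holds enough rights, so the
# global invariant (stock >= 0) can never be violated by concurrency.
class ReplicaRights:
    def __init__(self, local_rights):
        self.local_rights = local_rights   # decrements this replica may apply

    def try_decrement(self, amount):
        """Runs locally, with no cross-replica coordination."""
        if self.local_rights >= amount:
            self.local_rights -= amount    # invariant provably still holds
            return True
        return False   # out of rights: must borrow from a peer (coordinated)

# Example: stock of 100 split between two datacenters; each side can serve
# up to 50 decrements locally before any coordination is needed.
eu, us = ReplicaRights(50), ReplicaRights(50)
assert eu.try_decrement(3)        # fast, local
assert not us.try_decrement(60)   # would risk violating stock >= 0
```

Coordination is paid only when a replica exhausts its rights and must borrow from a peer, which is why most operations remain low-latency and highly available.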