Invariant preservation in geo-replicated data stores
The Internet has enabled people from all around the globe to communicate with each
other in a matter of milliseconds. This possibility has a great impact on the way we work,
behave and communicate, while the full extent of the possibilities is yet to be known. As we become more dependent on Internet services, it becomes more important to ensure that these systems operate correctly, with low latency and high availability for millions of clients scattered all around the globe.
To be able to provide service to a large number of clients, and low access latency
for clients in different geographical locations, Internet services typically rely on geo-replicated storage systems. Replication comes with costs that may affect service quality.
When propagating updates between replicas, systems either give up consistency in favor of better availability and latency (weak consistency), or maintain consistency at the risk of becoming unavailable during network partitions (strong consistency).
In practice, many production systems rely on weak consistency storage systems to
enhance user experience, overlooking that applications can become incorrect due to the weaker consistency assumptions. In this thesis, we study how to exploit application
semantics to build correct applications without affecting the availability and latency of
operations.
We propose a new consistency model that breaks away from the traditional assumption
that application consistency depends on coordinating the execution of operations
across replicas. We show that it is possible to execute most operations with low latency
and in a highly available way, while preserving application correctness. Our approach consists in specifying the fundamental properties that define the correctness of applications, i.e. the application invariants, and in identifying and preventing concurrent executions that can potentially make the state of the database inconsistent, i.e. that may violate some invariant. We explore different, complementary approaches to implement this model.
The Indigo approach consists in preventing conflicting operations from executing
concurrently, by restricting the operations that each replica can execute at each moment to maintain application correctness.
The IPA approach does not preclude the execution of any operation, ensuring high
availability. To maintain application correctness, operations are modified to prevent
invariant violations during replica reconciliation, or, if modifying operations yields unsatisfactory semantics, it is possible to correct any invariant violations before a client
can read an inconsistent state, by executing compensations.
Evaluation shows that our approaches can ensure both low latency and high availability
for most operations in common Internet application workloads, with small execution
overhead in comparison to unmodified weak consistency systems, while enforcing application invariants, as in strong consistency systems.
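The compensation idea described for the IPA approach can be illustrated with a small sketch. It is not taken from the thesis: the ticket-sales scenario, the `CAPACITY` bound, and the function names are hypothetical, and the deterministic "keep the smallest identifiers" rule is only one possible compensation policy.

```python
CAPACITY = 2  # hypothetical invariant: at most CAPACITY tickets may be sold


def merge(sold_a, sold_b):
    """Reconcile two replicas of a grow-only set of buyers (set union)."""
    return sold_a | sold_b


def read_with_compensation(sold, capacity=CAPACITY):
    """Return (valid, cancelled) so that readers never see an oversold state.

    The choice is deterministic (sorted order), so every replica that has
    seen the same updates cancels the same sales.
    """
    ordered = sorted(sold)
    return set(ordered[:capacity]), set(ordered[capacity:])


# Each replica accepts sales locally without coordination.
replica_a = {"alice", "bob"}
replica_b = {"alice", "carol"}

merged = merge(replica_a, replica_b)  # three buyers: the invariant is violated
valid, cancelled = read_with_compensation(merged)
print(valid, cancelled)
```

No operation is ever rejected here; the violation is repaired at read time, which is the trade-off the abstract attributes to compensations.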
Key-CRDT stores
Dissertation submitted for the degree of Master in Informatics Engineering
The Internet has opened opportunities to create world scale services. These systems require high availability and fault tolerance, while preserving low latency. Replication is a widely adopted technique to provide these properties. Different replication techniques have been proposed through the years, but to support these properties for world scale services it is necessary to trade consistency for availability, fault tolerance and low latency. In weak consistency models, it is necessary to deal with possible conflicts arising from concurrent updates. We propose the use of conflict-free replicated data types (CRDTs) to address this issue.
Cloud computing systems support world scale services, often relying on Key-Value stores for storing data. These systems partition and replicate data over multiple nodes, which can be geographically dispersed over the network. To handle conflicts, these systems either rely on solutions that lose updates (e.g. last-write-wins) or require the application to handle concurrent updates. Additionally, these systems provide little support for transactions, a widely used abstraction for data access.
In this dissertation, we present the design and implementation of SwiftCloud, a Key-CRDT
store that extends a Key-Value store by incorporating CRDTs in the system’s data-model. The system provides automatic conflict resolution relying on properties of CRDTs. We also present a version of SwiftCloud that supports transactions. Unlike traditional transactional systems, transactions never abort due to write/write conflicts, as the system leverages CRDT properties to merge concurrent transactions. For implementing SwiftCloud, we have introduced a set of new techniques, including versioned CRDTs, composition of CRDTs and alternative serialization methods.
The evaluation of the system, with both micro-benchmarks and the TPC-W benchmark, shows that SwiftCloud imposes little overhead over a key-value store. Allowing clients to access a datacenter close to them with SwiftCloud can reduce latency without requiring any complex reconciliation
mechanism. The experience of using SwiftCloud has shown that adapting an existing application to use SwiftCloud requires little effort.
Project PTDC/EIA-EIA/108963/200
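To make the Key-CRDT idea concrete, the sketch below shows a key-value store whose values are CRDTs, so concurrent writes to the same key are merged instead of one overwriting the other. This is an illustrative assumption, not SwiftCloud's API: `GCounter`, `KeyCRDTStore`, and `sync_from` are hypothetical names, and a grow-only counter stands in for the richer data types the dissertation describes.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, n=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Return a new counter holding the per-replica maximum."""
        merged = dict(self.counts)
        for rid, count in other.counts.items():
            merged[rid] = max(merged.get(rid, 0), count)
        return GCounter(merged)


class KeyCRDTStore:
    """Key-value store whose put() merges instead of overwriting."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, crdt):
        current = self.data.get(key)
        # merge() returns a fresh object, so stores never share state.
        self.data[key] = crdt.merge(current if current is not None else crdt)

    def sync_from(self, other):
        for key, crdt in other.data.items():
            self.put(key, crdt)


dc1, dc2 = KeyCRDTStore(), KeyCRDTStore()

c1 = GCounter()
c1.increment("dc1", 2)
dc1.put("likes", c1)

c2 = GCounter()
c2.increment("dc2", 3)
dc2.put("likes", c2)  # concurrent update to the same key

dc1.sync_from(dc2)
dc2.sync_from(dc1)
print(dc1.get("likes").value(), dc2.get("likes").value())  # 5 5
```

With last-write-wins, one of the two increments would be lost; here both datacenters converge to the same total regardless of synchronization order.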
An optimized conflict-free replicated set
Eventual consistency of replicated data supports concurrent updates, reduces
latency and improves fault tolerance, but forgoes strong consistency.
Accordingly, several cloud computing platforms implement eventually-consistent
data types. The set is a widespread and useful abstraction, and many replicated
set designs have been proposed. We present a reasoning abstraction, permutation
equivalence, that systematizes the characterization of the expected concurrency
semantics of concurrent types. Under this framework we present one of the
existing conflict-free replicated data types, Observed-Remove Set. Furthermore,
in order to decrease the size of meta-data, we propose a new optimization to
avoid tombstones. This approach can be transposed to other data types,
such as maps, graphs or sequences.
Comment: No. RR-8083 (2012
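A tombstone-free Observed-Remove Set can be sketched as follows. This is a simplified state-based variant written for illustration, assuming each replica keeps a version vector alongside its tagged entries; the class and method names are hypothetical and the code omits the compaction of older tags that a full design would include.

```python
class ORSet:
    """State-based observed-remove set without tombstones (sketch)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.entries = set()  # {(element, counter, origin_replica)}
        self.version = {}     # origin_replica -> highest counter seen

    def add(self, element):
        counter = self.version.get(self.replica_id, 0) + 1
        self.version[self.replica_id] = counter
        self.entries.add((element, counter, self.replica_id))

    def remove(self, element):
        # Drop only the tags observed locally; no tombstone is kept.
        self.entries = {e for e in self.entries if e[0] != element}

    def elements(self):
        return {e[0] for e in self.entries}

    def merge(self, other):
        common = self.entries & other.entries
        # An entry missing on the other side survives only if that side
        # has not seen it yet; otherwise it was removed there.
        only_mine = {e for e in self.entries - other.entries
                     if e[1] > other.version.get(e[2], 0)}
        only_theirs = {e for e in other.entries - self.entries
                       if e[1] > self.version.get(e[2], 0)}
        self.entries = common | only_mine | only_theirs
        for rid, counter in other.version.items():
            self.version[rid] = max(self.version.get(rid, 0), counter)


a, b = ORSet("a"), ORSet("b")
a.add("x")
b.merge(a)
b.remove("x")  # b removes the tag it observed
a.add("x")     # concurrent add at a creates a fresh tag
a.merge(b)
b.merge(a)
print(a.elements(), b.elements())  # {'x'} {'x'}
```

The version vector plays the role that tombstones play in the unoptimized design: an entry that is absent but already covered by the vector is known to have been removed, so the concurrent add wins while the observed add stays removed.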
Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants
Geo-replicated databases often operate under the principle of eventual
consistency to offer high-availability with low latency on a simple key/value
store abstraction. Recently, some have adopted commutative data types to
provide seamless reconciliation for special purpose data types, such as
counters. Despite this, the inability to enforce numeric invariants across all
replicas still remains a key shortcoming of relying on the limited guarantees
of eventual consistency storage. We present a new replicated data type, called
bounded counter, which adds support for numeric invariants to eventually
consistent geo-replicated databases. We describe how this can be implemented on
top of existing cloud stores without modifying them, using Riak as an example.
Our approach adapts ideas from escrow transactions to devise a solution that is
decentralized, fault-tolerant and fast. Our evaluation shows much lower latency
and better scalability than the traditional approach of using strong
consistency to enforce numeric invariants, thus alleviating the tension between
consistency and availability.
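The escrow idea behind a bounded counter can be sketched as below for the invariant "value >= 0". This is an illustration under stated assumptions rather than the paper's implementation: the class, its fields, and the pointwise-max merge are a plausible state-based encoding, and nothing here uses or mimics the Riak API.

```python
from collections import defaultdict


class BoundedCounter:
    """Counter that never drops below zero, using per-replica rights (sketch)."""

    def __init__(self, replica_id):
        self.id = replica_id
        # (src, dst) -> rights moved from src to dst; (i, i) holds the
        # increments performed at replica i.
        self.rights = defaultdict(int)
        # replica -> total amount decremented at that replica
        self.used = defaultdict(int)

    def local_rights(self):
        received = sum(v for (src, dst), v in self.rights.items()
                       if dst == self.id)
        given = sum(v for (src, dst), v in self.rights.items()
                    if src == self.id and dst != self.id)
        return received - given - self.used[self.id]

    def value(self):
        created = sum(v for (src, dst), v in self.rights.items() if src == dst)
        return created - sum(self.used.values())

    def increment(self, n=1):
        self.rights[(self.id, self.id)] += n

    def decrement(self, n=1):
        """Succeeds only if this replica holds enough rights locally."""
        if self.local_rights() < n:
            return False
        self.used[self.id] += n
        return True

    def transfer(self, to, n):
        if to == self.id or self.local_rights() < n:
            return False
        self.rights[(self.id, to)] += n
        return True

    def merge(self, other):
        for key, v in other.rights.items():
            self.rights[key] = max(self.rights[key], v)
        for key, v in other.used.items():
            self.used[key] = max(self.used[key], v)


a, b = BoundedCounter("a"), BoundedCounter("b")
a.increment(3)
a.transfer("b", 1)     # a keeps 2 rights, b will hold 1
b.merge(a)
print(b.decrement(1))  # True: b consumes its single right
print(b.decrement(1))  # False: refused locally, no coordination needed
print(a.decrement(2))  # True
a.merge(b)
print(a.value())       # 0
```

Every decrement is decided from local state only, which is why the approach avoids the cross-replica coordination that strong consistency would require; a replica that runs out of rights must obtain a transfer before it can decrement again.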
Exploiting models for scalable and high throughput distributed software
In high-throughput distributed applications, such as large-scale banking systems, synchronization between objects becomes a bottleneck. This short paper focusses on research, in close collaboration with ING Bank, on the opportunity of leveraging application-specific knowledge captured by model-driven engineering approaches to increase application performance in high-contention scenarios, while maintaining functional application-level consistency.
SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine
Client-side logic and storage are increasingly used in web and mobile
applications to improve response time and availability. Current approaches tend
to be ad-hoc and poorly integrated with the server-side logic. We present a
principled approach to integrate client- and server-side storage. We support
mergeable and strongly consistent transactions that target either client or
server replicas and provide access to causally-consistent snapshots
efficiently. In the presence of infrastructure faults, a client-assisted
failover solution allows client execution to resume immediately and seamlessly
access consistent snapshots without waiting. We implement this approach in
SwiftCloud, the first transactional system to bring geo-replication all the way
to the client machine. Example applications show that our programming model is
useful across a range of application areas. Our experimental evaluation shows
that SwiftCloud provides better fault tolerance and at the same time can
improve both latency and throughput by up to an order of magnitude, compared to
classical geo-replication techniques.
Improving the scalability of geo-replication with reservations
Geo-replicated systems improve performance and fault tolerance by replicating data on sites in different physical locations. These systems often eschew strong consistency guarantees because of their performance and scalability cost, and instead choose eventual consistency. Although eventual consistency improves performance, especially at large scale, it might violate system invariants. In this work, we exploit reservation techniques to strengthen eventual consistency by adding safety guarantees. We define a consistency model called RPB that retains the advantages of eventual consistency while providing stronger guarantees, including causality and safety properties.
Brief announcement: semantics of eventually consistent replicated sets
This paper studies the semantics of sets under eventual consistency. The set is a pervasive data type, used either directly or as a component of more complex data types, such as maps or graphs. Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency (e.g., linearisability). Accordingly, several cloud computing platforms implement eventually-consistent replicated sets [2,4].
Concurrency Control and Awareness Support for Multi-synchronous Collaborative Editing
Collaborative editing tools have become increasingly popular in the last decade, with some systems being used by massive numbers of users. While collaborative editing systems traditionally targeted either synchronous or asynchronous collaboration settings, some recent systems support both types of collaboration, even supporting disconnected work. In this paper we analyze the limitations of existing systems and propose a data management solution that overcomes such limitations. The proposed concurrency control algorithm, based on conflict-free data types, builds on the ideas previously developed for synchronous collaboration, extending them to support asynchronous collaboration. Our solution also includes the information necessary for providing comprehensive awareness information to users. The evaluation of our algorithm shows that, compared with traditional solutions in collaborative editing, the conflict resolution strategy proposed in this paper leads to results closer to the ones expected by users.