33 research outputs found
Building a collaborative peer-to-peer wiki system on a structured overlay
The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: Distributed Hash Tables (DHT) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability, as well as suitability for large-scale scenarios. The implementation is based on Damon, a distributed AOP middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system transparently usable by third-party wiki engines. Finally, UniWiki has been proven viable and fairly efficient in large-scale scenarios.
UniWiki: A Reliable and Scalable Peer-to-Peer System for Distributing Wiki Applications
The ever-growing demand for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. In this paper, we propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: distributed hash tables (DHT) and optimistic replication. In our "universal wiki" engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability. The implementation is based on a Distributed Interception Middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system usable by third-party wiki engines in a transparent way. UniWiki has been proven viable and fairly efficient in large-scale scenarios.
Remove-Win: a Design Framework for Conflict-free Replicated Data Collections
Internet-scale distributed systems often replicate data within and across
data centers to provide low latency and high availability despite node and
network failures. Replicas are required to accept updates without coordination
with each other, and the updates are then propagated asynchronously. This
brings the issue of conflict resolution among concurrent updates, which is
often challenging and error-prone. The Conflict-free Replicated Data Type
(CRDT) framework provides a principled approach to address this challenge.
This work focuses on a special type of CRDT, namely the Conflict-free
Replicated Data Collection (CRDC), e.g. list and queue. The CRDC can have
complex and compound data items, which are organized in structures of rich
semantics. Complex CRDCs can greatly ease the development of upper-layer
applications, but also make conflict resolution notoriously difficult.
This explains why existing CRDC designs are tricky and hard to generalize
to other data types. A design framework is greatly needed to guide the
systematic design of new CRDCs.
To address the challenges above, we propose the Remove-Win Design Framework.
The remove-win strategy for conflict resolution is simple but powerful. The
remove operation just wipes out the data item, no matter how complex the value
is. The user of the CRDC only needs to specify conflict resolution for
non-remove operations. This resolution is decomposed into three basic cases,
which are left as open terms in the CRDC design skeleton. Stubs containing
user-specified conflict resolution logic are plugged into the skeleton to
obtain concrete CRDC designs. We demonstrate the effectiveness of our design
framework via a case study of designing a conflict-free replicated priority
queue. Performance measurements also show the efficiency of the design derived
from our design framework.
Comment: revised after submission
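The remove-win rule itself can be illustrated with the simplest remove-biased set, a two-phase set. This is a minimal sketch and not the paper's design framework: the class name `TwoPhaseSet` and its methods are assumptions made for illustration, and unlike the collections described above, a removed item can never be re-added here.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """State-based set in which a remove beats any add of the same item."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)  # tombstones, never shrink

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Record a tombstone; it hides the item regardless of later merges.
        self.removed.add(item)

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so replicas converge whatever the merge order.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed


# Two replicas start with the same item, then diverge.
a = TwoPhaseSet({"task"})
b = TwoPhaseSet({"task"})
a.remove("task")   # replica a removes the item ...
b.add("task")      # ... while replica b concurrently adds it again
b.add("note")
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Both merge orders yield the same state, and the concurrent add of "task" loses to the remove.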
Shelfaware: Accelerating Collaborative Awareness with Shelf CRDT
Collaboration has become a key feature of modern software, allowing teams to work together effectively in real-time while in different locations. In order for a user to communicate their intention to several distributed peers, computing devices must exchange high-frequency updates with transient metadata like mouse position, text range highlights, and temporary comments. Current peer-to-peer awareness solutions have high time and space complexity due to the ever-expanding logs that each client must maintain in order to ensure robust collaboration in eventually consistent environments. This paper proposes an awareness Conflict-Free Replicated Data Type (CRDT) library that provides the tooling to support an eventually consistent, decentralized, and robust multi-user collaborative environment. Our library is tuned for rapid iterative updates that communicate fine-grained user actions across a network of collaborators. Our approach holds memory constant for subsequent writes to an existing key on a shared resource and completely prunes stale data from shared documents. These features allow us to keep the CRDT's memory footprint small, making it a feasible solution for memory-constrained applications. Results show that our CRDT implementation is comparable to or exceeds the performance of similar data structures in high-frequency read/write scenarios.
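The constant-memory property for repeated writes to one key can be sketched with a last-writer-wins map that keeps a single stamped slot per key instead of a log. This is an illustrative assumption, not the Shelf algorithm: `AwarenessMap`, its stamp format `(clock, client_id)` and its methods are hypothetical names, and pruning of stale keys is not shown.

```python
class AwarenessMap:
    """Last-writer-wins map: exactly one (stamp, value) slot per key."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.clock = 0
        self.entries = {}  # key -> ((clock, client_id), value)

    def set(self, key, value):
        # Overwrite in place: memory does not grow with repeated writes.
        self.clock += 1
        self.entries[key] = ((self.clock, self.client_id), value)

    def merge(self, other):
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or stamp > self.entries[key][0]:
                self.entries[key] = (stamp, value)
            # Keep the local clock ahead of every stamp seen so far.
            self.clock = max(self.clock, stamp[0])

    def get(self, key):
        return self.entries[key][1]


alice = AwarenessMap("alice")
bob = AwarenessMap("bob")
for x in range(100):          # 100 rapid cursor updates from alice
    alice.set("cursor", x)
bob.set("cursor", -1)         # one older concurrent write from bob
alice.merge(bob)
bob.merge(alice)
```

After the exchange both replicas hold one entry for "cursor", carrying alice's latest value.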
XWiki Concerto: A P2P Wiki System Supporting Disconnected Work
This paper presents the XWiki Concerto system, the P2P version of the XWiki server. This system is based on replicating wiki pages on a network of wiki servers. The approach, based on the Woot algorithm, has been designed to be scalable and to support the dynamic aspect of P2P networks and network partitions. These characteristics make our system capable of supporting disconnected editing and sub-groups, making it very flexible and usable.
Logoot: A Scalable Optimistic Replication Algorithm for Collaborative Editing on P2P Networks
Massive collaborative editing becomes a reality through leading projects such as Wikipedia. This massive collaboration is currently supported with a costly central service. In order to avoid such costs, we aim to provide a peer-to-peer collaborative editing system. Existing approaches to build distributed collaborative editing systems either do not scale in terms of number of users or in terms of number of edits. We present the Logoot approach, which scales in both of these dimensions while ensuring the causality, consistency and intention preservation criteria. We evaluate the Logoot approach and compare it to others using a corpus of all the edits applied on a set of the most edited and the biggest pages of Wikipedia.
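The general idea behind position-based replicated lists such as Logoot is to give every inserted element an identifier drawn from a dense total order, so that inserts commute. The following is a simplified sketch under stated assumptions, not the published algorithm: `between`, `Doc`, `BASE` and the sentinels are hypothetical names, identifiers are plain integer tuples without the site identifier and clock that a real system needs to keep concurrent inserts unique, deletes are omitted, and each operation is assumed to be delivered exactly once.

```python
import bisect

BASE = 2 ** 16
BEGIN, END = (0,), (BASE,)  # sentinel positions bracketing the document


def between(p, q):
    """Return a position r with p < r < q under tuple (lexicographic) order."""
    assert p < q
    result = []
    bounded_by_q = True  # False once the prefix is already strictly below q
    i = 0
    while True:
        lo = p[i] if i < len(p) else 0
        hi = q[i] if bounded_by_q and i < len(q) else BASE
        if hi - lo > 1:
            # Room at this depth: take the midpoint and stop.
            result.append(lo + (hi - lo) // 2)
            return tuple(result)
        # No room: copy p's digit and look one level deeper.
        result.append(lo)
        if lo < hi:
            bounded_by_q = False
        i += 1


class Doc:
    def __init__(self):
        self.atoms = []  # sorted list of (position, char)

    def insert(self, index, char):
        left = self.atoms[index - 1][0] if index > 0 else BEGIN
        right = self.atoms[index][0] if index < len(self.atoms) else END
        atom = (between(left, right), char)
        self.apply(atom)
        return atom  # the operation that would be sent to other replicas

    def apply(self, atom):
        bisect.insort(self.atoms, atom)

    def text(self):
        return "".join(char for _, char in self.atoms)


local = Doc()
ops = [local.insert(0, "a"), local.insert(1, "c"), local.insert(1, "b")]
remote = Doc()
for op in reversed(ops):  # a remote replica receives them in another order
    remote.apply(op)
```

Because each element carries its own position, the remote replica reaches the same text without transforming operations.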
A Comparison of Optimistic Approaches to Collaborative Editing of Wiki Pages
Wikis, a popular tool for sharing knowledge, are basically collaborative editing systems. However, existing wiki systems offer limited support for co-operative authoring, and they do not scale well, because they are based on a centralised architecture. This paper compares the well-known centralised MediaWiki system with several peer-to-peer approaches to editing of wiki pages: an operational transformation approach (MOT2), a commutativity-oriented approach (WOOTO) and a conflict resolution approach (ACF). We evaluate and compare them according to a number of qualitative and quantitative metrics.
The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing
Most existing algorithms for replicated lists, which are widely used in
collaborative text editors, suffer from a problem: when two users concurrently
insert text at the same position in the document, the merged outcome may
interleave the inserted text passages, resulting in corrupted and potentially
unreadable text. The problem has gone unnoticed for decades, and it affects
both CRDTs and Operational Transformation. This paper defines maximal
non-interleaving, our new correctness property for replicated lists. We
introduce two related CRDT algorithms, Fugue and FugueMax, and prove that
FugueMax satisfies maximal non-interleaving. We also implement our algorithms
and demonstrate that Fugue offers performance comparable to state-of-the-art
CRDT libraries for text editing.
Comment: 16 pages, 10 figures
Verifying Strong Eventual Consistency in Distributed Systems
Data replication is used in distributed systems to maintain up-to-date copies of shared data across multiple
computers in a network. However, despite decades of research, algorithms for achieving consistency in
replicated systems are still poorly understood. Indeed, many published algorithms have later been shown to
be incorrect, even some that were accompanied by supposed mechanised proofs of correctness. In this work,
we focus on the correctness of Conflict-free Replicated Data Types (CRDTs), a class of algorithms that provides
strong eventual consistency guarantees for replicated data. We develop a modular and reusable framework
in the Isabelle/HOL interactive proof assistant for verifying the correctness of CRDT algorithms. We avoid
correctness issues that have dogged previous mechanised proofs in this area by including a network model
in our formalisation, and proving that our theorems hold in all possible network behaviours. Our axiomatic
network model is a standard abstraction that accurately reflects the behaviour of real-world computer networks.
Moreover, we identify an abstract convergence theorem, a property of order relations, which provides a formal
definition of strong eventual consistency. We then obtain the first machine-checked correctness theorems for
three concrete CRDTs: the Replicated Growable Array, the Observed-Remove Set, and an Increment-Decrement
Counter. We find that our framework is highly reusable, developing proofs of correctness for the latter two
CRDTs in a few hours and with relatively little CRDT-specific code.
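What strong eventual consistency means in practice can be shown with an increment-decrement counter: replicas that have received the same updates hold the same value, whatever the delivery order. This is a minimal state-based sketch for illustration and is not the formalisation verified in the paper; `PNCounter` and its methods are assumed names.

```python
class PNCounter:
    """Increment-decrement counter with per-replica tallies."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}  # replica id -> total increments by that replica
        self.decs = {}  # replica id -> total decrements by that replica

    def increment(self, n=1):
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n=1):
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def merge(self, other):
        # The pointwise maximum is commutative, associative and idempotent.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())


a, b, c = PNCounter("a"), PNCounter("b"), PNCounter("c")
a.increment(3)
b.decrement(1)
c.increment(2)

# Deliver the states in different orders, with one duplicate delivery.
a.merge(b); a.merge(c)
b.merge(c); b.merge(a); b.merge(a)
c.merge(a)
```

All three replicas end with the same value even though they merged in different orders and one state was delivered twice.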
Key-CRDT stores
Dissertation submitted for the degree of Master in Computer Engineering
The Internet has opened opportunities to create world-scale services. These systems require high availability and fault tolerance, while preserving low latency. Replication is a widely adopted technique to provide these properties. Different replication techniques have been proposed through the years, but to support these properties for world-scale services it is necessary to trade consistency for availability, fault tolerance and low latency. In weak consistency models, it is necessary to deal with possible conflicts arising from concurrent updates. We propose the use of conflict-free replicated data types (CRDTs) to address this issue.
Cloud computing systems support world-scale services, often relying on Key-Value stores for storing data. These systems partition and replicate data over multiple nodes that can be geographically dispersed over the network. To handle conflicts, these systems either rely on solutions that lose updates (e.g. last-write-wins) or require the application to handle concurrent updates. Additionally, these systems provide little support for transactions, a widely used abstraction for data access.
In this dissertation, we present the design and implementation of SwiftCloud, a Key-CRDT
store that extends a Key-Value store by incorporating CRDTs in the system's data-model. The system provides automatic conflict resolution relying on properties of CRDTs. We also present a version of SwiftCloud that supports transactions. Unlike traditional transactional systems, transactions never abort due to write/write conflicts, as the system leverages CRDT properties to merge concurrent transactions. For implementing SwiftCloud, we have introduced a set of new techniques, including versioned CRDTs, composition of CRDTs and alternative serialization methods.
The evaluation of the system, with both micro-benchmarks and the TPC-W benchmark, shows that SwiftCloud imposes little overhead over a key-value store. Allowing clients to access a datacenter close to them with SwiftCloud can reduce latency without requiring any complex reconciliation
mechanism. The experience of using SwiftCloud has shown that adapting an existing application to use SwiftCloud requires little effort. Project PTDC/EIA-EIA/108963/200
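The difference between a plain key-value store and a key-CRDT store shows up when two datacenters update the same key concurrently. The sketch below contrasts a last-write-wins merge with a merge in which each value is a grow-only set. It is an illustration under assumptions, not SwiftCloud's API: `merge_lww`, `merge_crdt` and the shopping-cart data are hypothetical, and the last-write-wins merge assumes distinct timestamps.

```python
def merge_lww(local, remote):
    """Key-value merge that keeps the entry with the larger timestamp."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


def merge_crdt(local, remote):
    """Key-CRDT merge: every value is a grow-only set, merged by union."""
    merged = {key: set(value) for key, value in local.items()}
    for key, value in remote.items():
        merged.setdefault(key, set()).update(value)
    return merged


# Two datacenters each add a different item to the same cart.
lww_a = {"cart": (1, {"book"})}
lww_b = {"cart": (2, {"pen"})}
lww_result = merge_lww(lww_a, lww_b)

crdt_a = {"cart": {"book"}}
crdt_b = {"cart": {"pen"}}
crdt_result = merge_crdt(crdt_a, crdt_b)
```

With last-write-wins the "book" update is lost, while the set-valued store keeps both items and gives the same result in either merge direction.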