
    A Generic Undo Support for State-Based CRDTs

    CRDTs (Conflict-free Replicated Data Types) have properties desirable for large-scale distributed systems with variable network latency or transient partitions. With CRDTs, data are always available for local updates, and data states converge when the replicas have incorporated the same updates. Undo is useful for correcting human mistakes and for restoring system-wide invariants violated due to long delays or network partitions. There is currently no generally applicable undo support for CRDTs, for at least two reasons. First, there is no abstraction that can practically capture the relations between undo and normal operations with respect to concurrency and causality. Second, with inverse operations, as in existing partial solutions, the CRDT designer has to hard-code certain rules and design a new CRDT for almost every operation that needs undo support. In this paper, we present an approach to generic undo support for CRDTs. The approach consists of two major parts. We first work out an abstraction that captures the semantics of concurrent undo and redo operations through equivalence classes. The abstraction is a natural extension of undo and redo in sequential applications and is straightforward to implement in practice. Using this abstraction, we then devise a mechanism to augment existing CRDTs. The mechanism provides "out of the box" undo support without the involvement of the CRDT designers. We also present a practical application of the approach in collaborative editing.
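
    The paper's mechanism is not reproduced in this abstract; purely as a hypothetical sketch, the flavour of generic undo over a state-based CRDT can be shown by pairing each element of an add-only set with a grow-only undo/redo counter whose parity decides visibility. The class and method names below are illustrative, not the paper's.

```python
# Hypothetical sketch: counter-based undo on top of an add-only set.
# Counter parity partitions each element's history into "done" (even)
# and "undone" (odd) classes; merge is a pointwise max, which preserves
# the join-semilattice property required of state-based CRDTs.
class UndoSet:
    def __init__(self):
        self.counters = {}                     # element -> undo/redo counter

    def add(self, x):
        self.counters.setdefault(x, 0)         # 0 is even: visible

    def undo(self, x):
        if self.counters.get(x, 1) % 2 == 0:   # only undo a visible element
            self.counters[x] += 1              # odd: hidden

    def redo(self, x):
        if self.counters.get(x, 0) % 2 == 1:   # only redo an undone element
            self.counters[x] += 1              # even again: visible

    def value(self):
        return {x for x, c in self.counters.items() if c % 2 == 0}

    def merge(self, other):
        for x, c in other.counters.items():
            self.counters[x] = max(self.counters.get(x, 0), c)
```

    Two replicas that merge each other's state agree on every counter, hence on visibility, regardless of message delivery order.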

    Keynote: From groupware to large-scale trustworthy distributed collaborative systems

    Distributed collaborative systems allow users to collaborate on a set of shared documents from any place, at any time and from any device. Examples of collaborative systems are wikis, version control systems and GoogleDrive. While 30 years ago, when these collaborative systems were first developed, they were used in scenarios involving only a small set of users, such as the writing of a research article, nowadays we notice a change in scale from several users to a community of users. Large-scale collaboration is now possible due to advances in mobile and ubiquitous communication, which enable users to be continuously connected, and to the appropriation of existing tools by users. However, existing collaborative systems face several challenges, including privacy issues, as personal user information is placed in the hands of large corporations and users have little control over the usage of their data, as well as performance and coordination issues in the large-scale context. In this talk we illustrate the evolution of collaborative systems over the past years and describe our vision of trustworthy distributed collaborative systems where communities of users can safely and confidently collaborate without a central authority. We focus on envisaged solutions for replicated data consistency, security, trust and awareness in this context. As the human factor is a key issue in the design of trustworthy distributed collaborative systems, we call for the evaluation of these systems through user studies.

    Authenticating Operation-based History in Collaborative Systems

    In recent years, multi-synchronous collaborative editing systems have become widely used. Multi-synchronous collaboration maintains multiple simultaneous streams of activity which continually diverge and are synchronized. These streams of activity are represented by means of logs of operations, i.e. user modifications. A malicious user might tamper with their log of operations; at the moment of synchronization with other streams, the tampered log might generate wrong results. In this paper, we propose a solution relying on hash-chain-based authenticators for authenticating logs, ensuring log authenticity and integrity as well as user accountability. We present algorithms to construct authenticators and verify logs. We prove their correctness and provide theoretical and practical evaluations.
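
    As background, the core of a hash chain over a log of operations fits in a few lines; the paper's actual authenticators and verification algorithms are more involved (accountability would additionally require, e.g., signatures over chain values). Names and encoding below are illustrative.

```python
import hashlib

def chain_log(ops):
    """Return the chained authenticators h_i = H(h_{i-1} || op_i)."""
    h = b"\x00" * 32                           # genesis value
    auths = []
    for op in ops:
        h = hashlib.sha256(h + op.encode()).digest()
        auths.append(h)
    return auths

def verify_log(ops, auths):
    """A log verifies iff recomputing the chain reproduces every authenticator."""
    return chain_log(ops) == auths
```

    Tampering with any operation changes its authenticator and every later one, so a verifier holding only the latest (e.g. signed) chain value detects modification of any earlier entry.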

    Peer-to-peer Collaboration over XML Documents

    Existing solutions for collaboration over XML documents are limited to a centralised architecture. In this paper we propose an approach for peer-to-peer collaboration over XML documents where users can work off-line on their document replicas and synchronise in an ad-hoc manner with other users. Our algorithm for maintaining consistency of XML documents recursively applies the tombstone operational transformation approach over the document levels.
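
    The key idea of the tombstone approach, on which the recursion rests, is that deleted nodes are only marked invisible, so the internal positions referenced by remote operations never shift. A minimal, hypothetical sketch for a single document level:

```python
# Hypothetical sketch: a sequence with tombstones. Delete marks an entry
# invisible instead of removing it, so internal positions used by
# concurrent remote operations remain stable.
class TombstoneSeq:
    def __init__(self):
        self.cells = []                        # list of [value, visible]

    def insert(self, pos, value):              # pos: internal position
        self.cells.insert(pos, [value, True])

    def delete(self, pos):
        self.cells[pos][1] = False             # keep the tombstone

    def view(self):                            # what the user sees
        return [v for v, visible in self.cells if visible]
```

    In the paper's setting such a structure would be applied recursively: each visible cell at one level can itself hold the child sequence of the next XML level.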

    Hybrid Weighting Schemes For Collaborative Filtering

    Neighborhood-based algorithms are one of the most common approaches to Collaborative Filtering (CF). The core element of these algorithms is similarity computation between items or users. It is reasonable to assume that some ratings of a user bear more information than others. Weighting the ratings in proportion to their importance is known as feature weighting. Nevertheless, in practice, none of the existing weighting schemes results in significant improvement to the quality of recommendations. In this paper, we suggest a new weighting scheme based on Matrix Factorization (MF). In our scheme, the importance of each rating is estimated by comparing the coordinates of users (items) taken from a latent feature space computed through MF. Moreover, we review the effect of a large number of weighting schemes on item-based and user-based algorithms. The effect of various influential parameters is studied through extensive simulations on two versions of the Movielens dataset. We show that, unlike the existing weighting schemes, ours can improve the performance of CF algorithms. Furthermore, cascading these schemes allows them to capitalize on each other's improvements.
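
    The paper's exact weighting formula is not given in the abstract; one plausible reading is sketched below: fit latent user/item coordinates with a toy SGD matrix factorization, then weight each observed rating by how well the latent model reconstructs it. The function name and the final weighting formula are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def mf_rating_weights(R, k=20, epochs=30, lr=0.005, reg=0.02, seed=0):
    """R: dense user x item rating matrix, 0 = unobserved. Returns weights."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user coordinates
    Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # item coordinates
    users, items = R.nonzero()
    for _ in range(epochs):                           # plain SGD on squared error
        for u, i in zip(users, items):
            e = R[u, i] - P[u] @ Q[i]
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * P[u] - reg * Q[i])
    W = np.zeros_like(R, dtype=float)
    for u, i in zip(users, items):                    # ratings the model explains
        W[u, i] = 1.0 / (1.0 + abs(R[u, i] - P[u] @ Q[i]))
    return W                                          # feed into weighted similarity
```

    The returned weights would then scale each rating's contribution inside a weighted cosine or Pearson similarity between users or items.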

    Rapid and Round-free Multi-pair Asynchronous Push-Pull Aggregation

    As various distributed algorithms and services demand overall information on large-scale networks, protocols that aggregate data over networks are essential, and the quality of aggregation determines the quality of those distributed algorithms and services. Though a variety of aggregation protocols have been proposed, gossip-based iterative aggregations have outstanding advantages, especially in accuracy, result distribution, topology independence, and resilience to network churn. However, most iterative aggregations, especially push-pull style aggregations, suffer from two synchronization constraints: synchronized rounds and synchronized communication. Namely, iterative protocols generally need prior configuration to synchronize rounds over all nodes, and messages must be exchanged synchronously to ensure accurate estimates in push-pull or push-sum protocols. This paper proposes multi-pair asynchronous push-pull aggregation (MAPPA), which liberates push-pull aggregation from these synchronization constraints and accelerates aggregation. MAPPA considerably reduces aggregation times and improves fault tolerance. Thanks to its topology independence, inherited from gossip mechanisms, and its rapidness, MAPPA is resilient to network churn and thus suitable for dynamic networks.
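
    For context, the classic push-pull averaging step that MAPPA builds on is easy to sketch: a node and a random peer exchange values and both adopt the mean, conserving the global sum so every local estimate converges to the network-wide average. MAPPA's asynchronous, multi-pair machinery is beyond this illustrative snippet.

```python
import random

def push_pull_average(values, steps=10000, seed=0):
    """Gossip averaging by repeated pairwise mean exchanges.
    The sum of all values is invariant, so estimates converge to the mean."""
    rng = random.Random(seed)
    vals = list(values)
    for _ in range(steps):
        a, b = rng.sample(range(len(vals)), 2)   # pick a random pair
        vals[a] = vals[b] = (vals[a] + vals[b]) / 2
    return vals

# e.g. push_pull_average([0, 0, 0, 4]) -> all estimates near 1.0
```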

    Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge

    In a cloud-edge environment, edge devices may not always be connected to the network. Still, applications may need to access the data on edge devices even when they are not connected. With support for multi-synchronous access, data on an edge device are kept synchronous with the data in the cloud as long as the device is online. When the device is off-line, the application can still access the data on the device, asynchronously with concurrent data updates either in the cloud or on other edge devices. Conflict-free Replicated Data Types (CRDTs) emerged as a technology for multi-synchronous data access. CRDTs guarantee that when all sites have applied the same set of updates, the replicated data converge. However, CRDTs have not been successfully applied to relational databases (RDBs) for multi-synchronous access. In this paper, we present Conflict-free Replicated Relations (CRRs), which apply CRDTs to RDBs in support of multi-synchronous data access. With CRR, existing RDB applications can, with very little modification, be enhanced with multi-synchronous access. We also present a prototype implementation of CRR with some preliminary performance results.
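
    The CRR construction itself is not detailed in the abstract; a common way to make relational rows mergeable, shown here purely as a hypothetical sketch, is to treat each (row, column) cell as a last-writer-wins register, with site ids breaking timestamp ties.

```python
# Hypothetical sketch: each cell of a relation as a last-writer-wins
# register. Replicas that have seen the same set of updates converge,
# because merge keeps, per cell, the greatest (timestamp, site) entry.
class LWWRelation:
    def __init__(self, site):
        self.site = site
        self.cells = {}                        # (row_id, column) -> (ts, site, value)

    def write(self, row_id, column, value, ts):
        self.cells[(row_id, column)] = (ts, self.site, value)

    def read(self, row_id, column):
        entry = self.cells.get((row_id, column))
        return entry[2] if entry else None

    def merge(self, other):
        for key, entry in other.cells.items():
            if key not in self.cells or entry[:2] > self.cells[key][:2]:
                self.cells[key] = entry
```

    In practice such merge logic can live in triggers and auxiliary tables beside the original schema, which is what keeps the changes to existing applications small.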

    An Analysis of Merge Conflicts and Resolutions in Git-based Open Source Projects

    Version control systems such as Git support parallel collaborative work and have become very widespread in the open-source community. While Git offers some very interesting features, resolving the conflicts that arise during synchronization of parallel changes is a time-consuming task. In this paper we present an analysis of concurrency and conflicts in the official Git repositories of four projects: Rails, IkiWiki, Samba and Linux Kernel. We analyse the collaboration process of these projects at specific periods, revealing how change integration and conflict rates vary during the project development life-cycle. We also analyse how often users decide to roll back to a previous document version when the integration process generates conflicts. Finally, we discuss the mechanism adopted by Git that considers changes made on two adjacent lines as conflicting.
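
    A study like this starts from the merge commits themselves; a minimal sketch of extracting them with standard Git plumbing (function name and return shape are illustrative):

```python
import subprocess

def merge_commits(repo_path):
    """List (merge_sha, parent_shas) for every merge commit in the history."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--merges", "--pretty=%H %P"],
        capture_output=True, text=True, check=True,
    ).stdout
    merges = []
    for line in out.splitlines():
        sha, *parents = line.split()
        merges.append((sha, parents))          # merges have >= 2 parents
    return merges
```

    One way to measure conflict rates is then to re-merge each pair of parents in a scratch working tree and record whether the original integration produced textual conflicts.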

    An end-to-end learning solution for assessing the quality of Wikipedia articles

    Wikipedia is considered the largest knowledge repository in the history of humanity and plays a crucial role in modern daily life. Assigning the correct quality class to Wikipedia articles is an important task in order to provide guidance for both authors and readers of Wikipedia. Manual review cannot cope with the editing speed of Wikipedia, so automatic classification of article quality is required. Most existing approaches rely on traditional machine learning with manual feature engineering, which requires a lot of expertise and effort. Furthermore, there is no generally perfect feature set, because information is always lost in the feature extraction phase; moreover, a new feature set is required for each language edition of Wikipedia. In this paper, we present an approach relying on deep learning for quality classification of Wikipedia articles. Our solution relies on Recurrent Neural Networks (RNNs), an end-to-end learning technique that eliminates the disadvantages of feature engineering. Our approach learns directly from raw data without human intervention and is language-neutral. Experimental results on English, French and Russian Wikipedia datasets show that our approach outperforms state-of-the-art solutions.
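
    The paper's architecture is not specified in the abstract; the sketch below is only a generic recurrent classifier over raw bytes, with the six traditional English Wikipedia quality classes assumed as labels.

```python
import torch
import torch.nn as nn

class ArticleQualityRNN(nn.Module):
    """Illustrative end-to-end classifier: raw byte ids -> quality logits."""
    def __init__(self, vocab_size=256, embed_dim=64, hidden=128, n_classes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)   # e.g. FA, GA, B, C, Start, Stub

    def forward(self, x):                          # x: (batch, seq_len) int64
        _, h = self.rnn(self.embed(x))             # h: (1, batch, hidden)
        return self.head(h[-1])                    # class logits
```

    Because the model consumes raw text, the same architecture can be retrained on any language edition without designing a new feature set.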

    Measuring Quality of Collaboratively Edited Documents: the case of Wikipedia

    Wikipedia is a great example of large-scale collaboration, where people from all over the world together build the largest, and maybe the most important, human knowledge repository in history. However, a number of studies have shown that quality is not equally distributed across Wikipedia articles. While many articles are of good quality, many others need to be improved. Assessing the quality of Wikipedia articles is very important for guiding readers towards articles of high quality and for suggesting to authors and reviewers which articles need to be improved. Due to the huge size of Wikipedia, an effective automatic assessment method for Wikipedia article quality is needed. In this paper, we present an automatic method that assesses the quality of Wikipedia articles by analyzing their content in terms of format features and readability scores. Our results show improvements both in accuracy and in information gain compared with other existing approaches.
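
    Readability scores of the kind used as features here are cheap to compute. For instance, the classic Flesch reading-ease score is shown below with a deliberately naive syllable counter; the paper's exact feature set is not reproduced.

```python
import re

def flesch_reading_ease(text):
    """Flesch: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Naive syllable estimate: count vowel groups, at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
```

    Higher scores indicate easier text; combined with format features such as section counts, image counts and reference density, such scores feed the quality classifier.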