74 research outputs found

    A Generic Undo Support for State-Based CRDTs

    CRDTs (Conflict-free Replicated Data Types) have properties desirable for large-scale distributed systems with variable network latency or transient partitions. With CRDTs, data are always available for local updates, and data states converge when the replicas have incorporated the same updates. Undo is useful for correcting human mistakes and for restoring system-wide invariants violated due to long delays or network partitions. There is currently no generally applicable undo support for CRDTs, for at least two reasons. First, there is no abstraction that we can practically use to capture the relations between undo and normal operations with respect to concurrency and causality. Second, with inverse operations, as in the existing partial solutions, the CRDT designer has to hard-code certain rules and design a new CRDT for almost every operation that needs undo support. In this paper, we present an approach to generic support of undo for CRDTs. The approach consists of two major parts. We first work out an abstraction that captures the semantics of concurrent undo and redo operations through equivalence classes. The abstraction is a natural extension of undo and redo in sequential applications and is straightforward to implement in practice. Using this abstraction, we then devise a mechanism to augment existing CRDTs. The mechanism provides "out of the box" support for undo without the involvement of the CRDT designers. We also present a practical application of the approach in collaborative editing.
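    One way the equivalence-class view of concurrent undo/redo can be realized in a state-based setting is sketched below. This is an illustrative assumption (effectiveness decided by the parity of the highest round in a grow-only set of undo/redo rounds), not the paper's actual mechanism; the class and method names are hypothetical.

    ```python
    # Hypothetical sketch: undo/redo rounds as a grow-only set.
    # An operation is effective iff the highest observed round is even
    # (0 = done, 1 = undone, 2 = redone, ...). Merging replicas is a
    # simple set union, so the state converges like any state-based CRDT.

    class UndoableOp:
        def __init__(self, op_id):
            self.op_id = op_id
            self.rounds = {0}          # grow-only set of undo/redo rounds

        def undo(self):
            r = max(self.rounds)
            if r % 2 == 0:             # currently effective -> add odd round
                self.rounds.add(r + 1)

        def redo(self):
            r = max(self.rounds)
            if r % 2 == 1:             # currently undone -> add even round
                self.rounds.add(r + 1)

        def merge(self, other):
            # State-based join: union of round sets; highest round decides.
            self.rounds |= other.rounds

        @property
        def effective(self):
            return max(self.rounds) % 2 == 0
    ```

    Because `merge` is a set union, it is commutative, associative and idempotent, which is what makes concurrent undos of the same operation collapse into a single equivalence class.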

    Keynote: From groupware to large-scale trustworthy distributed collaborative systems

    Distributed collaborative systems allow users to collaborate on a set of shared documents from any place, at any time and from any device. Examples of collaborative systems are wikis, version control systems or Google Drive. While 30 years ago, when these collaborative systems were first developed, they were used in scenarios involving only a small set of users, such as the writing of a research article, nowadays we notice a change in scale from several users to a community of users. Large-scale collaboration is now possible due to advances in mobile and ubiquitous communication that enable users to be continuously connected, and to the appropriation of existing tools by users. However, existing collaborative systems face several challenges, including privacy issues, as personal user information is placed in the hands of large corporations and users have little control over the usage of their data, as well as performance and coordination issues in the large-scale context. In this talk we illustrate the evolution of collaborative systems over the past years and describe our vision of trustworthy distributed collaborative systems where communities of users can safely and confidently collaborate without the use of a central authority. We focus on envisaged solutions for replicated data consistency, security, trust and awareness in this context. As the human factor is a key issue in the design of trustworthy distributed collaborative systems, we call for the evaluation of these systems with user studies.

    Peer-to-peer Collaboration over XML Documents

    Existing solutions for collaboration over XML documents are limited to a centralised architecture. In this paper we propose an approach for peer-to-peer collaboration over XML documents where users can work off-line on their document replica and synchronise in an ad-hoc manner with other users. Our algorithm for maintaining consistency over XML documents recursively applies the tombstone operational transformation approach over the document levels.
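    The role of tombstones in tree editing can be shown with a toy model. The `Node`, `delete` and `render` helpers below are hypothetical and do not reproduce the paper's transformation algorithm; they only illustrate why deleted XML nodes are kept as invisible placeholders, so concurrent position-based operations still resolve against a stable structure.

    ```python
    # Toy sketch (illustrative assumption, not the paper's algorithm):
    # deletion marks a node as an invisible tombstone instead of
    # removing it, so child indices remain stable under concurrency.

    class Node:
        def __init__(self, tag, text=""):
            self.tag, self.text = tag, text
            self.children = []        # document order, tombstones included
            self.visible = True

    def delete(root, path):
        """Tombstone the node at `path` (child indices counted over
        ALL children, visible or not)."""
        node = root
        for i in path:
            node = node.children[i]
        node.visible = False

    def render(node):
        """Serialize only the visible part of the tree."""
        if not node.visible:
            return ""
        inner = node.text + "".join(render(c) for c in node.children)
        return f"<{node.tag}>{inner}</{node.tag}>"
    ```

    After `delete(root, [0])`, the first child no longer renders, yet a concurrent operation addressing child index 1 still finds the node it targeted.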

    Hybrid Weighting Schemes For Collaborative Filtering

    Neighborhood-based algorithms are one of the most common approaches to Collaborative Filtering (CF). The core element of these algorithms is similarity computation between items or users. It is reasonable to assume that some ratings of a user bear more information than others. Weighting the ratings proportionally to their importance is known as feature weighting. Nevertheless, in practice, none of the existing weighting schemes results in a significant improvement to the quality of recommendations. In this paper, we suggest a new weighting scheme based on Matrix Factorization (MF). In our scheme, the importance of each rating is estimated by comparing the coordinates of users (items) taken from a latent feature space computed through MF. Moreover, we review the effect of a large number of weighting schemes on item-based and user-based algorithms. The effect of various influential parameters is studied by running extensive simulations on two versions of the MovieLens dataset. We show that, unlike the existing weighting schemes, ours can improve the performance of CF algorithms. Furthermore, cascading them allows each to capitalize on the other's improvements.
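    A latent-space weighting of the kind described can be sketched as follows. The weighting rule (inverse distance between item coordinates) and the use of a truncated SVD in place of a trained MF model are illustrative assumptions, not the paper's exact scheme.

    ```python
    import numpy as np

    # Hypothetical sketch: weigh ratings by closeness in a latent
    # feature space. R is a small dense user x item rating matrix; a
    # truncated SVD stands in for matrix factorization.

    R = np.array([[5., 4., 1.],
                  [4., 5., 1.],
                  [1., 1., 5.]])

    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = 2
    item_factors = Vt[:k].T * s[:k]    # items as points in k-dim latent space

    def item_weight(i, j):
        """Weight of item j's rating when computing similarity to item i:
        items that sit close together in latent space count more."""
        d = np.linalg.norm(item_factors[i] - item_factors[j])
        return 1.0 / (1.0 + d)
    ```

    With this toy matrix, items 0 and 1 (rated alike by everyone) land close together in the latent space, so their mutual weight is high, while item 2 is down-weighted.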

    Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge

    In a cloud-edge environment, edge devices may not always be connected to the network. Still, applications may need to access the data on edge devices even when they are not connected. With support for multi-synchronous access, data on an edge device are kept synchronous with the data in the cloud as long as the device is online. When the device is off-line, the application can still access the data on the device, asynchronously with concurrent data updates either in the cloud or on other edge devices. Conflict-free Replicated Data Types (CRDTs) have emerged as a technology for multi-synchronous data access. CRDTs guarantee that when all sites have applied the same set of updates, the replicated data converge. However, CRDTs have not been successfully applied to relational databases (RDBs) for multi-synchronous access. In this paper, we present Conflict-free Replicated Relations (CRRs), which apply CRDTs to RDBs in support of multi-synchronous data access. With CRR, existing RDB applications can, with very little modification, be enhanced with multi-synchronous access. We also present a prototype implementation of CRR with some preliminary performance results.
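    One common way to make a relational row mergeable, sketched below, is to treat each cell as a last-writer-wins (LWW) register. This is a minimal illustrative assumption about how CRDTs can be layered under relations; the paper's actual CRR construction is not reproduced here.

    ```python
    # Minimal sketch (assumed, not the paper's CRR design): every cell
    # of a row is an LWW register, so concurrent updates to different
    # columns merge cleanly and conflicting updates to the same column
    # are resolved deterministically by timestamp.

    class LWWRow:
        def __init__(self):
            self.cells = {}            # column -> (timestamp, value)

        def set(self, column, value, ts):
            cur = self.cells.get(column)
            if cur is None or ts > cur[0]:
                self.cells[column] = (ts, value)

        def merge(self, other):
            # Join is just replaying the other replica's cells; the
            # timestamp check in set() makes merge commutative.
            for col, (ts, val) in other.cells.items():
                self.set(col, val, ts)

        def value(self, column):
            return self.cells[column][1]
    ```

    Because `set` only accepts strictly newer timestamps, merging in either order produces the same row state, which is the convergence property the abstract refers to.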

    Authenticating Operation-based History in Collaborative Systems

    In recent years, multi-synchronous collaborative editing systems have become widely used. Multi-synchronous collaboration maintains multiple simultaneous streams of activity which continually diverge and are synchronized. These streams of activity are represented by means of logs of operations, i.e. user modifications. A malicious user might tamper with his log of operations; at the moment of synchronization with other streams, the tampered log might generate wrong results. In this paper, we propose a solution relying on hash-chain-based authenticators for authenticating logs, ensuring the authenticity and integrity of the logs as well as user accountability. We present algorithms to construct authenticators and verify logs. We prove their correctness and provide theoretical and practical evaluations.
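    The core hash-chain idea can be sketched in a few lines. This is a bare illustration, not the paper's full scheme: real authenticators would also be signed (e.g. a signature over the chain head) to bind the chain to a user, which is omitted here.

    ```python
    import hashlib

    # Sketch of a hash-chain log authenticator: each entry's
    # authenticator hashes the operation together with the previous
    # authenticator, so tampering with any earlier entry invalidates
    # every later one.

    def chain(log, seed=b"genesis"):
        """Return the list of authenticators for a log of operations."""
        auths, h = [], seed
        for op in log:
            h = hashlib.sha256(h + op.encode()).digest()
            auths.append(h)
        return auths

    def verify(log, auths, seed=b"genesis"):
        """Recompute the chain and compare against the claimed authenticators."""
        return auths == chain(log, seed)
    ```

    Verification is linear in the log length, and a single altered operation changes every subsequent authenticator, which is what makes the history tamper-evident.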

    An end-to-end learning solution for assessing the quality of Wikipedia articles

    Wikipedia is considered the largest knowledge repository in the history of humanity and plays a crucial role in modern daily life. Assigning the correct quality class to Wikipedia articles is an important task in order to provide guidance for both authors and readers of Wikipedia. Manual review cannot cope with the editing speed of Wikipedia, so automatic classification of the quality of Wikipedia articles is required. Most existing approaches rely on traditional machine learning with manual feature engineering, which requires a lot of expertise and effort. Furthermore, it is known that there is no generally perfect feature set, because information loss always occurs in the feature extraction phase. Also, a new feature set is required for each language edition of Wikipedia. In this paper, we present an approach relying on deep learning for quality classification of Wikipedia articles. Our solution relies on Recurrent Neural Networks (RNNs), an end-to-end learning technique that eliminates the disadvantages of feature engineering. Our approach learns directly from raw data without human intervention and is language-neutral. Experimental results on English, French and Russian Wikipedia datasets show that our approach outperforms state-of-the-art solutions.
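    The "raw text in, class out" data flow of such a classifier can be sketched with a vanilla RNN forward pass. The architecture, sizes and random weights below are illustrative assumptions only; they show the mechanics (characters → one-hot vectors → recurrent state → class scores), not the paper's trained model.

    ```python
    import numpy as np

    # Toy forward pass of a vanilla character-level RNN classifier
    # (illustrative; weights are untrained random values).

    rng = np.random.default_rng(0)
    V, H, C = 128, 16, 6               # vocab (ASCII), hidden size, quality classes
    Wxh = rng.normal(0, 0.1, (H, V))   # input -> hidden
    Whh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (recurrence)
    Why = rng.normal(0, 0.1, (C, H))   # hidden -> class scores

    def classify(text):
        """Run the RNN over raw characters and return the argmax class."""
        h = np.zeros(H)
        for ch in text:
            x = np.zeros(V)
            x[ord(ch) % V] = 1.0       # one-hot encode the character
            h = np.tanh(Wxh @ x + Whh @ h)
        return int(np.argmax(Why @ h))
    ```

    Nothing in this pipeline depends on a hand-crafted feature set or on the article's language, which is the property the abstract emphasizes.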

    Measuring Quality of Collaboratively Edited Documents: the case of Wikipedia

    Wikipedia is a great example of large-scale collaboration, where people from all over the world together build the largest and maybe the most important human knowledge repository in history. However, a number of studies have shown that the quality of Wikipedia articles is not equally distributed. While many articles are of good quality, many others need to be improved. Assessing the quality of Wikipedia articles is very important for guiding readers towards articles of high quality and for suggesting to authors and reviewers which articles need to be improved. Due to the huge size of Wikipedia, an effective automatic method for assessing the quality of Wikipedia articles is needed. In this paper, we present an automatic assessment method that analyzes article content in terms of format features and readability scores. Our results show improvements both in terms of accuracy and information gain compared with other existing approaches.
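    One readability score of the kind mentioned, the Flesch reading ease, can be computed as sketched below. The syllable counter is a crude vowel-group heuristic assumed for illustration; the paper's actual feature set is richer than this single score.

    ```python
    import re

    # Sketch of one readability feature: Flesch reading ease.
    # Higher scores mean easier text.

    def syllables(word):
        # Heuristic: count maximal vowel groups, at least one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z]+", text)
        n = max(1, len(words))
        syl = sum(syllables(w) for w in words)
        # Standard Flesch formula: penalize long sentences and long words.
        return 206.835 - 1.015 * (n / sentences) - 84.6 * (syl / n)
    ```

    Scores like this, combined with structural format features (section counts, reference density, and so on), form the kind of feature vector an automatic quality classifier consumes.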

    An Analysis of Merge Conflicts and Resolutions in Git-based Open Source Projects

    Version control systems such as Git support parallel collaborative work and have become very widespread in the open-source community. While Git offers some very interesting features, resolving the conflicts that arise during synchronization of parallel changes is a time-consuming task. In this paper we present an analysis of concurrency and conflicts in the official Git repositories of four projects: Rails, IkiWiki, Samba and the Linux kernel. We analyse the collaboration process of these projects at specific periods, revealing how change integration and conflict rates vary during the project development life-cycle. We also analyse how often users decide to roll back to a previous document version when the integration process generates conflicts. Finally, we discuss the mechanism adopted by Git that considers changes made on two consecutive lines as conflicting.
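    The adjacent-line behavior discussed at the end can be captured by a small predicate. This is a toy model of the rule, assumed for illustration; it is not Git's actual merge implementation.

    ```python
    # Toy model (illustrative assumption): two concurrent edits conflict
    # when their changed line ranges overlap OR sit on adjacent lines,
    # mirroring Git's treatment of changes on neighbouring lines.
    # Ranges are (start, end) line numbers, inclusive.

    def conflicts(change_a, change_b):
        a_start, a_end = change_a
        b_start, b_end = change_b
        # "+ 1" widens each range by one line, so touching ranges
        # (zero-line gap) count as conflicting too.
        return a_start <= b_end + 1 and b_start <= a_end + 1
    ```

    Under this rule an edit to line 2 and a concurrent edit to line 3 conflict even though no line was changed by both sides, which is exactly the case the paper's discussion concerns.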

    Computational Trust Model for Repeated Trust Games

    The trust game is a money-exchange game that has been widely used in behavioral economics for studying trust and collaboration between humans. In this game, the exchange of money is entirely attributable to the existence of trust between users. The trust game can be one-shot, i.e. the game ends after one round of money exchange, or repeated, i.e. it lasts several rounds. Predicting user behavior in the repeated trust game is of critical importance for the partners' next moves. However, existing behavior prediction approaches rely solely on players' personal information, such as their age, gender and income, and do not consider their past behavior in the game. In this paper, we propose a computational trust metric that is based solely on users' past behavior and can predict future behavior in the repeated trust game. Our trust metric can distinguish between users having different behavioral profiles and is resistant to fluctuating user behavior. We validate our model empirically against data sets collected from several trust game experiments. We show that our model is consistent with the rating opinions of users and can provide higher accuracy in predicting users' behavior compared with other naive models.
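    A behavior-based trust metric of this flavor can be sketched as a decayed average of per-round return ratios. The functional form and the `decay` parameter are illustrative assumptions, not the paper's exact model.

    ```python
    # Illustrative trust metric (assumed form): a geometrically decayed
    # average of each round's return ratio (returned / received), so
    # recent rounds weigh more and a single early defection fades.

    def trust_score(rounds, decay=0.8):
        """rounds: list of (received, returned) pairs, oldest first.
        Returns a score in [0, 1]; 0.0 for an empty history."""
        score, weight, w = 0.0, 0.0, 1.0
        for received, returned in reversed(rounds):   # newest first
            ratio = returned / received if received else 0.0
            score += w * min(1.0, ratio)              # cap generosity at 1
            weight += w
            w *= decay
        return score / weight if weight else 0.0
    ```

    The decay makes the metric responsive to recent behavior while smoothing over fluctuation, which is the resistance property the abstract claims for the real model.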