
    Conflict-Free Replicated Data Types in Dynamic Environments

    Over the years, mobile devices have become increasingly popular and have gained improved computational capabilities, allowing them to perform more complex tasks such as collaborative applications. Given the characteristics of mobile networks, which are highly dynamic environments where users may experience regular involuntary disconnection periods, a central question arises: how can data consistency be maintained? The issue is most pronounced in collaborative settings where multiple users interact with each other, sharing a replicated state that may diverge due to concurrency conflicts and lost updates. One of today's best solutions for maintaining consistency is Conflict-Free Replicated Data Types (CRDTs), which provide low latency and automatic conflict resolution, guaranteeing eventual consistency of the shared data. However, a common limitation of CRDTs and the systems that employ them is that each replica must know which other replicas its state changes must be disseminated to. This is problematic, since it is impractical to maintain such knowledge in an environment where clients may join and leave at any time and may become disconnected due to the unreliability of mobile network communications. In this thesis, we study and extend the CRDT concept to dynamic environments by introducing the P/S-CRDTs model, in which CRDTs are coupled with the publisher/subscriber interaction scheme and additional mechanisms so that users can cooperate and maintain consistency while accounting for the volatile behaviour of mobile networks. The experimental results show that, in volatile disconnection scenarios, mobile users in collaborative activity maintain consistency among themselves and that, compared to other available CRDT models, the P/S-CRDTs model decouples the knowledge of whom updates must be disseminated to while keeping network traffic at appropriate levels.
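    The abstract does not include code; as a rough illustration of the general idea only (not the thesis's P/S-CRDTs model, and with invented names), the Scala sketch below couples a state-based grow-only counter with a toy publish/subscribe broker, so that a replica publishes its state to a topic without knowing which other replicas are subscribed.

        // Sketch: a state-based grow-only counter (G-Counter) whose state is
        // disseminated over a publish/subscribe topic, so a replica never needs
        // to know which other replicas exist. Names are illustrative, not the
        // thesis's actual API.
        object PsCrdtSketch {

          // G-Counter: one entry per replica id; merge is an entry-wise max.
          final case class GCounter(entries: Map[String, Long] = Map.empty) {
            def increment(replica: String): GCounter =
              GCounter(entries.updated(replica, entries.getOrElse(replica, 0L) + 1))
            def value: Long = entries.values.sum
            def merge(other: GCounter): GCounter =
              GCounter((entries.keySet ++ other.entries.keySet).map { k =>
                k -> math.max(entries.getOrElse(k, 0L), other.entries.getOrElse(k, 0L))
              }.toMap)
          }

          // A toy broker: replicas subscribe to a topic and receive every state
          // published to it; who is subscribed is invisible to the publisher.
          final class Broker {
            private var subscribers: Map[String, List[GCounter => Unit]] = Map.empty
            def subscribe(topic: String)(handler: GCounter => Unit): Unit =
              subscribers = subscribers.updated(topic, handler :: subscribers.getOrElse(topic, Nil))
            def publish(topic: String, state: GCounter): Unit =
              subscribers.getOrElse(topic, Nil).foreach(_(state))
          }

          final class Replica(val id: String, broker: Broker, topic: String) {
            private var state = GCounter()
            broker.subscribe(topic)(remote => state = state.merge(remote)) // converge on delivery
            def increment(): Unit = { state = state.increment(id); broker.publish(topic, state) }
            def read: Long = state.value
          }

          def main(args: Array[String]): Unit = {
            val broker = new Broker
            val a = new Replica("a", broker, "doc-42")
            val b = new Replica("b", broker, "doc-42")
            a.increment(); b.increment(); b.increment()
            println(a.read) // both replicas converge to 3 once states are delivered
            println(b.read)
          }
        }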

    Cloud-edge hybrid applications

    Many modern applications are designed to provide interactions among users, including multi-user games, social networks, and collaborative tools. Users expect application response times in the order of milliseconds, to foster interaction and interactivity. The design of these applications typically adopts a client-server model, where all interactions are mediated by a centralized component. This approach introduces availability and fault-tolerance issues, which can be mitigated by replicating the server component, and even by relying on geo-replicated solutions in cloud computing infrastructures. Even then, the client-server communication model leads to unnecessary latency penalties for geographically close clients and high operational costs for the application provider. This dissertation proposes a cloud-edge hybrid model with secure and efficient propagation and consistency mechanisms. This model combines client-side replication and client-to-client propagation to provide low latency and minimize the dependency on the server infrastructure, fostering availability and fault tolerance. To realize this model, this work makes the following key contributions. First, the cloud-edge hybrid model is materialized by a system design where clients maintain replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist clients' operation. We study how to bring most of the application logic to the client side, using the centralized service primarily for durability, access control, discovery, and overcoming internetwork limitations. Second, we define protocols for weakly consistent data replication, including a novel CRDT model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and fundamental limitations in providing causal consistency, and the difficulty in supporting client-side replicas due to their ephemeral nature. Third, we study how client misbehaviour can impact the guarantees of causal consistency. We propose new secure weak consistency models for insecure settings, and algorithms to enforce such consistency models. The experimental evaluation of our contributions has shown their specific benefits and limitations compared with the state of the art. In general, the cloud-edge hybrid model leads to faster application response times, lower client-to-client latency, higher system scalability as fewer clients need to connect to servers at the same time, the possibility to work offline or disconnected from the server, and reduced server bandwidth usage. In summary, we propose a hybrid of cloud and edge which provides lower user-to-user latency, availability under server disconnections, and improved server scalability, while being efficient, reliable, and secure.
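    As a hedged illustration of the delta-state idea behind ∆-CRDTs (the dissertation's actual protocols are not reproduced here, and all names are illustrative), the Scala sketch below has an increment return a small delta that is shipped and merged on its own, instead of shipping the full counter state.

        // Sketch of the delta-state idea: an increment produces a small delta that
        // can be shipped on its own and merged at the receiver, instead of shipping
        // the whole counter state. Illustrative only; the dissertation's ∆-CRDTs
        // cover far more than this toy counter.
        object DeltaCrdtSketch {

          final case class GCounter(entries: Map[String, Long] = Map.empty) {
            def value: Long = entries.values.sum
            def merge(other: GCounter): GCounter =
              GCounter((entries.keySet ++ other.entries.keySet).map { k =>
                k -> math.max(entries.getOrElse(k, 0L), other.entries.getOrElse(k, 0L))
              }.toMap)
            // An increment returns the delta (a one-entry counter), not the full state.
            def incrementDelta(replica: String): GCounter =
              GCounter(Map(replica -> (entries.getOrElse(replica, 0L) + 1)))
          }

          def main(args: Array[String]): Unit = {
            var a = GCounter()
            var b = GCounter()

            val d1 = a.incrementDelta("a") // tiny payload: Map("a" -> 1)
            a = a.merge(d1)
            b = b.merge(d1)                // receiver applies the delta with the usual merge

            val d2 = b.incrementDelta("b")
            b = b.merge(d2)
            a = a.merge(d2)

            println((a.value, b.value))    // both converge to 2
          }
        }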

    VeriFx: Correct Replicated Data Types for the Masses

    Distributed systems adopt weak consistency to ensure high availability and low latency, but state convergence is hard to guarantee due to conflicts. Experts carefully design replicated data types (RDTs) that resemble sequential data types and embed conflict resolution mechanisms that ensure convergence. Designing RDTs is challenging as their correctness depends on subtleties such as the ordering of concurrent operations. Currently, researchers manually verify RDTs, either by paper proofs or using proof assistants. Unfortunately, paper proofs are subject to reasoning flaws and mechanized proofs verify a formalization instead of a real-world implementation. Furthermore, writing mechanized proofs is reserved for verification experts and is extremely time-consuming. To simplify the design, implementation, and verification of RDTs, we propose VeriFx, a specialized programming language for RDTs with automated proof capabilities. VeriFx lets programmers implement RDTs atop functional collections and express correctness properties that are verified automatically. Verified RDTs can be transpiled to mainstream languages (currently Scala and JavaScript). VeriFx provides libraries for implementing and verifying Conflict-free Replicated Data Types (CRDTs) and Operational Transformation (OT) functions. These libraries implement the general execution model of those approaches and define their correctness properties. We use the libraries to implement and verify an extensive portfolio of 51 CRDTs, 16 of which are used in industrial databases, and reproduce a study on the correctness of OT functions.
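    VeriFx has its own syntax, which is not shown in this abstract. The sketch below is plain Scala (one of VeriFx's transpilation targets) and only spot-checks, on random states, the kind of join-semilattice properties (commutativity, associativity, idempotence of merge) that a tool like VeriFx proves automatically for state-based CRDTs.

        // Not VeriFx syntax: a plain Scala sketch that randomly samples states and
        // checks the lattice laws a state-based CRDT's merge must satisfy; a prover
        // would establish them for all states rather than a sample.
        import scala.util.Random

        object ConvergencePropertiesSketch {

          final case class GSet[A](elems: Set[A]) {
            def add(a: A): GSet[A] = GSet(elems + a)
            def merge(other: GSet[A]): GSet[A] = GSet(elems ++ other.elems)
          }

          def randomSet(rnd: Random): GSet[Int] =
            GSet(Set.fill(rnd.nextInt(5))(rnd.nextInt(10)))

          def main(args: Array[String]): Unit = {
            val rnd = new Random(42)
            val ok = (1 to 1000).forall { _ =>
              val (x, y, z) = (randomSet(rnd), randomSet(rnd), randomSet(rnd))
              val commutative = x.merge(y) == y.merge(x)
              val associative = x.merge(y).merge(z) == x.merge(y.merge(z))
              val idempotent  = x.merge(x) == x
              commutative && associative && idempotent
            }
            println(s"join-semilattice laws hold on sampled states: $ok")
          }
        }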

    Verifying and Enforcing Application Constraints in Antidote SQL

    Geo-replicated storage systems are currently a fundamental piece in the development of large-scale applications whose users are distributed across the world. To meet the high latency and availability requirements of these applications, such database systems are forced to use weak consistency mechanisms. However, under these consistency models there is no guarantee that invariants are preserved, which can jeopardise the correctness of applications. The most obvious alternative to solve this problem would be to use strong consistency, but this would place a large burden on the system. Since neither of these options is feasible on its own, many systems have been developed to preserve application invariants without sacrificing low latency and high availability. These systems, based on the analysis of operations, increase the guarantees of weak consistency by introducing stronger consistency only for the operations that are potentially dangerous to the invariants. Antidote SQL is a database system that, by combining strong and weak consistency mechanisms, attempts to guarantee the preservation of invariants at the data level. In this way, once the concurrency semantics for the application have been defined, any operation can be performed without coordination and without the risk of violating an invariant. However, this approach has some limitations, notably that it is not trivial for developers to define appropriate concurrency semantics. In this document, we propose a methodology for the verification and validation of defined properties, such as invariants, for applications using Antidote SQL. The proposed methodology uses VeriFx, a high-level programming language with automatic verification capabilities, and provides guidelines for programmers who wish to implement and verify their own systems and specifications using this tool.
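    As a hedged illustration of what "operations potentially dangerous to the invariant" means (plain Scala, not Antidote SQL or VeriFx syntax), the sketch below shows two reserve operations that each preserve a non-negative stock invariant in isolation but violate it once their concurrent effects are merged; such pairs are the ones that need stronger concurrency semantics.

        // Sketch of an invariant-violation check under concurrency. Both reserves
        // are safe alone, but applying their combined effect to the common origin
        // state breaks the invariant, so this pair needs coordination.
        object InvariantCheckSketch {

          final case class Stock(available: Int) {
            def reserve(n: Int): Stock = Stock(available - n)
          }

          def invariant(s: Stock): Boolean = s.available >= 0

          // Naive "merge" for concurrent reserves: apply both effects to the origin.
          def mergeConcurrent(origin: Stock, a: Stock, b: Stock): Stock =
            Stock(origin.available + (a.available - origin.available) + (b.available - origin.available))

          def main(args: Array[String]): Unit = {
            val origin   = Stock(available = 5)
            val replicaA = origin.reserve(4) // fine in isolation
            val replicaB = origin.reserve(3) // fine in isolation
            val merged   = mergeConcurrent(origin, replicaA, replicaB)
            println(s"merged = $merged, invariant holds: ${invariant(merged)}")
            // Prints false: the pair (reserve 4, reserve 3) is potentially dangerous
            // and would need stronger concurrency semantics (coordination).
          }
        }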

    Building fast and consistent (geo-)replicated systems: from principles to practice

    Distributing data across replicas within a data center or across multiple data centers plays an important role in building Internet-scale services that provide a good user experience, namely low-latency access and high throughput. This approach often compromises strong consistency semantics, which help maintain application-specific desired properties, namely state convergence and invariant preservation. To relieve this inherent tension, many proposals have been designed in the past few years to allow programmers to selectively weaken the consistency level of certain operations, avoiding costly immediate coordination of concurrent user requests. However, these proposals fail to provide principles that guide programmers in correctly assigning consistency levels to operations so that good performance is obtained while the system behaviour still complies with its specification. The primary goal of this thesis is to provide programmers with principles and tools for building fast and consistent (geo-)replicated systems by allowing them to reason about various consistency levels within the same framework. The first step we took was to propose RedBlue consistency, which presents sufficient conditions that allow programmers to safely separate weakly consistent operations from strongly consistent ones in a coarse-grained manner. Second, to improve the practicality of RedBlue consistency, we built SIEVE, a tool that combines Commutative Replicated Data Types and program analysis techniques to assign proper consistency levels to different operations and to maximize the space of weakly consistent operations. Finally, we generalized the tradeoff between consistency and performance and proposed Partial Order-Restrictions consistency (PoR consistency for short), a generic consistency definition that captures various consistency levels in terms of visibility restrictions among pairs of operations and allows programmers to tune the restrictions to obtain fine-grained control of their targeted consistency semantics.
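    As a rough illustration of the RedBlue split (the labels below are hand-assigned for a textbook bank-account example, not produced by SIEVE's automated analysis), the Scala sketch classifies operations: commuting, invariant-safe operations are blue and run without coordination, while the rest stay red.

        // Sketch of the RedBlue idea: operations that commute with everything and
        // cannot violate application invariants can be labelled blue (weakly
        // consistent, coordination-free); the rest stay red (strongly consistent).
        object RedBlueSketch {

          sealed trait Colour
          case object Red extends Colour   // needs coordination
          case object Blue extends Colour  // safe to run without coordination

          sealed trait Op
          final case class Deposit(amount: Long) extends Op
          final case class Withdraw(amount: Long) extends Op
          final case class AccrueInterest(rate: Double) extends Op

          // Invariant: balance >= 0. Deposits commute and only grow the balance,
          // so they are blue; withdrawals can break the invariant under concurrency,
          // so they are red. Interest can be made blue by replicating its concrete
          // effect (a deposit) instead of the non-commuting rate computation.
          def classify(op: Op): Colour = op match {
            case Deposit(_)        => Blue
            case Withdraw(_)       => Red
            case AccrueInterest(_) => Blue
          }

          def main(args: Array[String]): Unit = {
            val ops = List(Deposit(50), Withdraw(30), AccrueInterest(0.02))
            ops.foreach(op => println(s"$op -> ${classify(op)}"))
          }
        }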

    Estimating data divergence in cloud computing storage systems

    Dissertation submitted for the degree of Master in Computer Engineering. Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated on machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since this improves the latency of operations. However, it may lead to replicas holding different values for the same data item. One solution for controlling the divergence of data in eventually consistent systems is the use of metrics that measure how stale a replica's data is. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of confidence. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution for estimating the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we focus on measuring divergence relative to a state that can lead to the violation of application invariants. Partially funded by project PTDC/EIA-EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 30773.
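    The dissertation's probabilistic metrics are not specified in this abstract; the Scala sketch below is only a stand-in for the general idea, estimating a cached replica's staleness by sampling a few replicas' update counters instead of contacting all of them.

        // Sketch (not the dissertation's algorithm): estimate how far a cached
        // replica lags behind by sampling a few replicas' version counters instead
        // of contacting all of them, trading accuracy for cost.
        import scala.util.Random

        object DivergenceEstimateSketch {

          // Each replica exposes the number of updates it has applied.
          final case class Replica(id: Int, appliedUpdates: Long)

          // Probabilistic estimate: sample k replicas, take the maximum observed
          // counter as a proxy for global progress, and report the lag of the
          // local cache relative to it (it may underestimate the true lag).
          def estimateStaleness(local: Long, replicas: Seq[Replica], k: Int, rnd: Random): Long = {
            val sampled = rnd.shuffle(replicas).take(k)
            val observedMax = sampled.map(_.appliedUpdates).max
            math.max(0L, observedMax - local)
          }

          def main(args: Array[String]): Unit = {
            val rnd = new Random(7)
            val replicas = (1 to 100).map(i => Replica(i, 1000L + rnd.nextInt(50)))
            val localCacheVersion = 1010L
            val estimate = estimateStaleness(localCacheVersion, replicas, 5, rnd) // sample k = 5 replicas
            println(s"estimated missing updates: $estimate")
          }
        }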

    D.1.3 – Protocols for emergent localities

    This report presents two contributions that illustrate the potential of emerging-locality protocols in large-scale decentralized systems, in two areas of decentralized social computing: recommendation, and eventual consistency of mutable data structures. The first contribution consists of a framework supporting the development of dynamically adaptive decentralised recommendation systems. Decentralised recommenders have been proposed to deliver privacy-preserving, personalised and highly scalable on-line recommendations, but current implementations tend to rely on a hard-wired similarity metric that cannot adapt. This constitutes a strong limitation in the face of evolving needs. Our framework addresses this through a decentralised form of adaptation, in which individual nodes can independently select and update their own recommendation algorithm, while still collectively contributing to the overall system's mission. Our second contribution addresses the growing demand for differentiated consistency requirements in large-scale applications. A large number of today's applications rely on eventual consistency, a consistency model that emphasizes liveness over safety. Designers generally adopt this consistency model uniformly throughout a distributed system because of its ability to scale as the number of users or devices grows, but this clashes with the need for differentiated consistency requirements. In this contribution, we address this need by introducing UPS, a novel consistency mechanism that offers differentiated eventual consistency and delivery speed by working in tandem with a two-phase epidemic broadcast protocol. We propose a closed-form analysis of our approach's delivery speed, and we evaluate the complete protocol experimentally on a simulated network of one million nodes. To measure the consistency trade-off, we formally define a novel and scalable consistency metric that operates at runtime.
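    UPS's actual protocol is not detailed in this abstract; the Scala sketch below is only a toy rendering of differentiated delivery speed, where updates tagged as fast are pushed eagerly while lazy updates wait for a later gossip round.

        // Toy sketch of differentiated delivery (not UPS's actual protocol):
        // updates carry a delivery class; "fast" updates are pushed to every
        // neighbour immediately, "lazy" ones are batched and gossiped later.
        object DifferentiatedDeliverySketch {

          sealed trait Speed
          case object Fast extends Speed
          case object Lazy extends Speed

          final case class Update(payload: String, speed: Speed)

          final class Node(val id: Int) {
            var delivered: Vector[String] = Vector.empty
            private var pending: Vector[Update] = Vector.empty
            var neighbours: Seq[Node] = Seq.empty

            def receive(u: Update): Unit =
              if (!delivered.contains(u.payload)) {
                delivered :+= u.payload
                u.speed match {
                  case Fast => neighbours.foreach(_.receive(u)) // eager push phase
                  case Lazy => pending :+= u                    // wait for the gossip round
                }
              }

            def gossipRound(): Unit = {
              pending.foreach(u => neighbours.foreach(_.receive(u)))
              pending = Vector.empty
            }
          }

          def main(args: Array[String]): Unit = {
            val nodes = (0 until 4).map(new Node(_))
            nodes.foreach(n => n.neighbours = nodes.filterNot(_ eq n))
            nodes(0).receive(Update("urgent-edit", Fast))     // reaches everyone immediately
            nodes(0).receive(Update("background-sync", Lazy))
            println(nodes.map(_.delivered))                   // only node 0 has the lazy update so far
            nodes(0).gossipRound()
            println(nodes.map(_.delivered))                   // now everyone has both
          }
        }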

    Techniques intelligentes pour la gestion de la cohérence des Big data dans le cloud (Intelligent techniques for managing Big Data consistency in the cloud)

    This thesis addresses the problem of Big Data consistency in the cloud. Our research focuses on the study of different adaptive consistency approaches in the cloud and on the proposal of a new approach for the Edge computing environment. Consistency management has major consequences for distributed storage systems. Strong consistency models require synchronization after each update, which considerably affects system performance and availability. Conversely, weak consistency models offer better performance and better data availability, but may tolerate too many temporary inconsistencies under certain conditions. Consequently, an adaptive consistency strategy is needed to adjust, at runtime, the consistency level according to the criticality of the requests or of the data. This thesis makes two contributions. In the first contribution, a comparative analysis of existing adaptive consistency approaches is carried out according to a set of defined comparison criteria. This kind of survey provides the user/researcher with a comparative analysis of the performance of existing approaches and clarifies their suitability for candidate cloud systems. In the second contribution, we propose MinidoteACE, a new adaptive consistency system that is an improved version of Minidote, a causal consistency system for Edge applications. Unlike Minidote, which provides only causal consistency, our model also allows applications to execute queries with stronger consistency guarantees. Experimental evaluations show that throughput decreases by only 3.5% to 10% when a causal operation is replaced by a strong operation. However, update latency increases considerably for strong operations, by up to three times for a workload where the update rate is 25%.
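    MinidoteACE's interface is not given in this abstract; as a hedged sketch of runtime-adaptive consistency, the Scala fragment below (with invented names) tags each request with a criticality and routes it to a causal or a strong execution path accordingly.

        // Sketch of runtime-adaptive consistency in the spirit of the thesis (not
        // MinidoteACE's actual interface): each request is tagged with a criticality
        // and the store picks causal or strong execution accordingly.
        object AdaptiveConsistencySketch {

          sealed trait Level
          case object Causal extends Level  // cheap, coordination-free
          case object Strong extends Level  // synchronised, higher update latency

          final case class Request(key: String, critical: Boolean)

          // The adaptation policy: critical requests pay for strong consistency,
          // everything else runs causally. Real policies may also consider observed
          // staleness, load, or data criticality.
          def chooseLevel(r: Request): Level = if (r.critical) Strong else Causal

          def execute(r: Request): String = chooseLevel(r) match {
            case Strong => s"${r.key}: strong path (coordinate before replying)"
            case Causal => s"${r.key}: causal path (apply locally, replicate asynchronously)"
          }

          def main(args: Array[String]): Unit = {
            val reqs = List(Request("account-balance", critical = true),
                            Request("activity-feed", critical = false))
            reqs.map(execute).foreach(println)
          }
        }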