7 research outputs found

    Scalable and accurate causality tracking for eventually consistent stores

    Get PDF
    Lecture Notes in Computer Science 8460, 2014In cloud computing environments, data storage systems often rely on optimistic replication to provide good performance and availability even in the presence of failures or network partitions. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. Current approaches to causality tracking in optimistic replication have problems with concurrent updates: they either (1) do not scale, as they require replicas to maintain information that grows linearly with the number of writes or unique clients; (2) lose information about causality, either by removing entries from client-id based version vectors or using server-id based version vectors, which cause false conflicts. We propose a new logical clock mechanism and a logical clock framework that together support a traditional key-value store API, while capturing causality in an accurate and scalable way, avoiding false conflicts. It maintains concise information per data replica, only linear on the number of replica servers, and allows data replicas to be compared and merged linear with the number of replica servers and versions.(undefined

    Why logical clocks are easy

    Get PDF
    Tracking causality should not be ignored. It is important in the design of many distributed algorithms. And not respecting causality can lead to strange behaviors for users. The most commonly used mechanisms for tracking causality, vector clocks and version vectors, are simply optimized representations of causal histories, which are easy to understand. By building on the notion of causal histories, users can begin to see the logic ehind these mechanisms, to identify how they differ, and even consider possible optimizations. When confronted with an unfamiliar causality tracking mechanism, or when trying to design a new system that requires it, readers should ask two simple questions, which events need tracking and how does the mechanism translate back to a simple causal history.We would like to thank Rodrigo Rodrigues, Marc Shapiro, Russell Brown, Sean Cribbs, and Justin Sheehy for their feedback. This work was partially supported by EU FP7 SyncFree project (609551) and FCT/MCT projects UID/CEC/04516/2013 and UID/EEA/50014/2013.info:eu-repo/semantics/publishedVersio

    Towards Improved Support for Conflict-Free Replicated Data Types

    Get PDF
    Conflict-free Replicated Data Types (CRDTs) are distributed data types which ensure Strong Eventual Consistency (SEL), and also has properties such as commutativity and idempotence. There are many variations of CRDTs, and the ones we will study are state-based delta CRDTs. State-based CRDTs is a variation where the CRDT instances synchronize by sending their state to each other. An improvement to this is delta CRDTs which is yet another variation where only the difference of mutations is disseminated. We will explore some of the existing CRDTs, but also present some new CRDT designs and their implementations. One of the new CRDTs presented is Causal Length Set (CLSet), which is a simple, yet effective state-based delta CRDT. It does not use dots as causal context as many other CRDTs; it merely uses a single integer. With this minimalist CRDT design, we can achieve both great performance and a small memory footprint. Another CRDT presented is Multiple Value Map (MVMap), a CRDT which focuses on convenience as opposed to performance

    Dependable eventual consistency with replicated data types

    Get PDF
    Eventually consistent replicated databases offer excellent responsiveness and fault-tolerance, but expose applications to the complexity of concurrency andfailures. Recent databases encapsulate these problems behind a stronger interface, supporting causal consistency, which protects the application from orderinganomalies, and/or Replicated Data Types (RDTs), which ensure convergent semantics of concurrent updates using object interface. However, dependable algorithms for RDT and causal consistency come at a cost in metadata size. This thesis studies the design of such algorithms with minimized metadata, and the limits of the design space. Our first contribution is a study of metadata complexity of RDTs. RDTs use metadata to provide rich semantics; many existing RDT implementations incur high overhead in storage space. We design optimized set and register RDTs with metadata overhead reduced to the number of replicas. We also demonstrate metadata lower bounds for six RDTs, thereby proving optimality of four implementations. Our second contribution is the design of SwiftCloud, a replicated causally-consistent RDT object database for client-side applications. We devise algorithms to support high numbers of client-side partial replicas backed by the cloud, in a fault-tolerant manner, with small metadata. We demonstrate how to support availability and consistency, at the expense of some slight data staleness; i.e., our approach trades freshness for scalability (small metadata, parallelism), and availability (ability to fail-over between data centers). We validate our approach with experiments involving thousands of client replicas.Les bases de données répliquées cohérentes à terme récentes encapsulent la complexité de la concurrence et des pannes par le biais d'une interface supportant la cohérence causale, protégeant l'application des problèmes d'ordre, et/ou des Types de Données Répliqués (RDTs), assurant une sémantique convergente des mises-à-jour concurrentes en utilisant une interface objet. Cependant, les algorithmes fiables pour les RDTs et la cohérence causale ont un coût en terme de taille des métadonnées. Cette thèse étudie la conception de tels algorithmes avec une taille de métadonnées minimisée et leurs limites. Notre première contribution est une étude de la complexité des métadonnées des RDTs. Les nombreuses implémentations existantes impliquent un important surcoût en espace de stockage. Nous concevons un ensemble optimisé et un registre RDTs avec un surcoût des métadonnées réduit au nombre de répliques. Nous démontrons également les bornes inférieures de la taille des métadonnées pour six RDTs, prouvant ainsi l'optimalité de quatre implémentations. Notre seconde contribution est le design de SwiftCloud, une base de données répliquée causalement cohérente d'objets RDTs pour les applications côté client. Nous concevons des algorithmes qui supportent un grand nombre de répliques partielles côté client, s'appuyant sur le cloud, tout en étant tolérant aux fautes et avec une faible taille de métadonnées. Nous démontrons comment supporter la disponibilité (y compris la capacité à basculer entre des centre de données lors d'une erreur), la cohérence et le passage à l'échelle (petite taille de métadonnées, parallélisme) au détriment d'un léger retard dans l'actualisation des données

    Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC

    Get PDF
    A progressiva transformação das práticas científicas, impulsionada pelo desenvolvimento das novas Tecnologias de Informação e Comunicação (TIC), têm possibilitado aumentar o acesso à informação, caminhando gradualmente para uma abertura do ciclo de pesquisa. Isto permitirá resolver a longo prazo uma adversidade que se tem colocado aos investigadores, que passa pela existência de barreiras que limitam as condições de acesso, sejam estas geográficas ou financeiras. Apesar da produção científica ser dominada, maioritariamente, por grandes editoras comerciais, estando sujeita às regras por estas impostas, o Movimento do Acesso Aberto cuja primeira declaração pública, a Declaração de Budapeste (BOAI), é de 2002, vem propor alterações significativas que beneficiam os autores e os leitores. Este Movimento vem a ganhar importância em Portugal desde 2003, com a constituição do primeiro repositório institucional a nível nacional. Os repositórios institucionais surgiram como uma ferramenta de divulgação da produção científica de uma instituição, com o intuito de permitir abrir aos resultados da investigação, quer antes da publicação e do próprio processo de arbitragem (preprint), quer depois (postprint), e, consequentemente, aumentar a visibilidade do trabalho desenvolvido por um investigador e a respetiva instituição. O estudo apresentado, que passou por uma análise das políticas de copyright das publicações científicas mais relevantes do INESC TEC, permitiu não só perceber que as editoras adotam cada vez mais políticas que possibilitam o auto-arquivo das publicações em repositórios institucionais, como também que existe todo um trabalho de sensibilização a percorrer, não só para os investigadores, como para a instituição e toda a sociedade. A produção de um conjunto de recomendações, que passam pela implementação de uma política institucional que incentive o auto-arquivo das publicações desenvolvidas no âmbito institucional no repositório, serve como mote para uma maior valorização da produção científica do INESC TEC.The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), which made it possible to increase access to information, gradually moving towards an opening of the research cycle. This opening makes it possible to resolve, in the long term, the adversity that has been placed on researchers, which involves the existence of barriers that limit access conditions, whether geographical or financial. Although large commercial publishers predominantly dominate scientific production and subject it to the rules imposed by them, the Open Access movement whose first public declaration, the Budapest Declaration (BOAI), was in 2002, proposes significant changes that benefit the authors and the readers. This Movement has gained importance in Portugal since 2003, with the constitution of the first institutional repository at the national level. Institutional repositories have emerged as a tool for disseminating the scientific production of an institution to open the results of the research, both before publication and the preprint process and postprint, increase the visibility of work done by an investigator and his or her institution. The present study, which underwent an analysis of the copyright policies of INESC TEC most relevant scientific publications, allowed not only to realize that publishers are increasingly adopting policies that make it possible to self-archive publications in institutional repositories, all the work of raising awareness, not only for researchers but also for the institution and the whole society. The production of a set of recommendations, which go through the implementation of an institutional policy that encourages the self-archiving of the publications developed in the institutional scope in the repository, serves as a motto for a greater appreciation of the scientific production of INESC TEC
    corecore