14 research outputs found
Non-Uniform Replication
Replication is a key technique in the design of efficient and reliable distributed systems. As the amount of information grows, it becomes difficult or even impossible to store all of it at every replica. A common approach to this problem is partial replication, where each replica maintains only part of the total system information. As a consequence, a remote replica might need to be contacted to compute the reply to a given query, which incurs high latency, particularly in geo-replicated settings. In this work, we introduce the concept of non-uniform replication, where each replica stores only part of the information, but every replica stores enough information to answer every query. We apply this concept to eventual consistency and conflict-free replicated data types. We show that this model can address useful problems and present two data types that solve such problems. Our evaluation shows that non-uniform replication is more efficient than traditional replication, using less storage space and network bandwidth.
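To make the idea concrete, here is a minimal sketch (in Python, with a hypothetical API, not the paper's actual data types) of a non-uniformly replicated top-K structure: every replica can answer the top-K query locally, yet an update is only disseminated when it can change some replica's answer.

```python
import heapq

# Hypothetical sketch of a non-uniform top-K replica: the query ("what are
# the K highest-scored items?") is answerable at every replica, but only
# updates that can affect a top-K answer are propagated.
class TopKReplica:
    def __init__(self, k):
        self.k = k
        self.top = []  # min-heap of at most k (score, item) pairs

    def add(self, item, score):
        """Apply a local update; return the message to broadcast,
        or None when the update cannot affect any top-K answer."""
        if len(self.top) < self.k or score > self.top[0][0]:
            self._insert(score, item)
            return (score, item)   # worth disseminating
        return None                # filtered: the non-uniform saving

    def receive(self, msg):
        score, item = msg
        self._insert(score, item)

    def _insert(self, score, item):
        heapq.heappush(self.top, (score, item))
        if len(self.top) > self.k:
            heapq.heappop(self.top)

    def query(self):
        return sorted(self.top, reverse=True)

# Two replicas exchange only the updates that matter.
r1, r2 = TopKReplica(2), TopKReplica(2)
for item, score in [("a", 5), ("b", 1), ("c", 3), ("d", 0)]:
    msg = r1.add(item, score)
    if msg is not None:
        r2.receive(msg)           # ("d", 0) is never sent
assert r1.query() == r2.query() == [(5, "a"), (3, "c")]
```

Filtering at the sender is where the storage and bandwidth savings come from; a complete design must also handle failures and removals, which this sketch ignores.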
Robustness Against Transactional Causal Consistency
Distributed storage systems and databases are widely used by various types of applications. Transactional access to these storage systems is an important abstraction allowing application programmers to consider blocks of actions (i.e., transactions) as executing atomically. For performance reasons, the consistency models implemented by modern databases are weaker than the standard serializability model, which corresponds to the atomicity abstraction of transactions executing over a sequentially consistent memory. Causal consistency for instance is one such model that is widely used in practice.
In this paper, we investigate application-specific relationships between several variations of causal consistency, and we address the problem of automatically verifying whether a given transactional program is robust against causal consistency, i.e., whether all its behaviors when executed over an arbitrary causally consistent database are serializable. We show that programs without write-write races have the same set of behaviors under all these variations, and that checking robustness is polynomial-time reducible to a state-reachability problem in transactional programs over a sequentially consistent shared memory. A surprising corollary of the latter result is that causal consistency variations which admit incomparable sets of behaviors admit comparable sets of robust programs. This reduction also opens the door to leveraging existing methods and tools for the verification of concurrent programs (assuming sequential consistency) to reason about programs running over causally consistent databases. Furthermore, it allows establishing that the problem of checking robustness is decidable when the programs executed at different sites are finite-state.
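The classic lost-update anomaly illustrates the kind of non-robust program at stake: a transactional increment with a write-write race behaves differently under causal consistency than under any serializable execution. The following is an illustrative Python simulation, not the paper's formal model.

```python
# Illustrative simulation (not the paper's formalism) of the lost-update
# anomaly: a transactional program with a write-write race that is NOT
# robust against causal consistency.

def run_concurrently(tx, initial):
    """Run two copies of `tx`, each on its own replica snapshot of
    `initial`, then deliver both write sets asynchronously."""
    write1 = tx(dict(initial))
    write2 = tx(dict(initial))
    merged = dict(initial)
    merged.update(write1)  # both writes arrive; the write-write
    merged.update(write2)  # conflict silently loses one increment
    return merged

def increment_x(snapshot):
    return {"x": snapshot["x"] + 1}  # read-modify-write racing on x

# Both transactions read x == 0 and write x == 1, so the causally
# consistent outcome is x == 1, while every serializable execution
# (one increment after the other) yields x == 2: not robust.
assert run_concurrently(increment_x, {"x": 0})["x"] == 1
```

By contrast, programs without such write-write races behave the same under all the causal consistency variations the paper considers.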
Convergent types for shared memory
Master's dissertation in Computer Science.
It is well known that consistency in shared-memory concurrent programming comes at the price of degraded performance and scalability. Some of the existing solutions to this problem introduce considerable complexity and are not programmer-friendly.
We present a simple and well-defined approach to obtaining relevant results in shared-memory environments by relaxing synchronization. To that end, we look into Mergeable Data Types: data structures analogous to Conflict-Free Replicated Data Types but designed to perform well in shared memory.
CRDTs were the first formal approach to develop a solid theoretical treatment of eventual consistency in distributed systems, addressing the trade-offs described by the CAP Theorem and providing high availability. With CRDTs, updates are unsynchronized, and replicas eventually converge to a correct common state. However, CRDTs are not designed to perform well in shared memory: in large-scale distributed systems, the cost of a merge is negligible compared to network-mediated synchronization. We have therefore ported the concept to shared memory, building on existing Mergeable Data Types and formally defining a programming model that we name the Global-Local View. Furthermore, we have created a portfolio of MDTs and demonstrated that, in appropriate scenarios, the model can yield large benefits.
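As an illustration of the Global-Local View style (a hypothetical sketch, not the dissertation's actual API), a mergeable counter can let each thread update a private local view without synchronization and pay for a lock only on the infrequent merge into the global view:

```python
import threading

# Hypothetical Global-Local View sketch: threads increment a thread-local
# delta with no synchronization; only merge() touches the shared state.
class MergeableCounter:
    def __init__(self):
        self._global = 0
        self._lock = threading.Lock()
        self._local = threading.local()

    def inc(self, n=1):
        # Unsynchronized fast path: only this thread's delta is touched.
        self._local.delta = getattr(self._local, "delta", 0) + n

    def merge(self):
        # Slow path: fold the local delta into the shared global view.
        delta = getattr(self._local, "delta", 0)
        if delta:
            with self._lock:
                self._global += delta
            self._local.delta = 0

    def read(self):
        # Weak read: global view plus this thread's unmerged delta.
        with self._lock:
            g = self._global
        return g + getattr(self._local, "delta", 0)

counter = MergeableCounter()

def worker():
    for _ in range(1000):
        counter.inc()      # no lock taken here
    counter.merge()        # one lock acquisition per thread

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter.read() == 4000
```

Reads are deliberately weak: a thread sees the global view plus only its own unmerged delta, mirroring the local/global split of the model.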
VeriFx: Correct Replicated Data Types for the Masses
Distributed systems adopt weak consistency to ensure high availability and low latency, but state convergence is hard to guarantee in the presence of conflicts. Experts carefully design replicated data types (RDTs) that resemble sequential data types and embed conflict-resolution mechanisms that ensure convergence. Designing RDTs is challenging, as their correctness depends on subtleties such as the ordering of concurrent operations. Currently, researchers verify RDTs manually, either with paper proofs or using proof assistants. Unfortunately, paper proofs are subject to reasoning flaws, and mechanized proofs verify a formalization instead of a real-world implementation. Furthermore, writing mechanized proofs is reserved for verification experts and is extremely time-consuming. To simplify the design, implementation, and verification of RDTs, we propose VeriFx, a specialized programming language for RDTs with automated proof capabilities. VeriFx lets programmers implement RDTs atop functional collections and express correctness properties that are verified automatically. Verified RDTs can be transpiled to mainstream languages (currently Scala and JavaScript). VeriFx provides libraries for implementing and verifying Conflict-free Replicated Data Types (CRDTs) and Operational Transformation (OT) functions. These libraries implement the general execution model of those approaches and define their correctness properties. We use the libraries to implement and verify an extensive portfolio of 51 CRDTs, 16 of which are used in industrial databases, and to reproduce a study on the correctness of OT functions.
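For intuition, the convergence of a state-based CRDT rests on its merge being commutative, associative, and idempotent. The sketch below checks these lattice laws for a grow-only counter by brute force over a small state space, in Python; VeriFx proves such properties symbolically for all states, so this is only an illustration of what is being verified, not VeriFx code.

```python
from itertools import product

# A state-based grow-only counter: each state is a per-replica vector of
# increments, and merge is the pointwise maximum (the lattice join).
def merge(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))

# Brute-force check of the lattice laws over a small universe of
# two-replica states; convergence of the CRDT relies on these laws.
states = list(product(range(3), repeat=2))
for a, b, c in product(states, repeat=3):
    assert merge(a, b) == merge(b, a)                      # commutative
    assert merge(a, merge(b, c)) == merge(merge(a, b), c)  # associative
    assert merge(a, a) == a                                # idempotent
```

Brute force only samples a finite universe; the point of a tool like VeriFx is to discharge these obligations for all states automatically.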
Conflict-Free Replicated Data Types in Dynamic Environments
Over the years, mobile devices have become increasingly popular and have gained improved computational capabilities, allowing them to perform more complex tasks such as running collaborative applications. Given the characteristically weak connectivity of mobile networks, which are highly dynamic environments where users may experience regular involuntary disconnection periods, the question arises of how to maintain data consistency. This issue is most pronounced in collaborative environments where multiple users interact with each other, sharing a replicated state that may diverge due to concurrency conflicts and lost updates.
To maintain consistency, one of today's best solutions is Conflict-Free Replicated Data Types (CRDTs), which ensure low latency and automatic conflict resolution, guaranteeing eventual consistency of the shared data. However, a limitation often found in CRDTs and the systems that employ them is the need to know the replicas to which state changes must be disseminated. This is a problem, since it is impractical to maintain such knowledge in an environment where clients may join and leave at any time and may be disconnected by the unreliability of mobile network communications.
In this thesis, we present the study and extension of the CRDT concept to dynamic environments by introducing the P/S-CRDTs model, in which CRDTs are coupled with the publish/subscribe interaction scheme and with additional mechanisms that ensure users can cooperate and maintain consistency while accounting for the volatile behavior of mobile networks. The experimental results show that, in volatile disconnection scenarios, mobile users in collaborative activity maintain consistency among themselves, and that, compared to other available CRDT models, the P/S-CRDTs model removes the need to know to which replicas updates must be disseminated, while keeping network traffic at appropriate levels.
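The core idea of coupling CRDTs with publish/subscribe can be sketched as follows (a hypothetical Python sketch, not the P/S-CRDTs implementation): replicas publish their state to a topic on a broker instead of tracking peers, so replicas may join or leave at any time.

```python
from collections import defaultdict

# Hypothetical sketch of pub/sub-mediated CRDT dissemination: the broker
# decouples replicas, so a publisher never needs to know which replicas
# exist or are currently reachable.
class Broker:
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, replica):
        self.subs[topic].append(replica)

    def publish(self, topic, state):
        for replica in self.subs[topic]:
            replica.on_state(state)

class GSetReplica:
    """Grow-only set CRDT: merge is set union, so duplicate or
    reordered deliveries are harmless."""
    def __init__(self):
        self.elems = set()

    def add(self, broker, topic, element):
        self.elems.add(element)
        broker.publish(topic, set(self.elems))  # no peer list needed

    def on_state(self, state):
        self.elems |= state                     # merge by union

broker, r1, r2 = Broker(), GSetReplica(), GSetReplica()
broker.subscribe("doc", r1)
broker.subscribe("doc", r2)
r1.add(broker, "doc", "x")
r2.add(broker, "doc", "y")
assert r1.elems == r2.elems == {"x", "y"}
```

A replica that subscribes late simply converges after the next publish; handling disconnection periods and catch-up is where additional mechanisms beyond this sketch are needed.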
Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC
The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), has made it possible to increase access to information, moving gradually towards an opening of the research cycle. In the long term, this will help remove an adversity faced by researchers: the existence of barriers, whether geographical or financial, that limit the conditions of access. Although scientific production is dominated mostly by large commercial publishers and subject to the rules they impose, the Open Access Movement, whose first public declaration, the Budapest Open Access Initiative (BOAI), dates from 2002, proposes significant changes that benefit both authors and readers. This movement has been gaining importance in Portugal since 2003, with the creation of the first institutional repository at the national level. Institutional repositories emerged as a tool for disseminating the scientific production of an institution, with the aim of opening up research results, both before publication and peer review (preprints) and after (postprints), and consequently increasing the visibility of the work carried out by a researcher and the respective institution. The study presented here, based on an analysis of the copyright policies of the most relevant scientific publications of INESC TEC, showed not only that publishers increasingly adopt policies that allow the self-archiving of publications in institutional repositories, but also that a great deal of awareness-raising work remains to be done, not only among researchers but also within the institution and society as a whole.
The production of a set of recommendations, including the implementation of an institutional policy that encourages the self-archiving of publications produced in the institutional context in the repository, serves as a starting point for a greater appreciation of the scientific production of INESC TEC.
Replication-Aware Linearizability
Geo-distributed systems often replicate data at multiple locations to achieve
availability and performance despite network partitions. These systems must
accept updates at any replica and propagate these updates asynchronously to
every other replica. Conflict-Free Replicated Data Types (CRDTs) provide a
principled approach to the problem of ensuring that replicas are eventually
consistent despite the asynchronous delivery of updates.
We address the problem of specifying and verifying CRDTs, introducing a new
correctness criterion called Replication-Aware Linearizability. This criterion
is inspired by linearizability, the de-facto correctness criterion for
(shared-memory) concurrent data structures. We argue that this criterion is both simple to understand and a good fit for most known implementations of CRDTs. We provide a proof methodology for showing that a CRDT satisfies replication-aware linearizability, which we apply to a wide range of implementations. Finally, we show that our criterion can be leveraged to reason modularly about the composition of CRDTs.
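To give rough intuition for the criterion (an illustration, not the paper's formal definition): for a replicated counter, replication-aware linearizability asks, roughly, that the converged state be explained by some sequential ordering of all submitted operations, much as linearizability explains concurrent histories of shared-memory objects.

```python
from itertools import permutations

# Illustration only: the converged state of a replicated counter should be
# justified by SOME sequential ordering of all submitted operations, run
# against the sequential counter specification.

def apply_seq(ops, state=0):
    """Run operations against the sequential counter specification."""
    for op, arg in ops:
        state = state + arg if op == "inc" else state - arg
    return state

ops = [("inc", 2), ("dec", 1), ("inc", 3)]  # submitted across replicas
converged = 4                               # state after all deliveries

# Increments and decrements commute, so every linearization (here,
# every permutation) justifies the converged state.
assert all(apply_seq(p) == converged for p in permutations(ops))
```

For data types whose operations do not commute, only some orderings justify the outcome, which is where the criterion (and its visibility of concurrent updates) does real work.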
Verifying and Enforcing Application Constraints in Antidote SQL
Geo-replicated storage systems are currently a fundamental piece in the development of large-scale applications whose users are distributed across the world. To meet the high latency and availability requirements of these applications, these database systems are forced to use weak consistency mechanisms. Under these consistency models, however, there is no guarantee that application invariants are preserved, which can jeopardise the correctness of applications. The obvious alternative, using strong consistency, would place a large burden on the system.
Since neither of these options is satisfactory on its own, many systems have been developed to preserve application invariants without sacrificing low latency and high availability. These systems, based on the analysis of operations, strengthen the guarantees of weak consistency by introducing stronger consistency only for operations that potentially endanger invariants.
Antidote SQL is a database system that, by combining strong and weak consistency mechanisms, attempts to guarantee the preservation of invariants at the data level. In this way, once the concurrency semantics for the application have been defined, any operation can be performed without coordination and without the risk of violating an invariant. However, this approach has some limitations, notably that it is not trivial for developers to define appropriate concurrency semantics.
In this document, we propose a methodology for the verification and validation of defined properties, such as invariants, for applications using Antidote SQL. The proposed methodology uses a high-level programming language with automatic verification features called VeriFx and provides guidelines for programmers who wish to implement and verify their own systems and specifications using this tool.
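The kind of bug such a methodology aims to catch can be shown with a toy example (illustrative Python, not Antidote SQL or VeriFx code): a non-negative-balance invariant that every transaction checks locally can still be violated once uncoordinated concurrent effects are merged.

```python
# Toy example (illustrative) of the kind of invariant violation the
# methodology targets: each transaction checks "balance >= 0" on its
# local snapshot, yet the merged effects break it.

def withdraw(balance, amount):
    """Local transaction: the invariant check passes on the snapshot."""
    assert balance - amount >= 0
    return -amount                       # the effect to propagate

initial = 100
effect1 = withdraw(initial, 70)          # replica 1: 100 - 70 >= 0, ok
effect2 = withdraw(initial, 70)          # replica 2: 100 - 70 >= 0, ok
merged = initial + effect1 + effect2     # after asynchronous delivery
assert merged == -40                     # invariant balance >= 0 violated
```

This is exactly the case where concurrent withdrawals need either coordination or a concurrency semantics that bounds their combined effect.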
Programming Languages and Systems
This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the COVID-19 pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.