85 research outputs found
CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing
storage services, but fall short of guaranteeing linearizable consistency
in partially synchronous, lossy, partitionable, and dynamic networks, when data
is distributed and replicated automatically by the principle of consistent hashing.
This paper introduces consistent quorums as a solution for achieving atomic
consistency. We present the design and implementation of CATS, a distributed
key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is
scalable, elastic, and self-organizing; key properties for modern cloud storage
middleware. Our system shows that consistency can be achieved with practical
performance and modest throughput overhead (5%) for read-intensive workloads
Dynamic Reconfiguration: Abstraction and Optimal Asynchronous Solution
Providing clean and efficient foundations and tools for reconfiguration is a crucial enabler for distributed system management today. This work takes a step towards developing such foundations. It considers classic fault-tolerant atomic objects emulated on top of a static set of fault-prone servers, and turns them into dynamic ones. The specification of a dynamic object extends the corresponding static (non-dynamic) one with an API for changing the underlying set of fault-prone servers. Thus, in a dynamic model, an object can start in some configuration and continue in a different one. Its liveness is preserved through the reconfigurations it undergoes, tolerating a versatile set of faults as it shifts from one configuration to another.
In this paper we present a general abstraction for asynchronous reconfiguration, and exemplify its usefulness for building two dynamic objects: a read/write register and a max-register. We first define a dynamic model with a clean failure condition that allows an administrator to reconfigure the system and switch off a server once the reconfiguration operation removing it completes. We then define the Reconfiguration abstraction and show how it can be used to build dynamic registers and max-registers. Finally, we give an optimal asynchronous algorithm implementing the Reconfiguration abstraction, which in turn leads to the first asynchronous (consensus-free) dynamic register emulation with optimal complexity. More concretely, faced with n requests for configuration changes, the number of configurations that the dynamic register is implemented over is n; and the complexity of each client operation is O(n)
Clouder : a flexible large scale decentralized object store
Programa Doutoral em Informática MAP-iLarge scale data stores have been initially introduced to support a few concrete extreme
scale applications such as social networks. Their scalability and availability
requirements often outweigh sacrificing richer data and processing models, and even
elementary data consistency. In strong contrast with traditional relational databases
(RDBMS), large scale data stores present very simple data models and APIs, lacking
most of the established relational data management operations; and relax consistency
guarantees, providing eventual consistency.
With a number of alternatives now available and mature, there is an increasing
willingness to use them in a wider and more diverse spectrum of applications, by
skewing the current trade-off towards the needs of common business users, and easing
the migration from current RDBMS. This is particularly so when used in the context
of a Cloud solution such as in a Platform as a Service (PaaS).
This thesis aims at reducing the gap between traditional RDBMS and large scale
data stores, by seeking mechanisms to provide additional consistency guarantees and
higher level data processing primitives in large scale data stores. The devised mechanisms
should not hinder the scalability and dependability of large scale data stores.
Regarding, higher level data processing primitives this thesis explores two complementary
approaches: by extending data stores with additional operations such as general
multi-item operations; and by coupling data stores with other existent processing
facilities without hindering scalability.
We address this challenges with a new architecture for large scale data stores, efficient
multi item access for large scale data stores, and SQL processing atop large scale
data stores. The novel architecture allows to find the right trade-offs among flexible
usage, efficiency, and fault-tolerance. To efficient support multi item access we extend first generation large scale data store’s data models with tags and a multi-tuple data
placement strategy, that allow to efficiently store and retrieve large sets of related data
at once. For efficient SQL support atop scalable data stores we devise design modifications
to existing relational SQL query engines, allowing them to be distributed.
We demonstrate our approaches with running prototypes and extensive experimental
evaluation using proper workloads.Os sistemas de armazenamento de dados de grande escala foram inicialmente desenvolvidos
para suportar um leque restrito de aplicacões de escala extrema, como as
redes sociais. Os requisitos de escalabilidade e elevada disponibilidade levaram a
sacrificar modelos de dados e processamento enriquecidos e até a coerência dos dados.
Em oposição aos tradicionais sistemas relacionais de gestão de bases de dados
(SRGBD), os sistemas de armazenamento de dados de grande escala apresentam modelos
de dados e APIs muito simples. Em particular, evidenciasse a ausência de muitas
das conhecidas operacões de gestão de dados relacionais e o relaxamento das garantias
de coerência, fornecendo coerência futura.
Atualmente, com o número de alternativas disponíveis e maduras, existe o crescente
interesse em usá-los num maior e diverso leque de aplicacões, orientando o atual
compromisso para as necessidades dos típicos clientes empresariais e facilitando a
migração a partir das atuais SRGBD. Isto é particularmente importante no contexto de
soluções cloud como plataformas como um servic¸o (PaaS).
Esta tese tem como objetivo reduzir a diferencça entre os tradicionais SRGDBs e os
sistemas de armazenamento de dados de grande escala, procurando mecanismos que
providenciem garantias de coerência mais fortes e primitivas com maior capacidade de
processamento. Os mecanismos desenvolvidos não devem comprometer a escalabilidade
e fiabilidade dos sistemas de armazenamento de dados de grande escala. No que
diz respeito às primitivas com maior capacidade de processamento esta tese explora
duas abordagens complementares : a extensão de sistemas de armazenamento de dados
de grande escala com operacões genéricas de multi objeto e a junção dos sistemas de armazenamento de dados de grande escala com mecanismos existentes de processamento
e interrogac¸ ˜ao de dados, sem colocar em causa a escalabilidade dos mesmos.
Para isso apresent´amos uma nova arquitetura para os sistemas de armazenamento
de dados de grande escala, acesso eficiente a m´ultiplos objetos, e processamento de
SQL sobre sistemas de armazenamento de dados de grande escala. A nova arquitetura
permite encontrar os compromissos adequados entre flexibilidade, eficiˆencia e
tolerˆancia a faltas. De forma a suportar de forma eficiente o acesso a m´ultiplos objetos
estendemos o modelo de dados de sistemas de armazenamento de dados de grande escala
da primeira gerac¸ ˜ao com palavras-chave e definimos uma estrat´egia de colocac¸ ˜ao
de dados para m´ultiplos objetos que permite de forma eficiente armazenar e obter
grandes quantidades de dados de uma s´o vez. Para o suporte eficiente de SQL sobre
sistemas de armazenamento de dados de grande escala, analisámos a arquitetura dos
motores de interrogação de SRGBDs e fizemos alterações que permitem que sejam
distribuídos.
As abordagens propostas são demonstradas através de protótipos e uma avaliacão
experimental exaustiva recorrendo a cargas adequadas baseadas em aplicações reais
Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12
This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellenc
Cloud-edge hybrid applications
Many modern applications are designed to provide interactions among users, including multi-
user games, social networks and collaborative tools. Users expect application response time to
be in the order of milliseconds, to foster interaction and interactivity.
The design of these applications typically adopts a client-server model, where all interac-
tions are mediated by a centralized component. This approach introduces availability and fault-
tolerance issues, which can be mitigated by replicating the server component, and even relying on
geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server
communication model leads to unnecessary latency penalties for geographically close clients and
high operational costs for the application provider.
This dissertation proposes a cloud-edge hybrid model with secure and ecient propagation
and consistency mechanisms. This model combines client-side replication and client-to-client
propagation for providing low latency and minimizing the dependency on the server infras-
tructure, fostering availability and fault tolerance. To realize this model, this works makes the
following key contributions.
First, the cloud-edge hybrid model is materialized by a system design where clients maintain
replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist
clients’ operation. We study how to bring most of the application logic to the client-side, us-
ing the centralized service primarily for durability, access control, discovery, and overcoming
internetwork limitations.
Second, we dene protocols for weakly consistent data replication, including a novel CRDT
model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and
fundamental limitations in providing causal consistency, and the diculty in supporting client-
side replicas due to their ephemeral nature.
Third, we study how client misbehaviour can impact the guarantees of causal consistency.
We propose new secure weak consistency models for insecure settings, and algorithms to enforce
such consistency models.
The experimental evaluation of our contributions have shown their specic benets and
limitations compared with the state-of-the-art. In general, the cloud-edge hybrid model leads to
faster application response times, lower client-to-client latency, higher system scalability as fewer clients need to connect to servers at the same time, the possibility to work oine or disconnected
from the server, and reduced server bandwidth usage.
In summary, we propose a hybrid of cloud-and-edge which provides lower user-to-user la-
tency, availability under server disconnections, and improved server scalability – while being
ecient, reliable, and secure.Muitas aplicações modernas são criadas para fornecer interações entre utilizadores, incluindo
jogos multiutilizador, redes sociais e ferramentas colaborativas. Os utilizadores esperam que o
tempo de resposta nas aplicações seja da ordem de milissegundos, promovendo a interação e
interatividade.
A arquitetura dessas aplicações normalmente adota um modelo cliente-servidor, onde todas as
interações são mediadas por um componente centralizado. Essa abordagem apresenta problemas
de disponibilidade e tolerância a falhas, que podem ser mitigadas com replicação no componente
do servidor, até com a utilização de soluções replicadas geogracamente em infraestruturas de
computação na nuvem. Mesmo neste caso, o modelo de comunicação cliente-servidor leva a
penalidades de latência desnecessárias para clientes geogracamente próximos e altos custos
operacionais para o provedor das aplicações.
Esta dissertação propõe um modelo híbrido cloud-edge com mecanismos seguros e ecientes
de propagação e consistência. Esse modelo combina replicação do lado do cliente e propagação
de cliente para cliente para fornecer baixa latência e minimizar a dependência na infraestrutura
do servidor, promovendo a disponibilidade e tolerância a falhas. Para realizar este modelo, este
trabalho faz as seguintes contribuições principais.
Primeiro, o modelo híbrido cloud-edge é materializado por uma arquitetura do sistema em
que os clientes mantêm réplicas dos dados e sincronizam de maneira ponto a ponto e onde os
servidores são usados para auxiliar na operação dos clientes. Estudamos como trazer a maior
parte da lógica das aplicações para o lado do cliente, usando o serviço centralizado principalmente
para durabilidade, controlo de acesso, descoberta e superação das limitações inter-rede.
Em segundo lugar, denimos protocolos para replicação de dados fracamente consistentes,
incluindo um novo modelo de CRDTs (∆-CRDTs). Fornecemos um estudo sobre replicação parcial,
explorando os desaos e limitações fundamentais em fornecer consistência causal e a diculdade
em suportar réplicas do lado do cliente devido à sua natureza efémera.
Terceiro, estudamos como o mau comportamento da parte do cliente pode afetar as garantias
da consistência causal. Propomos novos modelos seguros de consistência fraca para congurações
inseguras e algoritmos para impor tais modelos de consistência.
A avaliação experimental das nossas contribuições mostrou os benefícios e limitações em comparação com o estado da arte. Em geral, o modelo híbrido cloud-edge leva a tempos de resposta
nas aplicações mais rápidos, a uma menor latência de cliente para cliente e à possibilidade de
trabalhar oine ou desconectado do servidor. Adicionalmente, obtemos uma maior escalabilidade
do sistema, visto que menos clientes precisam de estar conectados aos servidores ao mesmo tempo
e devido à redução na utilização da largura de banda no servidor.
Em resumo, propomos um modelo híbrido entre a orla (edge) e a nuvem (cloud) que fornece
menor latência entre utilizadores, disponibilidade durante desconexões do servidor e uma melhor
escalabilidade do servidor – ao mesmo tempo que é eciente, conável e seguro
DECOUPLING CONSISTENCY DETERMINATION AND TRUST FROM THE UNDERLYING DISTRIBUTED DATA STORES
Building applications on cloud services is cost-effective and allows for rapid development and release cycles. However, relying on cloud services can severely limit applications’ ability to control their own consistency policies, and their ability to control data visibility during replication.
To understand the tension between strong consistency and security guarantees on one hand and high availability, flexible replication, and performance on the other, it helps to consider two questions. First, is it possible for an application to achieve stricter consistency guarantees than what the cloud provider offers? If we solely rely on the provider service interface, the answer is no. However, if we allow the applications to determine the implementation and the execution of the consistency protocols, then we can achieve much more.
The second question is, can an application relay updates over untrusted replicas without revealing sensitive information while maintaining the desired consistency guarantees? Simply encrypting the data is not enough. Encryption does not eliminate information leakage that comes from the meta-data needed for the execution of any consistency protocol. The alternative to encryption—allowing the flow of updates only through trusted replicas— leads to predefined communication patterns. This approach is prone to failures that can cause partitioning in the system. One way to answer “yes” to this question is to allow trust relationships, defined at the application level, to guide the synchronization protocol.
My goal in this thesis is to build systems that take advantage of the performance, scalability, and availability of the cloud storage services while, at the same time, bypassing the limitations imposed by cloud service providers’ design choices. The key to achieving this is pushing application-specific decisions where they belong: the application.
I defend the following thesis statement: By decoupling consistency determination and trust from the underlying distributed data store, it is possible to (1) support application-specific consistency guarantees; (2) allow for topology independent replication protocols that do not compromise application privacy.
First I design and implement Shell, a system architecture for supporting strict consistency guarantees over eventually consistent data stores. Shell is a software layer designed to isolate consistency implementations and cloud-provider APIs from the application code. Shell consists of four internal modules and an application store, which together abstract consistency-related operations and encapsulate communication with the underlying storage layers. Apart from consistency protocols tailored to application needs, Shell provides application-aware conflict resolution without relying on generic heuristics such as the “last write wins.” Shell does not require the application to maintain dependency-tracking in- formation for the execution of the consistency protocols as other existing approaches do. I experimentally evaluate Shell over two different data-stores using real-application traces. I found that using Shell can reduce the inconsistent updates by 10%. I also measure and show the overheads that come from introducing the Shell layer.
Second, I design and implement T.Rex, a system for supporting topology-independent replication without the assumption of trust between all the participating replicas. T.Rex uses
role-based access control to enable flexible and secure sharing among users with widely varying collaboration types: both users and data items are assigned roles, and a user can access data only if it shares at least one role. Building on top of this abstraction, T.Rex includes several novel mechanisms: I introduce role proofs to prove role membership to others in the role without leaking information to those not in the role. Additionally, I introduce role coherence to prevent updates from leaking across roles. Finally, I use Bloom filters as opaque digests to enable querying of remote cache state without being able to enumerate it. I combine these mechanisms to develop a novel, cryptographically secure, and efficient anti-entropy protocol, T.Rex-Sync. I evaluate T.Rex on a local test-bed, and I show that it achieves security with modest computational and storage overheads
- …