85 research outputs found

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Get PDF
    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

    Dynamic Reconfiguration: Abstraction and Optimal Asynchronous Solution

    Get PDF
    Providing clean and efficient foundations and tools for reconfiguration is a crucial enabler for distributed system management today. This work takes a step towards developing such foundations. It considers classic fault-tolerant atomic objects emulated on top of a static set of fault-prone servers, and turns them into dynamic ones. The specification of a dynamic object extends the corresponding static (non-dynamic) one with an API for changing the underlying set of fault-prone servers. Thus, in a dynamic model, an object can start in some configuration and continue in a different one. Its liveness is preserved through the reconfigurations it undergoes, tolerating a versatile set of faults as it shifts from one configuration to another. In this paper we present a general abstraction for asynchronous reconfiguration, and exemplify its usefulness for building two dynamic objects: a read/write register and a max-register. We first define a dynamic model with a clean failure condition that allows an administrator to reconfigure the system and switch off a server once the reconfiguration operation removing it completes. We then define the Reconfiguration abstraction and show how it can be used to build dynamic registers and max-registers. Finally, we give an optimal asynchronous algorithm implementing the Reconfiguration abstraction, which in turn leads to the first asynchronous (consensus-free) dynamic register emulation with optimal complexity. More concretely, faced with n requests for configuration changes, the number of configurations that the dynamic register is implemented over is n; and the complexity of each client operation is O(n)

    Clouder : a flexible large scale decentralized object store

    Get PDF
    Programa Doutoral em Informática MAP-iLarge scale data stores have been initially introduced to support a few concrete extreme scale applications such as social networks. Their scalability and availability requirements often outweigh sacrificing richer data and processing models, and even elementary data consistency. In strong contrast with traditional relational databases (RDBMS), large scale data stores present very simple data models and APIs, lacking most of the established relational data management operations; and relax consistency guarantees, providing eventual consistency. With a number of alternatives now available and mature, there is an increasing willingness to use them in a wider and more diverse spectrum of applications, by skewing the current trade-off towards the needs of common business users, and easing the migration from current RDBMS. This is particularly so when used in the context of a Cloud solution such as in a Platform as a Service (PaaS). This thesis aims at reducing the gap between traditional RDBMS and large scale data stores, by seeking mechanisms to provide additional consistency guarantees and higher level data processing primitives in large scale data stores. The devised mechanisms should not hinder the scalability and dependability of large scale data stores. Regarding, higher level data processing primitives this thesis explores two complementary approaches: by extending data stores with additional operations such as general multi-item operations; and by coupling data stores with other existent processing facilities without hindering scalability. We address this challenges with a new architecture for large scale data stores, efficient multi item access for large scale data stores, and SQL processing atop large scale data stores. The novel architecture allows to find the right trade-offs among flexible usage, efficiency, and fault-tolerance. To efficient support multi item access we extend first generation large scale data store’s data models with tags and a multi-tuple data placement strategy, that allow to efficiently store and retrieve large sets of related data at once. For efficient SQL support atop scalable data stores we devise design modifications to existing relational SQL query engines, allowing them to be distributed. We demonstrate our approaches with running prototypes and extensive experimental evaluation using proper workloads.Os sistemas de armazenamento de dados de grande escala foram inicialmente desenvolvidos para suportar um leque restrito de aplicacões de escala extrema, como as redes sociais. Os requisitos de escalabilidade e elevada disponibilidade levaram a sacrificar modelos de dados e processamento enriquecidos e até a coerência dos dados. Em oposição aos tradicionais sistemas relacionais de gestão de bases de dados (SRGBD), os sistemas de armazenamento de dados de grande escala apresentam modelos de dados e APIs muito simples. Em particular, evidenciasse a ausência de muitas das conhecidas operacões de gestão de dados relacionais e o relaxamento das garantias de coerência, fornecendo coerência futura. Atualmente, com o número de alternativas disponíveis e maduras, existe o crescente interesse em usá-los num maior e diverso leque de aplicacões, orientando o atual compromisso para as necessidades dos típicos clientes empresariais e facilitando a migração a partir das atuais SRGBD. Isto é particularmente importante no contexto de soluções cloud como plataformas como um servic¸o (PaaS). Esta tese tem como objetivo reduzir a diferencça entre os tradicionais SRGDBs e os sistemas de armazenamento de dados de grande escala, procurando mecanismos que providenciem garantias de coerência mais fortes e primitivas com maior capacidade de processamento. Os mecanismos desenvolvidos não devem comprometer a escalabilidade e fiabilidade dos sistemas de armazenamento de dados de grande escala. No que diz respeito às primitivas com maior capacidade de processamento esta tese explora duas abordagens complementares : a extensão de sistemas de armazenamento de dados de grande escala com operacões genéricas de multi objeto e a junção dos sistemas de armazenamento de dados de grande escala com mecanismos existentes de processamento e interrogac¸ ˜ao de dados, sem colocar em causa a escalabilidade dos mesmos. Para isso apresent´amos uma nova arquitetura para os sistemas de armazenamento de dados de grande escala, acesso eficiente a m´ultiplos objetos, e processamento de SQL sobre sistemas de armazenamento de dados de grande escala. A nova arquitetura permite encontrar os compromissos adequados entre flexibilidade, eficiˆencia e tolerˆancia a faltas. De forma a suportar de forma eficiente o acesso a m´ultiplos objetos estendemos o modelo de dados de sistemas de armazenamento de dados de grande escala da primeira gerac¸ ˜ao com palavras-chave e definimos uma estrat´egia de colocac¸ ˜ao de dados para m´ultiplos objetos que permite de forma eficiente armazenar e obter grandes quantidades de dados de uma s´o vez. Para o suporte eficiente de SQL sobre sistemas de armazenamento de dados de grande escala, analisámos a arquitetura dos motores de interrogação de SRGBDs e fizemos alterações que permitem que sejam distribuídos. As abordagens propostas são demonstradas através de protótipos e uma avaliacão experimental exaustiva recorrendo a cargas adequadas baseadas em aplicações reais

    Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12

    Get PDF
    This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellenc

    Cloud-edge hybrid applications

    Get PDF
    Many modern applications are designed to provide interactions among users, including multi- user games, social networks and collaborative tools. Users expect application response time to be in the order of milliseconds, to foster interaction and interactivity. The design of these applications typically adopts a client-server model, where all interac- tions are mediated by a centralized component. This approach introduces availability and fault- tolerance issues, which can be mitigated by replicating the server component, and even relying on geo-replicated solutions in cloud computing infrastructures. Even in this case, the client-server communication model leads to unnecessary latency penalties for geographically close clients and high operational costs for the application provider. This dissertation proposes a cloud-edge hybrid model with secure and ecient propagation and consistency mechanisms. This model combines client-side replication and client-to-client propagation for providing low latency and minimizing the dependency on the server infras- tructure, fostering availability and fault tolerance. To realize this model, this works makes the following key contributions. First, the cloud-edge hybrid model is materialized by a system design where clients maintain replicas of the data and synchronize in a peer-to-peer fashion, and servers are used to assist clients’ operation. We study how to bring most of the application logic to the client-side, us- ing the centralized service primarily for durability, access control, discovery, and overcoming internetwork limitations. Second, we dene protocols for weakly consistent data replication, including a novel CRDT model (∆-CRDTs). We provide a study on partial replication, exploring the challenges and fundamental limitations in providing causal consistency, and the diculty in supporting client- side replicas due to their ephemeral nature. Third, we study how client misbehaviour can impact the guarantees of causal consistency. We propose new secure weak consistency models for insecure settings, and algorithms to enforce such consistency models. The experimental evaluation of our contributions have shown their specic benets and limitations compared with the state-of-the-art. In general, the cloud-edge hybrid model leads to faster application response times, lower client-to-client latency, higher system scalability as fewer clients need to connect to servers at the same time, the possibility to work oine or disconnected from the server, and reduced server bandwidth usage. In summary, we propose a hybrid of cloud-and-edge which provides lower user-to-user la- tency, availability under server disconnections, and improved server scalability – while being ecient, reliable, and secure.Muitas aplicações modernas são criadas para fornecer interações entre utilizadores, incluindo jogos multiutilizador, redes sociais e ferramentas colaborativas. Os utilizadores esperam que o tempo de resposta nas aplicações seja da ordem de milissegundos, promovendo a interação e interatividade. A arquitetura dessas aplicações normalmente adota um modelo cliente-servidor, onde todas as interações são mediadas por um componente centralizado. Essa abordagem apresenta problemas de disponibilidade e tolerância a falhas, que podem ser mitigadas com replicação no componente do servidor, até com a utilização de soluções replicadas geogracamente em infraestruturas de computação na nuvem. Mesmo neste caso, o modelo de comunicação cliente-servidor leva a penalidades de latência desnecessárias para clientes geogracamente próximos e altos custos operacionais para o provedor das aplicações. Esta dissertação propõe um modelo híbrido cloud-edge com mecanismos seguros e ecientes de propagação e consistência. Esse modelo combina replicação do lado do cliente e propagação de cliente para cliente para fornecer baixa latência e minimizar a dependência na infraestrutura do servidor, promovendo a disponibilidade e tolerância a falhas. Para realizar este modelo, este trabalho faz as seguintes contribuições principais. Primeiro, o modelo híbrido cloud-edge é materializado por uma arquitetura do sistema em que os clientes mantêm réplicas dos dados e sincronizam de maneira ponto a ponto e onde os servidores são usados para auxiliar na operação dos clientes. Estudamos como trazer a maior parte da lógica das aplicações para o lado do cliente, usando o serviço centralizado principalmente para durabilidade, controlo de acesso, descoberta e superação das limitações inter-rede. Em segundo lugar, denimos protocolos para replicação de dados fracamente consistentes, incluindo um novo modelo de CRDTs (∆-CRDTs). Fornecemos um estudo sobre replicação parcial, explorando os desaos e limitações fundamentais em fornecer consistência causal e a diculdade em suportar réplicas do lado do cliente devido à sua natureza efémera. Terceiro, estudamos como o mau comportamento da parte do cliente pode afetar as garantias da consistência causal. Propomos novos modelos seguros de consistência fraca para congurações inseguras e algoritmos para impor tais modelos de consistência. A avaliação experimental das nossas contribuições mostrou os benefícios e limitações em comparação com o estado da arte. Em geral, o modelo híbrido cloud-edge leva a tempos de resposta nas aplicações mais rápidos, a uma menor latência de cliente para cliente e à possibilidade de trabalhar oine ou desconectado do servidor. Adicionalmente, obtemos uma maior escalabilidade do sistema, visto que menos clientes precisam de estar conectados aos servidores ao mesmo tempo e devido à redução na utilização da largura de banda no servidor. Em resumo, propomos um modelo híbrido entre a orla (edge) e a nuvem (cloud) que fornece menor latência entre utilizadores, disponibilidade durante desconexões do servidor e uma melhor escalabilidade do servidor – ao mesmo tempo que é eciente, conável e seguro

    DECOUPLING CONSISTENCY DETERMINATION AND TRUST FROM THE UNDERLYING DISTRIBUTED DATA STORES

    Get PDF
    Building applications on cloud services is cost-effective and allows for rapid development and release cycles. However, relying on cloud services can severely limit applications’ ability to control their own consistency policies, and their ability to control data visibility during replication. To understand the tension between strong consistency and security guarantees on one hand and high availability, flexible replication, and performance on the other, it helps to consider two questions. First, is it possible for an application to achieve stricter consistency guarantees than what the cloud provider offers? If we solely rely on the provider service interface, the answer is no. However, if we allow the applications to determine the implementation and the execution of the consistency protocols, then we can achieve much more. The second question is, can an application relay updates over untrusted replicas without revealing sensitive information while maintaining the desired consistency guarantees? Simply encrypting the data is not enough. Encryption does not eliminate information leakage that comes from the meta-data needed for the execution of any consistency protocol. The alternative to encryption—allowing the flow of updates only through trusted replicas— leads to predefined communication patterns. This approach is prone to failures that can cause partitioning in the system. One way to answer “yes” to this question is to allow trust relationships, defined at the application level, to guide the synchronization protocol. My goal in this thesis is to build systems that take advantage of the performance, scalability, and availability of the cloud storage services while, at the same time, bypassing the limitations imposed by cloud service providers’ design choices. The key to achieving this is pushing application-specific decisions where they belong: the application. I defend the following thesis statement: By decoupling consistency determination and trust from the underlying distributed data store, it is possible to (1) support application-specific consistency guarantees; (2) allow for topology independent replication protocols that do not compromise application privacy. First I design and implement Shell, a system architecture for supporting strict consistency guarantees over eventually consistent data stores. Shell is a software layer designed to isolate consistency implementations and cloud-provider APIs from the application code. Shell consists of four internal modules and an application store, which together abstract consistency-related operations and encapsulate communication with the underlying storage layers. Apart from consistency protocols tailored to application needs, Shell provides application-aware conflict resolution without relying on generic heuristics such as the “last write wins.” Shell does not require the application to maintain dependency-tracking in- formation for the execution of the consistency protocols as other existing approaches do. I experimentally evaluate Shell over two different data-stores using real-application traces. I found that using Shell can reduce the inconsistent updates by 10%. I also measure and show the overheads that come from introducing the Shell layer. Second, I design and implement T.Rex, a system for supporting topology-independent replication without the assumption of trust between all the participating replicas. T.Rex uses role-based access control to enable flexible and secure sharing among users with widely varying collaboration types: both users and data items are assigned roles, and a user can access data only if it shares at least one role. Building on top of this abstraction, T.Rex includes several novel mechanisms: I introduce role proofs to prove role membership to others in the role without leaking information to those not in the role. Additionally, I introduce role coherence to prevent updates from leaking across roles. Finally, I use Bloom filters as opaque digests to enable querying of remote cache state without being able to enumerate it. I combine these mechanisms to develop a novel, cryptographically secure, and efficient anti-entropy protocol, T.Rex-Sync. I evaluate T.Rex on a local test-bed, and I show that it achieves security with modest computational and storage overheads
    corecore