    DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores

    We present DKVF, a framework for quickly prototyping and evaluating new protocols for key-value stores and comparing them with existing protocols on selected benchmarks. Because of the limits imposed by the CAP theorem, new protocols must be developed that achieve the desired trade-off between consistency and availability for the application at hand. Hence, both the academic and industrial communities focus on developing new protocols that identify a different (and hopefully better in one or more aspects) point on this trade-off curve. While these protocols are often based on a simple intuition, evaluating them to ensure that they indeed provide increased availability, consistency, or performance is a tedious task. DKVF makes it possible to quickly prototype a new protocol and to determine how it performs compared to existing protocols on pre-specified benchmarks. The framework relies on YCSB (Yahoo! Cloud Serving Benchmark) for benchmarking. We demonstrate DKVF by using it to implement four existing protocols: eventual consistency, COPS, GentleRain, and CausalSpartan. We compare the performance of these protocols under different load conditions. We find that their performance is similar to that of our from-scratch implementations of the same protocols, and that the comparison among them is consistent with what has been reported in the literature. Moreover, implementing these protocols was much more natural, as we only needed to translate the pseudocode into Java (and add the necessary error handling). Hence, it was possible to achieve this in just 1-2 days per protocol. Finally, our framework is extensible: individual components in the framework (e.g., the storage component) can be replaced.

    Tunable Causal Consistency: Specification and Implementation

    To achieve high availability and low latency, distributed data stores often geographically replicate data at multiple sites called replicas. However, this introduces the data consistency problem. Due to the fundamental trade-offs among consistency, availability, and latency in the presence of network partitions, no one-size-fits-all consistency model exists. To meet the needs of different applications, many popular data stores provide tunable consistency, allowing clients to specify the consistency level for each individual operation. In this paper, we propose tunable causal consistency (TCC). It allows clients to choose the desired session guarantee for each operation from the four well-known session guarantees: read your writes, monotonic reads, monotonic writes, and writes follow reads. Specifically, we first propose a formal specification of TCC in an extended (vis,ar) framework originally proposed by Burckhardt et al. We then design a TCC protocol and develop a prototype distributed key-value store called TCCSTORE. We evaluate TCCSTORE on Aliyun. The latency is less than 38ms for all workloads, and the throughput reaches about 2800 operations per second. We also show that TCC achieves better performance than causal consistency and incurs negligible overhead compared with eventual consistency.
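
As an illustration of choosing a session guarantee per operation, the following minimal Python sketch checks whether a replica is fresh enough to serve a read. It is not the TCC protocol or the TCCSTORE implementation, neither of which is described in the abstract. The names `Guarantee`, `Session`, and `can_read_from` are hypothetical, versions are assumed to be per-key integers, and only the two read-side guarantees are modelled (monotonic writes and writes follow reads constrain write ordering and are omitted).

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Guarantee(Flag):
    """Hypothetical per-operation guarantee selector (read-side guarantees only)."""
    NONE = 0
    READ_YOUR_WRITES = auto()
    MONOTONIC_READS = auto()


@dataclass
class Session:
    """Client-side session state; versions are assumed to be per-key integers."""
    last_written: dict = field(default_factory=dict)  # key -> highest version this session wrote
    last_read: dict = field(default_factory=dict)     # key -> highest version this session read

    def record_write(self, key, version):
        self.last_written[key] = max(version, self.last_written.get(key, 0))

    def record_read(self, key, version):
        self.last_read[key] = max(version, self.last_read.get(key, 0))

    def required_version(self, key, guarantees):
        """Minimum version a replica must hold to satisfy the requested guarantees."""
        need = 0
        if Guarantee.READ_YOUR_WRITES in guarantees:
            need = max(need, self.last_written.get(key, 0))
        if Guarantee.MONOTONIC_READS in guarantees:
            need = max(need, self.last_read.get(key, 0))
        return need

    def can_read_from(self, key, replica_version, guarantees):
        """True if a replica holding `replica_version` may serve this read."""
        return replica_version >= self.required_version(key, guarantees)
```

In this sketch a read issued with `Guarantee.NONE` can be served by any replica, while a read that requests one or both guarantees may have to wait or be redirected to a fresher replica; that difference is the per-operation trade-off the abstract refers to.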

    High performance data processing

    Master's dissertation in Informatics Engineering. As applications reach a wider audience than ever before, they must process ever larger numbers of requests. In addition, they must often serve users all over the globe, where network latencies have a significant negative impact on monolithic deployments. Distribution is therefore a sought-after solution for improving the performance of both the application and database layers. However, distributing data is not an easy task if we want to ensure strong consistency guarantees. This leads many database systems to rely on expensive synchronization protocols such as two-phase commit, distributed consensus, and distributed locking, among others, while other systems rely on weak consistency, which is not viable for some use cases. This thesis presents the design, implementation, and evaluation of two solutions aimed at reducing the impact of ensuring strong consistency guarantees in database systems, especially geo-distributed ones. The first is Primary Semi-Primary, a fully replicated distributed database architecture that allows different replicas to evolve independently, so that clients do not have to wait for preceding non-conflicting updates to be propagated. Although replicas can process both reads and writes, improving scalability, the system still ensures strong consistency guarantees by relaying the certification of transactions to a central node. Its design is independent of the underlying data model, but its implementation can take advantage of the native concurrency control offered by some systems, as exemplified by an implementation using PostgreSQL and its Snapshot Isolation. The results show several advantages in both throughput and response time over alternative architectures, in both local and geo-distributed environments. The second solution is Multi-Record Values, a technique that dynamically partitions numeric values into multiple records, allowing concurrent writes to execute with a low probability of conflict and reducing the abort rate and/or lock contention. Lower-bound guarantees, required by objects such as bank balances or inventories, are ensured by this strategy, unlike many similar alternatives. Its design is also data-model agnostic: its advantages can be found in both SQL and NoSQL systems, and in both centralized and distributed databases, as presented in the evaluation section.
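
The idea of splitting one numeric value across several records can be sketched as follows. This is a single-process illustration written for this summary, not the algorithm from the thesis. The class `MultiRecordValue` and its methods are hypothetical, the partitioning is fixed at construction rather than dynamic, and the amount above the lower bound is assumed to be spread evenly. In a real database each entry of `records` would be a separate row that can be locked or updated independently, which is where the reduced contention would come from.

```python
import random


class MultiRecordValue:
    """Hypothetical sketch: one logical integer stored as several partial records."""

    def __init__(self, total, n_records, lower_bound=0):
        if n_records < 1:
            raise ValueError("need at least one record")
        if total < lower_bound:
            raise ValueError("initial value is below the lower bound")
        self.lower_bound = lower_bound
        # Spread the amount above the lower bound as evenly as possible.
        base, extra = divmod(total - lower_bound, n_records)
        self.records = [base + (1 if i < extra else 0) for i in range(n_records)]

    def value(self):
        """Logical value: lower bound plus the sum of all partial records."""
        return self.lower_bound + sum(self.records)

    def add(self, amount, rng=random):
        """Credit a randomly chosen record, so concurrent writers rarely collide."""
        if amount < 0:
            raise ValueError("use subtract() for negative amounts")
        self.records[rng.randrange(len(self.records))] += amount

    def subtract(self, amount, rng=random):
        """Debit records starting at a random one; refuse to cross the lower bound."""
        if amount < 0:
            raise ValueError("use add() for negative amounts")
        if amount > sum(self.records):
            return False  # would push the logical value below the lower bound
        n = len(self.records)
        start = rng.randrange(n)
        remaining = amount
        for k in range(n):
            i = (start + k) % n
            take = min(self.records[i], remaining)
            self.records[i] -= take  # no record ever goes negative
            remaining -= take
            if remaining == 0:
                break
        return True
```

Because no partial record is allowed to go negative, the logical value can never fall below `lower_bound`, which mirrors the lower-bound guarantee mentioned for balances and inventories.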

    New Production System for Finnish Meteorological Institute

    This thesis presents the plans for replacing the production system of the Finnish Meteorological Institute (FMI). It begins with a review of the state of the art in distributed systems research, and ends with a design for a replacement production system that is reliable, scalable, and maintainable. The production system in question is a framework for managing the production of different weather predictions and models. We use this framework to separate the description of work from its actual execution. This way, the different production processes can be easily monitored and configured through the production system. Since the amount of data processed by this system is too large for a single computer to handle, the production system is distributed. We are therefore dealing not just with a production framework but with a distributed system, and replacing it requires a solid understanding of distributed systems theory. The first part of this thesis lays the groundwork for replacing the distributed production system: a review of the state of the art in distributed systems research. It is a concise, self-contained document that presents the essentials of distributed systems in a clear manner, and it can be used separately from the rest of this thesis as a short introduction to distributed systems. The second part of this thesis presents the production system in question, the need for its replacement, and our design for a new production system that is maintainable, performant, available, reliable, and scalable. Beyond giving a design for this replacement, we present a practical plan for implementing the new production system with Kubernetes, Brigade, and Riak CS.

    Intelligent Techniques for Managing Big Data Consistency in the Cloud

    This thesis addresses the problem of Big Data consistency in the cloud. Specifically, our research studies different adaptive consistency approaches in the cloud and proposes a new approach for the edge computing environment. Consistency management has major consequences for distributed storage systems. Strong consistency models require synchronization after every update, which considerably degrades system performance and availability. Conversely, weak consistency models offer better performance as well as better data availability. However, these weaker models may tolerate too many temporary inconsistencies under certain conditions. Consequently, an adaptive consistency strategy is needed to adjust the consistency level at run time according to the criticality of the requests or the data. This thesis makes two contributions. In the first contribution, a comparative analysis of existing adaptive consistency approaches is carried out according to a set of defined comparison criteria. This kind of synthesis provides the user or researcher with a comparative analysis of the performance of existing approaches. In addition, it clarifies the suitability of these approaches for candidate cloud systems. In the second contribution, we propose MinidoteACE, a new adaptive consistency system that is an improved version of Minidote, a causal consistency system for edge applications. Unlike Minidote, which provides only causal consistency, our model also allows applications to execute requests with stronger consistency guarantees. Experimental evaluations show that throughput decreases by only 3.5% to 10% when a causal operation is replaced by a strong operation. However, update latency increases considerably for strong operations, up to three times for a workload in which update operations account for 25% of operations.

    Survey of NoSQL Database Engines for Big Data

    Cloud computing is a paradigm shift that provides computing over the Internet. With the growing reach of the Internet into people's lives, a large volume of data is generated every day from sources such as cellphones, electronic gadgets, e-commerce transactions, social media, and sensors. The generated data is so large that it is referred to as Big Data. Companies pursuing business opportunities in the digital world need to invest budget and time in scaling their IT infrastructure as their businesses expand. Traditional relational databases have limitations in scaling for large, Internet-scale distributed systems. To store rapidly expanding, high-volume Big Data efficiently, NoSQL data stores have been developed as an alternative to relational databases. The purpose of this thesis is to provide a holistic overview of different NoSQL data stores. We cover the fundamental principles underlying NoSQL data store development. Many NoSQL data stores have specific and exclusive features and properties. They also differ in their architecture, data model, storage system, and fault tolerance abilities. This thesis describes different aspects of a few NoSQL data stores in detail. It also covers experiments that evaluate and compare the performance of different NoSQL data stores on a distributed cluster. Within the scope of this thesis, four NoSQL data stores (HBase, Cassandra, MongoDB, and Riak) are selected for the benchmarking experiments.

    Transactions and data management in NoSQL cloud databases

    NoSQL databases have become the preferred option for storing and processing data in cloud computing, as they are capable of providing high data availability, scalability, and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data. They guarantee only a weaker consistency based on the eventual consistency model. Second, NoSQL databases adopt a simple data model, which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins and referential integrity, which by implication means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using a real MongoDB system. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) has been used to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives. It maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.

    Benchmarking Eventually Consistent Distributed Storage Systems

    Cloud storage services and NoSQL systems typically offer only "Eventual Consistency", a rather weak guarantee covering a broad range of potential data consistency behavior. The degree of actual (in-)consistency, however, is unknown. This work presents novel solutions for determining the degree of (in-)consistency via simulation and benchmarking, as well as the means needed to resolve inconsistencies using this information.
