    Comparison of Eager and Quorum-based Replication in a Cloud Environment

    Most applications deployed in a Cloud require a high degree of availability. For the data layer, this means that data have to be replicated either within a data center or across Cloud data centers. While replication also allows the performance of applications to be increased for read-heavy workloads, since the load can be distributed across replica sites, updates need special coordination among the sites and may have an adverse effect on the overall performance. The actual effects of data replication depend on the replication protocol used. While ROWAA (read-one-write-all-available) favors read operations, quorum-based replication protocols tend to favor write operations, as not all replica sites need to be updated synchronously. In this paper, we provide a detailed evaluation of ROWAA and quorum-based replication protocols in an Amazon AWS Cloud environment on the basis of the TPC-C benchmark and different transaction mixes. The evaluation results for single data center and multi data center environments show that, in general, the influence of transaction coordination grows significantly with the number of update sites and a growing number of update transactions. However, not all quorum-based protocols are well suited for high update loads, as they may create a hot spot that again significantly impacts performance.
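
    To make the trade-off concrete, the following minimal Python sketch (illustrative only, not the implementation evaluated in the paper) shows the quorum intersection rule underlying both families: with N replicas, a read quorum of size R and a write quorum of size W, any configuration with R + W > N guarantees that every read overlaps the most recent write. ROWAA corresponds to R = 1 and W = N (all available sites), while a majority quorum balances the cost of reads and writes.

```python
# Illustrative sketch of quorum reads/writes over N replicas; it only
# demonstrates the R + W > N intersection rule that separates ROWAA
# from majority-quorum protocols, not the protocols measured in the paper.
import random

class QuorumStore:
    def __init__(self, n, r, w):
        assert r + w > n, "read and write quorums must intersect"
        self.n, self.r, self.w = n, r, w
        self.replicas = [(0, None)] * n   # each replica holds (version, value)

    def write(self, value):
        # simplified: peek at all replicas for the next version, then
        # synchronously update only W of them; the rest lag behind
        version = max(v for v, _ in self.replicas) + 1
        for i in random.sample(range(self.n), self.w):
            self.replicas[i] = (version, value)

    def read(self):
        # contact any R replicas and return the highest-versioned value
        sample = [self.replicas[i] for i in random.sample(range(self.n), self.r)]
        return max(sample, key=lambda t: t[0])[1]

rowaa = QuorumStore(n=5, r=1, w=5)     # ROWAA: cheap reads, expensive writes
majority = QuorumStore(n=5, r=3, w=3)  # majority quorum: balanced cost
majority.write("stock=42")
print(majority.read())                 # guaranteed to see the latest write
```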

    Analyzing the Performance of Data Replication and Data Partitioning in the Cloud: the Beowulf Approach

    Applications deployed in the Cloud usually come with dedicated performance and availability requirements. These can be met by replicating data across several sites and/or by partitioning data. Data replication allows read requests to be parallelized and thus decreases data access latency, but induces significant overhead for the synchronization of updates. Partitioning, in contrast, is highly beneficial if all the data accessed by an application is located at the same site, but again necessitates coordination if distributed transactions are needed to serve applications. In this paper, we analyze three protocols for distributed data management in the Cloud, namely Read-One Write-All-Available (ROWAA), Majority Quorum (MQ) and Data Partitioning (DP), all in a configuration that guarantees strong consistency. We introduce Beowulf, a meta protocol based on a comprehensive cost model that integrates the three protocols and dynamically selects the protocol with the lowest latency for a given workload. In the evaluation, we compare the predictions of the Beowulf cost model with a baseline evaluation. The results show the effectiveness of the analytical model and its precision in selecting the best-suited protocol for a given workload.
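
    The following Python sketch illustrates the meta-protocol idea; the cost functions and figures are invented placeholders for illustration, since Beowulf's actual cost model is considerably more detailed.

```python
# Hypothetical sketch of a meta protocol that routes a workload to the
# protocol with the lowest estimated latency. The cost functions are
# invented placeholders, not the actual Beowulf cost model.

def cost_rowaa(w):
    # reads are served locally; every update must reach all sites
    return (1 - w["update_fraction"]) + w["update_fraction"] * w["sites"] * 2.0

def cost_majority_quorum(w):
    # reads and writes both contact a majority of sites
    return (w["sites"] // 2 + 1) * 1.2

def cost_partitioning(w):
    # cheap for single-partition transactions, costly for distributed ones
    return 1.0 + w["distributed_txn_fraction"] * w["sites"] * 3.0

PROTOCOLS = {"ROWAA": cost_rowaa, "MQ": cost_majority_quorum, "DP": cost_partitioning}

def select_protocol(workload):
    # Beowulf-style selection: pick the protocol with the lowest estimate
    return min(PROTOCOLS, key=lambda name: PROTOCOLS[name](workload))

workload = {"update_fraction": 0.4, "sites": 7, "distributed_txn_fraction": 0.1}
print(select_protocol(workload))   # -> "DP" under these placeholder costs
```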

    QuAD: A Quorum Protocol for Adaptive Data Management in the Cloud

    More and more companies are moving their data to the Cloud, which can cope with high scalability and availability demands thanks to its pay-as-you-go cost model. To this end, databases in the Cloud are distributed and replicated across different data centers. According to the CAP theorem, distributed data management is governed by a trade-off between consistency and availability. In addition, the stronger the provided consistency level, the higher the generated coordination overhead and thus the impact on system performance. Nevertheless, many OLTP applications demand strong consistency and use ROWA(A) for replica synchronization. ROWA(A) protocols eagerly update all (or all available) replicas and thus generate a high overhead for update transactions. In contrast, quorum-based protocols consider only a subset of sites for an eager commit. This reduces the overhead for update transactions at the cost of reads, as the latter also need to access several sites. Existing quorum-based protocols do not consider the load of sites when determining quorums; hence, they cannot adapt to load changes at run-time. In this paper, we present QuAD, an adaptive quorum-based replication protocol that constructs quorums by dynamically selecting the optimal quorum configuration w.r.t. load and network latency. Our evaluation of QuAD on Amazon EC2 shows that it considerably outperforms both static quorum protocols and dynamic protocols that neglect site properties in the quorum construction process.
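
    The core idea, selecting quorum members by their run-time properties, can be sketched as follows in Python; the cost weighting and the site figures are assumptions for illustration, not QuAD's actual selection algorithm.

```python
# Illustrative quorum construction driven by site load and network latency,
# in the spirit of QuAD's adaptivity; the scoring is an assumed placeholder.

def pick_quorum(sites, quorum_size):
    """sites: list of (name, load, latency_ms).
    Returns the quorum_size sites with the lowest combined cost."""
    def cost(site):
        _, load, latency_ms = site
        return 0.5 * load + 0.5 * latency_ms   # assumed equal weighting
    ranked = sorted(sites, key=cost)
    return [name for name, _, _ in ranked[:quorum_size]]

sites = [("eu-1", 0.9, 12), ("eu-2", 0.2, 15), ("us-1", 0.4, 95),
         ("us-2", 0.1, 90), ("ap-1", 0.3, 160)]
majority = len(sites) // 2 + 1        # majority quorums always intersect
print(pick_quorum(sites, majority))
```

    A static protocol would fix the quorum once; re-running the selection as the measured loads and latencies change is what makes such a scheme adaptive.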

    Key-CRDT stores

    Dissertation submitted to obtain the Master's degree in Computer Science (Engenharia Informática). The Internet has opened opportunities to create world-scale services. These systems require high availability and fault tolerance while preserving low latency. Replication is a widely adopted technique to provide these properties. Different replication techniques have been proposed over the years, but to support these properties for world-scale services it is necessary to trade consistency for availability, fault tolerance and low latency. In weak consistency models, it is necessary to deal with possible conflicts arising from concurrent updates. We propose the use of conflict-free replicated data types (CRDTs) to address this issue. Cloud computing systems support world-scale services, often relying on key-value stores for storing data. These systems partition and replicate data over multiple nodes that can be geographically dispersed over the network. For handling conflicts, these systems either rely on solutions that lose updates (e.g., last-write-wins) or require applications to handle concurrent updates. Additionally, these systems provide little support for transactions, a widely used abstraction for data access. In this dissertation, we present the design and implementation of SwiftCloud, a Key-CRDT store that extends a key-value store by incorporating CRDTs in the system's data model. The system provides automatic conflict resolution relying on the properties of CRDTs. We also present a version of SwiftCloud that supports transactions. Unlike traditional transactional systems, transactions never abort due to write/write conflicts, as the system leverages CRDT properties to merge concurrent transactions. For implementing SwiftCloud, we have introduced a set of new techniques, including versioned CRDTs, composition of CRDTs and alternative serialization methods. The evaluation of the system, with both micro-benchmarks and the TPC-W benchmark, shows that SwiftCloud imposes little overhead over a key-value store. Allowing clients to access a data center close to them with SwiftCloud can reduce latency without requiring any complex reconciliation mechanism. The experience of using SwiftCloud has shown that adapting an existing application to use SwiftCloud requires low effort.
    Project PTDC/EIA-EIA/108963/200
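
    As a taste of why CRDTs remove the need for explicit conflict resolution, here is a minimal state-based grow-only counter (G-Counter), a textbook CRDT construction rather than SwiftCloud's versioned CRDTs:

```python
# Minimal state-based grow-only counter (G-Counter), a textbook CRDT.
# Replicas increment independently; merge takes the per-replica maximum,
# so concurrent updates never conflict and no write is lost.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments seen from that replica

    def increment(self):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # element-wise max is commutative, associative and idempotent,
        # which is exactly what makes all replicas converge
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

a, b = GCounter("dc-a"), GCounter("dc-b")
a.increment(); a.increment()   # two updates in one data center
b.increment()                  # a concurrent update in another
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3   # both replicas converge, nothing lost
```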

    Transactions and data management in NoSQL cloud databases

    NoSQL databases have become the preferred option for storing and processing data in cloud computing, as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data; they only guarantee a weaker consistency based on the eventual consistency model. Second, NoSQL databases adopt a simple data model, which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins and referential integrity, which, by implication, means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues, this thesis investigates transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide an appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely coupled architecture that separates the implementation of transactional logic from the underlying data, thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using a real MongoDB system. An extended version of the standard Yahoo! Cloud Serving Benchmark (YCSB) has been used to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives: it maintains stronger consistency of cloud data as well as an appropriate level of reliability and performance.
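
    The general idea of layering transactional logic above a key-value store can be sketched with optimistic, version-based validation; the code below is a hypothetical illustration of that separation, not the thesis's Multi-Key model or its MongoDB-based prototype.

```python
# Hypothetical sketch: multi-key transactions layered over a plain
# key-value store via optimistic validation. It illustrates keeping
# transactional logic separate from the underlying data store; it is not
# the Multi-Key model or architecture of the thesis.
import threading

store = {}                       # key -> (version, value); stands in for NoSQL
commit_lock = threading.Lock()   # serializes commits in this sketch

class Transaction:
    def __init__(self):
        self.reads = {}     # key -> version observed at read time
        self.writes = {}    # key -> new value to install

    def read(self, key):
        version, value = store.get(key, (0, None))
        self.reads[key] = version
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        with commit_lock:
            # validate: abort if any key we read changed in the meantime
            for key, seen in self.reads.items():
                if store.get(key, (0, None))[0] != seen:
                    return False
            # install all writes atomically, bumping each key's version
            for key, value in self.writes.items():
                version = store.get(key, (0, None))[0]
                store[key] = (version + 1, value)
            return True

t = Transaction()
balance = t.read("acct:a") or 100
t.write("acct:a", balance - 10)
t.write("acct:b", 10)
print(t.commit())   # True: both keys updated together, or neither
```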

    Data management in cloud environments: NoSQL and NewSQL data stores

    Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objectives of: (1) providing a perspective on the field, (2) providing guidance to practitioners and researchers in choosing the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared with a focus on data models, querying, scaling, and security-related capabilities. Features driving the ability to scale read requests and write requests, or to scale data storage, are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed, and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and the nonexistence of standardized query languages.
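
    Among the scaling features such surveys compare, partitioning is commonly implemented with consistent hashing; the sketch below shows the generic technique only, not the scheme of any particular surveyed system.

```python
# Minimal consistent-hashing sketch: the generic partitioning technique
# used (in far more elaborate form) by many NoSQL stores.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=8):
        # place several virtual nodes per physical node for balance
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # walk clockwise on the ring to the first virtual node
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))   # keys spread across nodes; adding or
                                  # removing a node only remaps a fraction
```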

    A novel causally consistent replication protocol with partial geo-replication

    Distributed storage systems are a fundamental component of large-scale Internet services. To keep up with the increasing expectations of users regarding availability and latency, the design of data storage systems has evolved to achieve these properties by exploiting techniques such as partial replication, geo-replication and weaker consistency models. While systems with these characteristics exist, they usually do not provide all of these properties, or do so in an inefficient manner that does not take full advantage of them. Additionally, weak consistency models, such as eventual consistency, put an excessively high burden on application programmers for writing correct applications; hence, multiple systems have moved towards providing additional consistency guarantees, such as implementing the causal (and causal+) consistency models. In this thesis we approach the existing challenges in designing a causally consistent replication protocol, with a focus on the use of geo- and partial data replication. To this end, we present a novel replication protocol capable of enriching an existing geo- and partially replicated datastore with the causal+ consistency model. In addition, this thesis also presents a concrete implementation of the proposed protocol over the popular Cassandra datastore system. This implementation is complemented with experimental results obtained in a realistic scenario, in which we compare our proposal with multiple configurations of the Cassandra datastore (without causal consistency guarantees) and with other existing alternatives. The results show that our proposed solution achieves a balanced performance, with low data visibility delays and without significant performance penalties.
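
    The mechanism at the heart of such protocols is a dependency check: a replica delays a remote update until everything it causally depends on is locally visible. The Python sketch below uses a generic vector-clock-style formulation; the thesis's actual protocol and metadata are assumed to differ.

```python
# Illustrative sketch of the dependency check behind causal consistency:
# a replica makes a remote update visible only after every update it
# causally depends on is visible locally. Generic bookkeeping, not the
# concrete protocol or metadata of the thesis.

def deps_satisfied(deps, clock):
    """deps/clock: dict mapping replica id -> last applied sequence number."""
    return all(clock.get(r, 0) >= n for r, n in deps.items())

def deliver(update, clock, pending, apply_fn):
    # buffer the update, then drain everything whose dependencies are met
    pending.append(update)
    progress = True
    while progress:
        progress = False
        for u in list(pending):
            if deps_satisfied(u["deps"], clock):
                pending.remove(u)
                apply_fn(u)
                clock[u["origin"]] = max(clock.get(u["origin"], 0), u["seq"])
                progress = True

clock, pending = {}, []
reply = {"origin": "dc-b", "seq": 1, "deps": {"dc-a": 1}, "op": "reply"}
post = {"origin": "dc-a", "seq": 1, "deps": {}, "op": "post"}
show = lambda u: print("visible:", u["op"])
deliver(reply, clock, pending, show)   # buffered: its dependency is missing
deliver(post, clock, pending, show)    # prints "post" and then "reply"
```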

    Eventual Consistent Databases: State of the Art

    One of the challenges of cloud programming is to achieve the right balance between availability and consistency in a distributed database. Cloud computing environments, particularly cloud databases, are rapidly increasing in importance, acceptance and usage in major applications, which need partition tolerance and availability for scalability purposes but sacrifice consistency (CAP theorem). In these environments, the data accessed by users is stored in a highly available storage system, and thus the use of paradigms such as eventual consistency has become more widespread. In this paper, we review the state-of-the-art database systems using eventual consistency from both industry and research. Based on this review, we discuss the advantages and disadvantages of eventual consistency and identify future research challenges for databases using eventual consistency.
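
    The trade-off such reviews examine shows up in the default reconciliation rule of many eventually consistent stores, last-write-wins: replicas converge once they exchange state, but one of two concurrent writes is silently discarded. A minimal generic sketch, not tied to any particular reviewed system:

```python
# Tiny last-write-wins (LWW) register: a common reconciliation rule in
# eventually consistent stores. Replicas converge to the same value once
# they exchange state, but one of two concurrent writes is lost.
import time

class LWWRegister:
    def __init__(self):
        self.value, self.ts = None, 0.0

    def set(self, value, ts=None):
        ts = time.time() if ts is None else ts
        if ts > self.ts:                      # keep only the newer write
            self.value, self.ts = value, ts

    def merge(self, other):
        self.set(other.value, other.ts)       # adopt the other replica's
                                              # state if it is newer

a, b = LWWRegister(), LWWRegister()
a.set("alice's edit", ts=100.0)
b.set("bob's edit", ts=101.0)   # concurrent write with a later timestamp
a.merge(b); b.merge(a)
assert a.value == b.value == "bob's edit"   # converged; alice's edit lost
```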

    LHView: Location Aware Hybrid Partial View

    The rise of the Cloud creates enormous business opportunities for companies to provide global services, which requires the applications supporting those services to scale while minimizing maintenance costs, whether due to unnecessary allocation of resources or due to excessive human supervision and administration. Solutions designed to support such systems have tackled fundamental challenges, from individual component failures to transient network partitions. A fundamental aspect that all scalable large systems have to deal with is the membership of the system, i.e., tracking the active components that compose the system. Most systems rely on membership management protocols that operate at the application level, often exposing the interface of a logical overlay network that should guarantee high scalability, efficiency, and robustness. Although these protocols are capable of repairing the overlay in the face of large numbers of individual component faults, when scaling to global settings (i.e., geo-distributed scenarios) this robustness is a double-edged sword, because it is extremely complex for a node in the system to distinguish between a set of simultaneous node failures and a (transient) network partition. Thus the occurrence of a network partition creates isolated subsets of nodes incapable of reconnecting even after recovery from the partition. This work addresses these challenges by proposing a novel datacenter-aware membership protocol that tolerates network partitions by applying existing overlay management and classification techniques, allowing the system to efficiently cope with such events without compromising the remaining properties of the overlay network. Furthermore, we strive to achieve these goals with a solution that requires minimal human intervention.
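
    A minimal Python sketch of the location-aware flavor of such a partial view, biased towards the local data center but always retaining a few remote links so isolated subsets can re-merge after a partition heals, is shown below; it is an assumed illustration of the idea, not the LHView protocol itself.

```python
# Hypothetical sketch of a location-aware partial view: each node keeps a
# small sample of the membership, mostly from its own data center but with
# a few reserved slots for remote nodes, so that the overlay can reconnect
# across data centers after a network partition. Not the LHView protocol.
import random

def build_view(self_dc, peers, view_size=8, remote_slots=2):
    """peers: list of (node_id, datacenter). Returns a partial view that
    reserves remote_slots entries for nodes in other data centers."""
    local = [p for p in peers if p[1] == self_dc]
    remote = [p for p in peers if p[1] != self_dc]
    view = random.sample(remote, min(remote_slots, len(remote)))
    view += random.sample(local, min(view_size - len(view), len(local)))
    return view

peers = [(f"n{i}", "eu") for i in range(20)] + \
        [(f"m{i}", "us") for i in range(20)]
print(build_view("eu", peers))   # mostly eu nodes, plus a couple of us links
```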