13 research outputs found

    Cost- and workload-driven data management in the cloud

    Get PDF
    This thesis addresses the challenge of finding the right balance between consistency, availability, latency, and costs, captured by the CAP/PACELC trade-offs, in the context of distributed data management in the Cloud. At its core, cost- and workload-driven data management protocols, called CCQ protocols, are developed. First, this includes the development of C3, an adaptive consistency protocol that adjusts consistency at runtime by considering consistency and inconsistency costs. Second, the development of Cumulus, an adaptive data partitioning protocol that adapts partitions to the application workload so that expensive distributed transactions are minimized or avoided. Third, the development of QuAD, a quorum-based replication protocol that constructs quorums such that, given a set of constraints, the best possible performance is achieved. The behavior of each CCQ protocol is steered by a cost model that aims at reducing the costs and overhead of providing the desired data management guarantees. The CCQ protocols continuously assess their behavior and, if necessary, adapt it at runtime based on the application workload and the cost model. This property is crucial for applications deployed in the Cloud, as they are characterized by highly dynamic workloads and high scalability and availability demands. The dynamic adaptation of behavior at runtime does not come for free and may generate considerable overhead that might outweigh the gain of adaptation. The CCQ cost models therefore incorporate a control mechanism that aims at avoiding expensive and unnecessary adaptations which provide no benefit to applications. Adaptation is a distributed activity that requires coordination between the sites of a distributed database system. The CCQ protocols implement safe online adaptation approaches that exploit the properties of 2PC and 2PL to ensure that all sites behave in accordance with the cost model, even in the presence of arbitrary failures. It is crucial to guarantee a globally consistent view of the behavior, as otherwise the effects of the cost models are nullified. The presented protocols are implemented as part of a prototypical database system. Their modular architecture allows for a seamless extension of the optimization capabilities at any level of their implementation. Finally, the protocols are quantitatively evaluated in a series of experiments executed in a real Cloud environment. The results show their feasibility and their ability to reduce application costs and to dynamically adjust their behavior at runtime without violating correctness.
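
    The adaptation control idea lends itself to a short illustration. The sketch below (Python, with made-up names and numbers, not the thesis's actual cost model) shows the kind of rule such a control mechanism could apply: trigger an adaptation only if the projected savings under the new configuration outweigh the one-time adaptation overhead.

        # Hypothetical sketch of a cost-driven adaptation decision in the spirit of
        # the CCQ protocols; function name, inputs, and figures are illustrative.

        def should_adapt(current_cost_per_tx: float,
                         predicted_cost_per_tx: float,
                         adaptation_overhead: float,
                         expected_tx_until_next_shift: int) -> bool:
            """Adapt only if the projected savings outweigh the one-time adaptation cost."""
            savings = (current_cost_per_tx - predicted_cost_per_tx) * expected_tx_until_next_shift
            return savings > adaptation_overhead

        # Adapting pays off only if the workload stays stable long enough.
        print(should_adapt(0.010, 0.007, 50.0, 20000))  # True  (savings 60 > overhead 50)
        print(should_adapt(0.010, 0.007, 50.0, 10000))  # False (savings 30 < overhead 50)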

    Comparison of Eager and Quorum-based Replication in a Cloud Environment

    Get PDF
    Most applications deployed in a Cloud require a high degree of availability. For the data layer, this means that data have to be replicated either within a data center or across Cloud data centers. While replication also allows the performance of applications to be increased for read accesses, since the load can be distributed across replica sites, updates need special coordination among the sites and may have an adverse effect on overall performance. The actual effects of data replication depend on the replication protocol used. While ROWAA (read-one-write-all-available) favors read operations, quorum-based replication protocols tend to favor write operations, as not all replica sites need to be updated synchronously. In this paper, we provide a detailed evaluation of ROWAA and quorum-based replication protocols in an Amazon AWS Cloud environment on the basis of the TPC-C benchmark and different transaction mixes. The evaluation results for single data center and multi data center environments show that, in general, the influence of transaction coordination grows significantly with the number of update sites and a growing number of update transactions. However, not all quorum-based protocols are well suited for high update loads, as they may create a hot spot that again significantly impacts performance.
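
    To make the read/write trade-off concrete, the following sketch contrasts how many replica sites a single operation touches under ROWAA and under a majority quorum. This is a deliberate simplification of the evaluated protocols, not their implementation.

        # Illustrative comparison of how many replica sites one operation touches
        # under ROWAA and under a majority quorum (MQ); protocol details omitted.

        def sites_touched(protocol: str, op: str, n_sites: int, n_available: int) -> int:
            if protocol == "ROWAA":
                return 1 if op == "read" else n_available   # read one, write all available
            if protocol == "MQ":
                return n_sites // 2 + 1                     # same majority for reads and writes
            raise ValueError(protocol)

        # With 5 replicas, ROWAA writes hit 5 sites while MQ writes hit only 3,
        # but MQ reads also hit 3 sites instead of 1.
        for op in ("read", "write"):
            print(op, sites_touched("ROWAA", op, 5, 5), sites_touched("MQ", op, 5, 5))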

    Analyzing the Performance of Data Replication and Data Partitioning in the Cloud: the Beowulf Approach

    Get PDF
    Applications deployed in the Cloud usually come with dedicated performance and availability requirements. These can be achieved by replicating data across several sites and/or by partitioning data. Data replication allows read requests to be parallelized and thus decreases data access latency, but induces significant overhead for the synchronization of updates. Partitioning, in contrast, is highly beneficial if all the data accessed by an application is located at the same site, but again necessitates coordination if distributed transactions are needed to serve applications. In this paper, we analyze three protocols for distributed data management in the Cloud, namely Read-One Write-All-Available (ROWAA), Majority Quorum (MQ), and Data Partitioning (DP), all in a configuration that guarantees strong consistency. We introduce Beowulf, a meta protocol based on a comprehensive cost model that integrates the three protocols and dynamically selects the protocol with the lowest latency for a given workload. In the evaluation, we compare the prediction of the Beowulf cost model with a baseline evaluation. The results clearly show the effectiveness of the analytical model and its precision in selecting the best-suited protocol for a given workload.
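
    The selection step of such a meta protocol can be sketched as follows; the per-protocol latency estimators below are placeholders, not Beowulf's actual cost model.

        # Hypothetical sketch of the protocol-selection step of a meta protocol such
        # as Beowulf: estimate the latency of each candidate protocol for the observed
        # read/write mix and pick the cheapest. Estimator functions are placeholders.
        from typing import Callable, Dict

        def select_protocol(read_fraction: float,
                            estimators: Dict[str, Callable[[float], float]]) -> str:
            predicted = {name: est(read_fraction) for name, est in estimators.items()}
            return min(predicted, key=predicted.get)

        estimators = {
            "ROWAA": lambda r: 5 * r + 40 * (1 - r),   # cheap reads, expensive synchronous writes
            "MQ":    lambda r: 20 * r + 25 * (1 - r),  # moderate cost for both reads and writes
            "DP":    lambda r: 15,                     # flat cost while transactions stay local
        }

        print(select_protocol(0.95, estimators))  # read-heavy workload -> "ROWAA"
        print(select_protocol(0.50, estimators))  # mixed workload      -> "DP"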

    QuAD: A Quorum Protocol for Adaptive Data Management in the Cloud

    Get PDF
    More and more companies move their data to the Cloud, which is able to cope with high scalability and availability demands due to its pay-as-you-go cost model. For this, databases in the Cloud are distributed and replicated across different data centers. According to the CAP theorem, distributed data management is governed by a trade-off between consistency and availability. In addition, the stronger the provided consistency level, the higher the generated coordination overhead and thus the impact on system performance. Nevertheless, many OLTP applications demand strong consistency and use ROWA(A) for replica synchronization. ROWA(A) protocols eagerly update all (or all available) replicas and thus generate a high overhead for update transactions. In contrast, quorum-based protocols consider only a subset of sites for eager commit. This reduces the overhead for update transactions at the cost of reads, as the latter also need to access several sites. Existing quorum-based protocols do not consider the load of sites when determining the quorums; hence, they are not able to adapt to load changes at run-time. In this paper, we present QuAD, an adaptive quorum-based replication protocol that constructs quorums by dynamically selecting the optimal quorum configuration w.r.t. load and network latency. Our evaluation of QuAD on Amazon EC2 shows that it considerably outperforms both static quorum protocols and dynamic protocols that neglect site properties in the quorum construction process.
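
    A possible reading of the quorum construction step is sketched below: sites are ranked by a weighted combination of load and network latency, and the cheapest majority is chosen. The sites, weights, and scoring function are illustrative assumptions, not QuAD's actual algorithm.

        # Illustrative sketch of load- and latency-aware quorum construction in the
        # spirit of QuAD: rank sites by a weighted score and take the cheapest majority.
        # Site names, weights, and the linear score are assumptions made for this sketch.

        def build_quorum(sites, quorum_size, w_load=0.5, w_latency=0.5):
            """sites: list of (name, load, latency_ms); returns the quorum_size best sites."""
            scored = sorted(sites, key=lambda s: w_load * s[1] + w_latency * s[2])
            return [name for name, _, _ in scored[:quorum_size]]

        sites = [("eu-1", 0.8, 5), ("eu-2", 0.3, 7), ("us-1", 0.4, 90),
                 ("us-2", 0.2, 95), ("ap-1", 0.1, 180)]
        print(build_quorum(sites, quorum_size=len(sites) // 2 + 1))
        # -> ['eu-1', 'eu-2', 'us-1'] with these example values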

    SO-1SR: towards a self-optimizing one-copy serializability protocol for data management in the Cloud

    No full text
    Clouds are very attractive environments for deploying different types of applications due to their pay-as-you-go cost model and their highly available and scalable infrastructure. Data management is an integral part of the applications deployed in the Cloud. Thus, it is of utmost importance to provide highly available and scalable data management solutions tailored to the needs of the Cloud. Data availability can be increased by using well-known replication techniques. Data replication also increases scalability in the case of read-only transactions, but generates considerable overhead for keeping the replicas consistent in the case of update transactions. In order to meet the scalability demands of their customers, current Cloud providers use DBMSs that only support weak data consistency. While weak consistency is considered sufficient for many of the applications currently deployed in the Cloud, more and more applications with strong consistency requirements, like traditional online stores, are moved to the Cloud. In the presence of replicated data, these applications require one-copy serializability (1SR). Hence, in order to exploit the advantages of the Cloud for these applications as well, it is essential to provide scalable, available, low-cost, and strongly consistent data management that is able to adapt dynamically to application and system conditions. In this paper, we present SO-1SR (self-optimizing 1SR), a novel customizable load balancing approach to transaction execution on top of replicated data in the Cloud that is able to efficiently use existing resources and to optimize transaction execution in an adaptive and dynamic manner without a dedicated load balancing component. The evaluation of SO-1SR on the basis of the TPC-C benchmark in the AWS Cloud environment has shown that the SO-1SR load balancer is much more efficient than existing load balancing techniques.
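
    One way load balancing can work without a dedicated component is sketched below: each client routes its transactions to the replica with the best recently observed response time. This is a loose illustration under assumed names and metrics, not SO-1SR's actual mechanism.

        # Hypothetical sketch of decentralized load balancing over replicas: each client
        # tracks a smoothed response time per replica and routes to the best one, with
        # no central balancer. Replica names and the smoothing factor are assumptions.

        class ReplicaRouter:
            def __init__(self, replicas, alpha=0.2):
                self.alpha = alpha
                self.ewma = {r: 0.0 for r in replicas}   # smoothed response time per replica

            def pick(self) -> str:
                return min(self.ewma, key=self.ewma.get)

            def record(self, replica: str, response_ms: float) -> None:
                old = self.ewma[replica]
                self.ewma[replica] = (1 - self.alpha) * old + self.alpha * response_ms

        router = ReplicaRouter(["replica-a", "replica-b", "replica-c"])
        router.record("replica-a", 12.0)
        router.record("replica-b", 30.0)
        print(router.pick())  # "replica-c" (no samples yet), later the fastest replica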

    Cost-based adaptive concurrency control in the cloud

    No full text
    The recent advent of Cloud computing has strongly influenced the deployment of large-scale applications. Currently, Cloud environments mainly focus on high scalability and availability of these applications. Consistency, in contrast, is relaxed, and weak consistency is commonly considered to be sufficient. However, an increasing number of applications are no longer satisfied with weak consistency. Strong consistency, in turn, decreases availability and is costly to enforce from both a performance and an infrastructure point of view. On the other hand, weak consistency may lead to high costs due to the access to inconsistent data. In this technical report, we introduce a novel approach called cost-based concurrency control (C3). Essentially, C3 allows transactions in a Cloud environment to dynamically and adaptively switch between different consistency levels at runtime, based on the costs incurred during execution. These costs are determined by infrastructure costs for running a transaction at a certain consistency level (called consistency costs) and, optionally, by additional application-specific costs for compensating the effects of accessing inconsistent data (called inconsistency costs). C3 jointly supports transactions of different consistency levels and enforces the inherent consistency guarantees of each protocol. We first analyze the consistency costs of concurrency control protocols; second, we specify a set of rules that allow the best consistency level to be selected dynamically with the goal of minimizing the overall costs; third, we provide a protocol that enforces the correct execution of all transactions in a transaction mix. We have evaluated C3 on top of Amazon's EC2. The results show that C3 leads to a reduction of the overall transaction costs compared to a fixed consistency level.

    Cost-based data consistency in a data-as-a-service cloud environment

    No full text
    Clouds are becoming the preferred platforms for large-scale applications. Currently, Cloud environments focus on high scalability and availability by relaxing consistency. Weak consistency is considered to be sufficient for most of the applications currently deployed in the Cloud. However, the Cloud is increasingly being promoted as an environment for running a wide range of different types of applications on top of replicated data, of which not all will be satisfied with weak consistency. Strong consistency, even though demanded by applications, decreases availability and is costly to enforce from both a performance and a monetary point of view. On the other hand, weak consistency may generate high costs due to the access to inconsistent data. In this paper, we present a novel approach, called cost-based concurrency control (C3), that allows transactions to dynamically and adaptively switch between different consistency levels at runtime. C3 has been implemented in a Data-as-a-Service Cloud environment and considers all costs incurred during execution. These costs are determined by infrastructure costs for running a transaction at a certain consistency level (called consistency costs) and, optionally, by additional application-specific costs for compensating the effects of accessing inconsistent data (called inconsistency costs). C3 considers transaction mixes running different consistency levels at the same time while enforcing the inherent consistency guarantees of each of these protocols. The main contribution of this paper is threefold. First, it thoroughly analyzes the consistency costs of the most common concurrency control protocols; second, it specifies a set of rules that allow the most appropriate consistency level to be selected dynamically with the goal of minimizing the overall costs (consistency and inconsistency costs); third, it provides a protocol that guarantees that anomalies in the transaction mixes supported by C3 are avoided and that enforces the correct execution of all transactions in a transaction mix. We have evaluated C3 on the basis of real infrastructure costs derived from Amazon's EC2. The results demonstrate the feasibility of the cost model and show that C3 leads to a reduction of the overall costs of transactions compared to a fixed consistency level.
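
    The selection rule can be illustrated with a small sketch: for each available consistency level, sum the consistency costs and the expected inconsistency (compensation) costs and pick the minimum. The consistency levels and cost figures below are invented for illustration and are not taken from the paper.

        # Illustrative sketch of a C3-style selection rule: choose the consistency level
        # that minimizes consistency costs (infrastructure overhead) plus expected
        # inconsistency costs (application-side compensation). Figures are made up.

        def choose_level(levels: dict) -> str:
            """levels: name -> (consistency_cost, inconsistency_probability, compensation_cost)"""
            total = {name: c + p * comp for name, (c, p, comp) in levels.items()}
            return min(total, key=total.get)

        levels = {
            "serializable":   (0.012, 0.00, 0.03),   # expensive to run, never needs compensation
            "snapshot":       (0.008, 0.02, 0.03),   # cheaper, occasional compensation
            "read_committed": (0.004, 0.10, 0.03),   # cheapest, but frequent compensations
        }
        print(choose_level(levels))  # -> "read_committed" with these figures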

    Workload-Driven Adaptive Data Partitioning and Distribution – The Cumulus Approach

    No full text
    Cloud environments usually feature several geographically distributed data centers. In order to increase the scalability of applications, many Cloud providers partition data and distribute these partitions across data centers to balance the load. However, if the partitions are not carefully chosen, this might lead to distributed transactions. This is particularly expensive when applications require strong consistency guarantees. The additional synchronization needed for atomic commitment would strongly impact transaction throughput and could even completely undo the gain achieved by load balancing. Hence, it is beneficial to avoid distributed transactions as much as possible by partitioning the data in such a way that transactions can be executed locally. As the access patterns of characteristic transaction workloads may change over time, the partitioning also needs to be updated dynamically. In this paper, we introduce Cumulus, an adaptive data partitioning approach that is able to identify characteristic access patterns of transaction mixes, to determine data partitions based on these patterns, and to dynamically re-partition data if the access patterns change. In the evaluation based on the TPC-C benchmark, we show that Cumulus significantly increases overall system performance in an OLTP setting compared to static data partitioning approaches. Moreover, we show that Cumulus is able to adapt to workload shifts at runtime by generating partitions that match the actual workload and by re-configuring the system on the fly.
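
    The core idea of workload-driven partitioning can be sketched as follows: count which data items are accessed together by transactions, then greedily group frequently co-accessed items into the same partition so that transactions stay local. The greedy heuristic and item names below are illustrative, not Cumulus itself.

        # Hypothetical sketch of workload-driven partitioning in the spirit of Cumulus:
        # count co-accesses of data items across observed transactions and greedily place
        # frequently co-accessed items in the same partition so transactions stay local.
        from collections import Counter
        from itertools import combinations

        def co_access_counts(transactions):
            counts = Counter()
            for tx in transactions:
                for a, b in combinations(sorted(set(tx)), 2):
                    counts[(a, b)] += 1
            return counts

        def greedy_partitions(transactions, max_partition_size):
            counts = co_access_counts(transactions)
            partition_of = {}
            partitions = []
            for (a, b), _ in counts.most_common():
                if a not in partition_of and b not in partition_of:
                    partitions.append({a, b})
                    partition_of[a] = partition_of[b] = partitions[-1]
                elif a in partition_of and b not in partition_of \
                        and len(partition_of[a]) < max_partition_size:
                    partition_of[a].add(b)
                    partition_of[b] = partition_of[a]
                elif b in partition_of and a not in partition_of \
                        and len(partition_of[b]) < max_partition_size:
                    partition_of[b].add(a)
                    partition_of[a] = partition_of[b]
            return partitions

        # Two transaction classes touching disjoint item groups end up in two partitions.
        txs = [["w1", "d1", "c7"], ["w1", "d1", "c9"], ["w2", "d5", "c3"], ["w2", "d5", "c3"]]
        print(greedy_partitions(txs, max_partition_size=4))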

    SLA-basierte Konfiguration eines modularen Datenbanksystems für die Cloud

    No full text
    The popularity of Cloud computing has led many companies to no longer operate their applications themselves on their own resources. Instead, these applications run entirely "in the Cloud". Since data management is an essential part of these applications, Cloud providers are confronted with many different requirements for storing and accessing data. Cloud providers therefore have to offer a correspondingly large number of different variants for managing data. These variants differ not only in their technical properties (e.g., data consistency, availability, or response time), but also in the costs incurred for the required infrastructure. Future Cloud solutions should therefore not offer only individual solutions or a few predefined configurations, but should consist of configurable modules and protocols that can be combined dynamically according to the users' requirements. This achieves the greatest possible flexibility in order to satisfy as many heterogeneous requirements of Cloud users as possible at the same time. While modules are the building blocks of such a system, protocols describe the desired behavior of these building blocks. A major challenge is the selection of suitable modules and protocols, their configuration, and their dynamic adaptation to changing requirements.

    PolarDBMS: Towards a Cost-Effective and Policy-Based Data Management in the Cloud

    No full text