19 research outputs found

    Harmony: Towards Automated Self-Adaptive Consistency in Cloud Storage

    Get PDF
    In just a few years, cloud computing has become a very popular paradigm and a business success story, with storage being one of its key features. To achieve high data availability, cloud storage services rely on replication. In this context, one major challenge is data consistency. In contrast to traditional approaches that are mostly based on strong consistency, many cloud storage services opt for weaker consistency models in order to achieve better availability and performance. This comes at the cost of a high probability of stale data being read, as the replicas involved in a read may not always hold the most recent write. In this paper, we propose a novel approach, named Harmony, which adaptively tunes the consistency level at run-time according to the application requirements. The key idea behind Harmony is an intelligent estimation model of stale reads that elastically scales up or down the number of replicas involved in read operations so as to maintain a low (possibly zero) tolerable fraction of stale reads. As a result, Harmony can meet the desired consistency of the applications while achieving good performance. We have implemented Harmony and performed extensive evaluations with the Cassandra cloud storage system on the Grid'5000 testbed and on Amazon EC2. The results show that Harmony achieves good performance without exceeding the tolerated number of stale reads. For instance, in contrast to the static eventual consistency used in Cassandra, Harmony reduces the stale data being read by almost 80% while adding only minimal latency. Meanwhile, it improves the throughput of the system by 45% compared to the strong consistency model in Cassandra, while maintaining the desired consistency requirements of the applications.
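    The adaptive loop the abstract describes can be sketched in a few lines. The model below is an illustrative assumption, not Harmony's published estimator: writes are hypothetically treated as a Poisson process, a write takes a fixed propagation window to reach all copies, and reads contact a uniformly chosen subset of replicas.

```python
import math

def stale_read_probability(write_rate, window, read_replicas, total_replicas):
    """Estimate the chance that a read misses the latest write.

    Hypothetical model (not Harmony's published one): writes follow a
    Poisson process with rate `write_rate`, a write needs `window`
    seconds to propagate, and a read contacts `read_replicas` of the
    `total_replicas` copies chosen uniformly at random.
    """
    # Probability that at least one write is still propagating.
    p_recent_write = 1.0 - math.exp(-write_rate * window)
    # Probability that none of the contacted replicas holds the fresh
    # value yet (crude bound: only one replica has it so far).
    p_miss = (total_replicas - read_replicas) / total_replicas
    return p_recent_write * p_miss

def choose_read_replicas(tolerance, write_rate, window, total_replicas):
    """Smallest read quorum keeping estimated staleness within tolerance."""
    for n in range(1, total_replicas + 1):
        if stale_read_probability(write_rate, window, n, total_replicas) <= tolerance:
            return n
    return total_replicas
```

    With a zero tolerance the sketch degenerates to reading all replicas (strong consistency); with a loose tolerance a single replica suffices, mirroring the elastic scaling of the read quorum that Harmony performs.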

    Consistency in the Cloud:When Money Does Matter!

    Get PDF
    With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable and highly available services. To meet ever-growing user needs, these services mainly rely on geographically-distributed data replication to guarantee good performance and high availability. With replication, however, consistency comes into question. Service providers in the cloud have the freedom to select the level of consistency according to the access patterns exhibited by the applications. Most optimization efforts then concentrate on how to provide adequate trade-offs between consistency guarantees and performance. However, as the monetary cost is borne entirely by the service providers, in this paper we argue that monetary cost should be taken into consideration when evaluating or selecting a consistency level in the cloud. Accordingly, we define a new metric called consistency-cost efficiency. Based on this metric, we present a simple yet efficient economical consistency model, called Bismar, that adaptively tunes the consistency level at run-time in order to reduce the monetary cost while simultaneously maintaining a low fraction of stale reads. Experimental evaluations with the Cassandra cloud storage system on the Grid'5000 testbed show the validity of the metric and demonstrate the effectiveness of the proposed consistency model.
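    The abstract names the consistency-cost efficiency metric without spelling out its formula. One plausible instantiation, with illustrative (not measured) numbers, is consistency delivered per unit of monetary cost:

```python
def consistency_cost_efficiency(fresh_read_fraction, monetary_cost):
    """Consistency delivered per unit of monetary cost.

    One plausible reading of the metric; the paper's exact
    definition may differ.
    """
    if monetary_cost <= 0:
        raise ValueError("monetary cost must be positive")
    return fresh_read_fraction / monetary_cost

# Illustrative (not measured) numbers for three Cassandra-style levels:
# fraction of fresh reads delivered, and relative cost per request.
levels = {
    "ONE":    {"fresh": 0.85, "cost": 1.0},
    "QUORUM": {"fresh": 0.97, "cost": 1.6},
    "ALL":    {"fresh": 1.00, "cost": 2.4},
}

# A Bismar-like tuner would pick the most efficient level at runtime.
best = max(levels, key=lambda l: consistency_cost_efficiency(
    levels[l]["fresh"], levels[l]["cost"]))
```

    Note that under this metric a cheap weak level can win even though it serves fewer fresh reads, which is why Bismar additionally bounds the fraction of stale reads rather than maximizing efficiency alone.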

    Exploring Energy-Consistency Trade-offs in Cassandra Cloud Storage System

    Get PDF
    Apache Cassandra is an open-source cloud storage system that offers multiple types of operation-level consistency, including eventual consistency with multiple levels of guarantees and strong consistency. It is used by many data-center applications (e.g., Facebook and AppScale). Most existing research efforts have been dedicated to exploring trade-offs such as consistency vs. performance, consistency vs. latency, and consistency vs. monetary cost. In contrast, little work has focused on the consistency vs. energy trade-off. As power bills have become a substantial part of the monetary cost of operating a data center, this paper aims to provide a clearer understanding of the interplay between consistency and energy consumption. Accordingly, a series of experiments have been conducted to explore the implications of different factors on the energy consumption in Cassandra. Our experiments reveal a noticeable variation in energy consumption depending on the consistency level. Furthermore, for a given consistency level, the energy consumption of Cassandra varies with the access pattern and the load exhibited by the application. Further analysis indicates that the uneven distribution of load amongst different nodes also impacts the energy consumption in Cassandra. Finally, we experimentally compare the impact of four storage configuration and data partitioning policies on the energy consumption in Cassandra: interestingly, we achieve 23% energy savings when assigning 50% of the nodes to the hot pool for applications with a moderate ratio of reads and writes, while applying eventual (quorum) consistency. This study points to opportunities for future research on consistency-energy trade-offs and offers useful insights into designing energy-efficient techniques for cloud storage systems.

    Self-Adaptive Cost-Efficient Consistency Management in the Cloud

    Get PDF
    Many data-intensive applications and services in the cloud are geo-distributed and rely on geo-replication. Traditional synchronous replication that ensures strong consistency exposes these systems to the bottleneck of wide-area network latencies, which affect their performance, their availability, and the monetary cost of running in the cloud. In this context, several weaker consistency models were introduced to hide such effects. However, these solutions may tolerate far too much stale data being read. In this PhD research, we investigate better and more efficient ways to manage consistency. We propose self-adaptive methods that tune consistency levels at runtime in order to achieve better performance and availability and to reduce the monetary cost without violating the consistency requirements of the application. Furthermore, we introduce a behavior modeling method that automatically analyzes the application and learns its consistency requirements. Experimental evaluations on the Grid'5000 and Amazon EC2 cloud platforms show the effectiveness of the proposed approaches.

    Gérer la cohérence pour les applications big data : compromis et auto-adaptabilité (Managing consistency for Big Data applications: trade-offs and self-adaptiveness)

    No full text
    In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployment, and cost-efficient pay-as-you-go usage. In this context, replication is an essential feature for dealing with Big Data challenges: it enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across the different copies. Consistency management is critical for Big Data systems. Strong consistency models seriously limit system scalability and performance because of the synchronization they require. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability; however, under certain scenarios they may tolerate too much temporal inconsistency. In this PhD thesis, we address this issue of consistency trade-offs in large-scale Big Data systems and applications. We first focus on consistency management at the storage-system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up or down at runtime when needed, in order to provide the highest possible performance while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud, and leverage this study to propose a cost-efficient consistency tuning approach (named Bismar) in the cloud. In a third direction, we study the impact of consistency management on energy consumption within the data center. Based on our findings, we investigate adaptive configurations of the storage-system cluster that target energy savings.
In order to complete our system-side study, we focus on the application level. Applications are different, and so are their consistency requirements; understanding such requirements at the storage-system level alone is not possible. Therefore, we propose an application behavior model that captures the consistency requirements of an application. Based on this model, we propose an online prediction approach, named Chameleon, that adapts to the application's specific needs and provides customized consistency.

    iBig Hybrid Architecture for Energy IoT: When the power of Indexing meets Big Data Processing!

    No full text
    Nowadays, IoT data come from multiple sources and a large number of devices. To manage them, IoT frameworks rely on Big Data ecosystems hosted in the cloud to provide scalable storage and scalable processing. Although these ecosystems scale well when processing large volumes of data, in many cases this is done naively. Many datasets, such as IoT energy measurement data, consist, at least partially, of attributes that can be indexed to avoid unnecessary and costly data scans at these scales. In this work, we propose the iBig architecture, which provides secondary indexing for Big Data processing over energy IoT datasets. Indexes are treated as metadata stored in a separate component integrated into the ecosystem. MapReduce-based and MPP (massively parallel processing)-based engines then leverage the indexes to handle only the relevant portion of a given dataset. Our experimental evaluation on the Grid'5000 cloud testbed demonstrates that performance gains can exceed 98% for the MapReduce-based Spark and 81% for the MPP-based Drill on energy IoT data. Furthermore, we provide comparative insights into the Spark and Drill frameworks when processing the whole dataset versus only the relevant data.
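    The idea behind the secondary index can be illustrated with a toy in-memory sketch. The record layout, field names, and values below are hypothetical; iBig itself stores the index as metadata in a separate component of the ecosystem rather than alongside the data.

```python
from collections import defaultdict

def build_secondary_index(records, attribute):
    """Map each value of `attribute` to the positions of matching records.

    A toy stand-in for iBig's index metadata: the query engine consults
    the index first instead of scanning every record.
    """
    index = defaultdict(list)
    for pos, record in enumerate(records):
        index[record[attribute]].append(pos)
    return index

def query_with_index(records, index, value):
    """Fetch only the matching records instead of scanning everything."""
    return [records[pos] for pos in index.get(value, [])]

# Hypothetical energy-IoT readings; field names are illustrative.
readings = [
    {"device": "meter-1", "kwh": 3.2},
    {"device": "meter-2", "kwh": 1.1},
    {"device": "meter-1", "kwh": 2.7},
]
idx = build_secondary_index(readings, "device")
hits = query_with_index(readings, idx, "meter-1")  # touches 2 of 3 records
```

    At cluster scale the same principle lets Spark or Drill open only the blocks whose offsets the index returns, which is where the reported gains over a full scan come from.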

    Metadata based datasets placement in Smart grids

    No full text
    4-page article presented as a poster at the MTSR 2018 international conference (Cyprus).

    Towards a scalable, fault-tolerant, self-adaptive storage for the clouds

    Get PDF
    The emerging cloud computing model for managing large-scale distributed infrastructure resources is becoming very popular. Both industry and academia are investing substantial time and effort to investigate and develop its features. As more and more applications become data-intensive, one of the most important issues for this promising model is data management and efficient storage handling. Cloud storage needs are basically of two kinds: storage support for virtual machine (VM) images and storage support for cloud application data. In this paper, we propose a cloud storage architecture for both VM images and application data. This architecture is designed to allow the system to be self-adaptive and to provide better scalability with a good quality of service. This is achieved by modeling the global behavior of the underlying storage system and acting accordingly. Our design will be implemented to provide the storage service for the OpenNebula cloud platform.