19 research outputs found

    Harmony: Towards Automated Self-Adaptive Consistency in Cloud Storage

    Get PDF
    In just a few years, cloud computing has become a very popular paradigm and a business success story, with storage being one of its key features. To achieve high data availability, cloud storage services rely on replication. In this context, one major challenge is data consistency. In contrast to traditional approaches that are mostly based on strong consistency, many cloud storage services opt for weaker consistency models in order to achieve better availability and performance. This comes at the cost of a high probability of stale data being read, as the replicas involved in a read may not always hold the most recent write. In this paper, we propose a novel approach, named Harmony, which adaptively tunes the consistency level at run-time according to the application requirements. The key idea behind Harmony is an intelligent estimation model of stale reads that elastically scales up or down the number of replicas involved in read operations so as to maintain a low (possibly zero) tolerable fraction of stale reads. As a result, Harmony can meet the desired consistency of the applications while achieving good performance. We have implemented Harmony and performed extensive evaluations with the Cassandra cloud storage system on the Grid'5000 testbed and on Amazon EC2. The results show that Harmony achieves good performance without exceeding the tolerated number of stale reads. For instance, in contrast to the static eventual consistency used in Cassandra, Harmony reduces the stale data being read by almost 80% while adding only minimal latency. Meanwhile, it improves the throughput of the system by 45% compared to the strong consistency model in Cassandra, while maintaining the desired consistency requirements of the applications.
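    The adaptive loop the abstract describes can be sketched in a few lines. The model below is an illustrative assumption, not Harmony's published estimator: writes are hypothetically treated as a Poisson process, a write takes a fixed propagation window to reach all copies, and reads contact a uniformly chosen subset of replicas.

```python
import math

def stale_read_probability(write_rate, window, read_replicas, total_replicas):
    """Estimate the chance that a read misses the latest write.

    Hypothetical model (not Harmony's published one): writes follow a
    Poisson process with rate `write_rate`, a write needs `window`
    seconds to propagate, and a read contacts `read_replicas` of the
    `total_replicas` copies chosen uniformly at random.
    """
    # Probability that at least one write is still propagating.
    p_recent_write = 1.0 - math.exp(-write_rate * window)
    # Probability that none of the contacted replicas holds the fresh
    # value yet (crude bound: only one replica has it so far).
    p_miss = (total_replicas - read_replicas) / total_replicas
    return p_recent_write * p_miss

def choose_read_replicas(tolerance, write_rate, window, total_replicas):
    """Smallest read quorum keeping estimated staleness within tolerance."""
    for n in range(1, total_replicas + 1):
        if stale_read_probability(write_rate, window, n, total_replicas) <= tolerance:
            return n
    return total_replicas
```

    With a zero tolerance the sketch degenerates to reading all replicas (strong consistency); with a loose tolerance a single replica suffices, mirroring the elastic scaling of the read quorum that Harmony performs.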

    Consistency in the Cloud:When Money Does Matter!

    Get PDF
    With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable and highly available services. To meet ever-growing user needs, these services mainly rely on geographically-distributed data replication to guarantee good performance and high availability. With replication, however, consistency comes into question. Service providers in the cloud have the freedom to select the level of consistency according to the access patterns exhibited by the applications. Most optimization efforts then concentrate on how to provide adequate trade-offs between consistency guarantees and performance. However, as the monetary cost is borne entirely by the service providers, in this paper we argue that monetary cost should be taken into consideration when evaluating or selecting a consistency level in the cloud. Accordingly, we define a new metric called consistency-cost efficiency. Based on this metric, we present a simple yet efficient economical consistency model, called Bismar, that adaptively tunes the consistency level at run-time in order to reduce the monetary cost while simultaneously maintaining a low fraction of stale reads. Experimental evaluations with the Cassandra cloud storage system on the Grid'5000 testbed show the validity of the metric and demonstrate the effectiveness of the proposed consistency model.
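    The abstract names the consistency-cost efficiency metric without spelling out its formula. One plausible instantiation, with illustrative (not measured) numbers, is consistency delivered per unit of monetary cost:

```python
def consistency_cost_efficiency(fresh_read_fraction, monetary_cost):
    """Consistency delivered per unit of monetary cost.

    One plausible reading of the metric; the paper's exact
    definition may differ.
    """
    if monetary_cost <= 0:
        raise ValueError("monetary cost must be positive")
    return fresh_read_fraction / monetary_cost

# Illustrative (not measured) numbers for three Cassandra-style levels:
# fraction of fresh reads delivered, and relative cost per request.
levels = {
    "ONE":    {"fresh": 0.85, "cost": 1.0},
    "QUORUM": {"fresh": 0.97, "cost": 1.6},
    "ALL":    {"fresh": 1.00, "cost": 2.4},
}

# A Bismar-like tuner would pick the most efficient level at runtime.
best = max(levels, key=lambda l: consistency_cost_efficiency(
    levels[l]["fresh"], levels[l]["cost"]))
```

    Note that under this metric a cheap weak level can win even though it serves fewer fresh reads, which is why Bismar additionally bounds the fraction of stale reads rather than maximizing efficiency alone.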

    Exploring Energy-Consistency Trade-offs in Cassandra Cloud Storage System

    Get PDF
    Apache Cassandra is an open-source cloud storage system that offers multiple types of operation-level consistency, including eventual consistency with multiple levels of guarantees and strong consistency. It is used by many data-center applications (e.g., Facebook and AppScale). Most existing research efforts have been dedicated to exploring trade-offs such as consistency vs. performance, consistency vs. latency, and consistency vs. monetary cost. In contrast, little work has focused on the consistency vs. energy trade-off. As power bills have become a substantial part of the monetary cost of operating a data center, this paper aims to provide a clearer understanding of the interplay between consistency and energy consumption. Accordingly, a series of experiments have been conducted to explore the implications of different factors on the energy consumption in Cassandra. Our experiments reveal a noticeable variation in energy consumption depending on the consistency level. Furthermore, for a given consistency level, the energy consumption of Cassandra varies with the access pattern and the load exhibited by the application. Further analysis indicates that the uneven distribution of load amongst different nodes also impacts the energy consumption in Cassandra. Finally, we experimentally compare the impact of four storage configuration and data partitioning policies on the energy consumption in Cassandra: interestingly, we achieve 23% energy savings when assigning 50% of the nodes to the hot pool for applications with a moderate ratio of reads and writes, while applying eventual (quorum) consistency. This study points to opportunities for future research on consistency-energy trade-offs and offers useful insights into designing energy-efficient techniques for cloud storage systems.

    Self-Adaptive Cost-Efficient Consistency Management in the Cloud

    Get PDF
    Many data-intensive applications and services in the cloud are geo-distributed and rely on geo-replication. Traditional synchronous replication that ensures strong consistency exposes these systems to the bottleneck of wide-area network latencies, which affect their performance, their availability, and the monetary cost of running in the cloud. In this context, several weaker consistency models were introduced to hide such effects. However, these solutions may tolerate far too much stale data being read. In this PhD research, we investigate better and more efficient ways to manage consistency. We propose self-adaptive methods that tune consistency levels at runtime in order to achieve better performance and availability and to reduce the monetary cost without violating the consistency requirements of the application. Furthermore, we introduce a behavior modeling method that automatically analyzes the application and learns its consistency requirements. Experimental evaluations on the Grid'5000 and Amazon EC2 cloud platforms show the effectiveness of the proposed approaches.

    Gérer la cohérence pour les applications big data : compromis et auto-adaptabilité (Managing consistency for Big Data applications: trade-offs and self-adaptiveness)

    No full text
    In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployment, and cost-efficient pay-as-you-go usage. In this context, replication is an essential feature for dealing with Big Data challenges: it enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across the different copies. Consistency management is critical for Big Data systems. Strong consistency models seriously limit system scalability and performance because of the synchronization they require. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability; however, under certain scenarios they may tolerate too much temporal inconsistency. In this PhD thesis, we address this issue of consistency trade-offs in large-scale Big Data systems and applications. We first focus on consistency management at the storage-system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up or down at runtime when needed, in order to provide the highest possible performance while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud, and leverage this study to propose a cost-efficient consistency tuning approach (named Bismar) in the cloud. In a third direction, we study the impact of consistency management on energy consumption within the data center. Based on our findings, we investigate adaptive configurations of the storage-system cluster that target energy savings.
In order to complete our system-side study, we focus on the application level. Applications are different, and so are their consistency requirements; understanding such requirements at the storage-system level alone is not possible. Therefore, we propose an application behavior model that captures the consistency requirements of an application. Based on this model, we propose an online prediction approach, named Chameleon, that adapts to the application's specific needs and provides customized consistency.

    iBig Hybrid Architecture for Energy IoT: When the power of Indexing meets Big Data Processing!

    No full text
    Nowadays, IoT data come from multiple sources and a large number of devices. To manage them, IoT frameworks rely on Big Data ecosystems hosted in the cloud to provide scalable storage and scalable processing. Although these ecosystems scale well when processing large volumes of data, in many cases this is done naively. Many datasets, such as IoT energy measurement data, consist, at least partially, of attributes that can be indexed to avoid unnecessary and costly data scans at these scales. In this work, we propose the iBig architecture, which provides secondary indexing for Big Data processing over energy IoT datasets. Indexes are treated as metadata stored in a separate component integrated into the ecosystem. MapReduce-based and MPP (massively parallel processing)-based engines then leverage the indexes to handle only the relevant portion of a given dataset. Our experimental evaluation on the Grid'5000 cloud testbed demonstrates that performance gains can exceed 98% for the MapReduce-based Spark and 81% for the MPP-based Drill on energy IoT data. Furthermore, we provide comparative insights into the Spark and Drill frameworks when processing the whole dataset versus only the relevant data.
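    The idea behind the secondary index can be illustrated with a toy in-memory sketch. The record layout, field names, and values below are hypothetical; iBig itself stores the index as metadata in a separate component of the ecosystem rather than alongside the data.

```python
from collections import defaultdict

def build_secondary_index(records, attribute):
    """Map each value of `attribute` to the positions of matching records.

    A toy stand-in for iBig's index metadata: the query engine consults
    the index first instead of scanning every record.
    """
    index = defaultdict(list)
    for pos, record in enumerate(records):
        index[record[attribute]].append(pos)
    return index

def query_with_index(records, index, value):
    """Fetch only the matching records instead of scanning everything."""
    return [records[pos] for pos in index.get(value, [])]

# Hypothetical energy-IoT readings; field names are illustrative.
readings = [
    {"device": "meter-1", "kwh": 3.2},
    {"device": "meter-2", "kwh": 1.1},
    {"device": "meter-1", "kwh": 2.7},
]
idx = build_secondary_index(readings, "device")
hits = query_with_index(readings, idx, "meter-1")  # touches 2 of 3 records
```

    At cluster scale the same principle lets Spark or Drill open only the blocks whose offsets the index returns, which is where the reported gains over a full scan come from.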

    Metadata based datasets placement in Smart grids

    No full text
    4-page article presented as a poster at the MTSR 2018 international conference (Cyprus).

    Towards a scalable, fault-tolerant, self-adaptive storage for the clouds

    Get PDF
    The emerging cloud computing model for managing large-scale distributed infrastructure resources is becoming very popular. Both industry and academia are investing substantial time and effort to investigate and develop its features. As more and more applications become data-intensive, one of the most important issues for this promising model is data management and efficient storage handling. Cloud storage needs are basically of two kinds: storage support for virtual machine (VM) images and storage support for cloud application data. In this paper, we propose a cloud storage architecture for both VM images and application data. This architecture is designed to allow the system to be self-adaptive and to provide better scalability with a good quality of service. This is achieved by modeling the global behavior of the underlying storage system and acting accordingly. Our design will be implemented to provide the storage service for the OpenNebula cloud platform.