Technical debt-aware elasticity management in cloud computing environments
Elasticity is the characteristic of cloud computing that provides the underlying primitives to dynamically acquire and release shared computational resources on demand. It also lets the cloud exploit economies of scale, that is, the drop in the average cost of computing capacity that results from dynamic sharing. In practice, however, elasticity adaptations can never match resource supply to demand perfectly, which produces dynamic gaps at runtime. Furthermore, elasticity is only a capability, so it calls for a management process with far-sighted economic objectives to maximise the value of elasticity adaptations.
Within this context, we advocate the use of an economics-driven approach to guide elasticity management decisions. We draw inspiration from the technical debt metaphor in software engineering and explore it in a dynamic setting to present debt-aware elasticity management. In particular, we introduce a management approach that assesses the value of elasticity decisions to adapt resource provisioning. Additionally, the approach pursues strategic decisions that value the potential utility produced by the unavoidable gaps between ideal and actual resource provisioning over time. As part of the experimentation, we built a proof of concept, and the results indicate that value-oriented adaptations in elasticity management lead to better economic performance in terms of lower operating costs and higher quality of service over time.
This thesis contributes (i) an economics-driven approach towards elasticity management; (ii) a technical debt-aware model to reason about elasticity adaptations; (iii) a debt-aware learning elasticity management approach; and (iv) a multi-agent elasticity management approach for multi-tenant applications hosted in the cloud.
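To make the notion of elasticity debt concrete, the following is a minimal Python sketch under stated assumptions, not the thesis's model. It assumes a linear cost model: over-provisioning costs the price of idle VMs, under-provisioning costs a hypothetical SLA penalty per unit of unmet demand, and each scaling action has a one-off cost. The names `CostModel`, `step_debt`, `plan_debt` and `choose_capacity`, and all prices, are illustrative. The decision is value-oriented in the sense that it picks the capacity with the lowest accumulated debt over a demand forecast, rather than the one that best matches current demand.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CostModel:
    """Hypothetical linear prices, all per VM per time step."""
    vm_cost: float = 1.0      # running one VM for one step
    penalty: float = 3.0      # SLA penalty per unit of unmet demand
    change_cost: float = 0.5  # one-off cost per VM added or removed


def step_debt(capacity: int, demand: float, m: CostModel) -> float:
    """Extra cost of one step compared with a perfect supply/demand match.

    Non-negative as long as penalty >= vm_cost.
    """
    ideal = demand * m.vm_cost
    unmet = max(0.0, demand - capacity)
    actual = capacity * m.vm_cost + unmet * m.penalty
    return actual - ideal


def plan_debt(capacity: int, current: int, forecast: Sequence[float],
              m: CostModel) -> float:
    """Debt of holding `capacity` VMs over the forecast, including the switch."""
    switching = abs(capacity - current) * m.change_cost
    return switching + sum(step_debt(capacity, d, m) for d in forecast)


def choose_capacity(current: int, forecast: Sequence[float], m: CostModel,
                    max_vms: int = 100) -> int:
    """Capacity (0..max_vms) with the lowest accumulated debt; ties go to fewer VMs."""
    return min(range(max_vms + 1),
               key=lambda c: plan_debt(c, current, forecast, m))
```

For example, with the default prices, `choose_capacity(2, [2.5, 3.0, 3.5], CostModel())` returns 3. Adding one VM costs less in idle capacity and switching than staying at two VMs and paying penalties, and scaling to four is not worth its extra idle cost. Holding a single capacity over the whole horizon is a deliberate simplification; a learning or multi-step planner would re-decide at each step.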
Transiency-driven Resource Management for Cloud Computing Platforms
Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built on the implicit assumption that the underlying servers will be stable and normally available, barring occasional faults. In many emerging scenarios, however, data centers and clouds provide only transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered by intermittent renewable sources, and cloud platforms that offer lower-cost transient servers which the cloud operator can unilaterally revoke.
Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies that suit application characteristics and requirements can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems that allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and that can reduce cloud computing costs by up to 90%.
This thesis makes contributions to four areas of computer systems research: transiency-specific fault tolerance, resource allocation, abstractions, and resource reclamation. To reduce the impact of transient server revocations, I develop two fault-tolerance techniques that are tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application state between servers of different types. For distributed data processing applications, I investigate the use of application-level periodic checkpointing to reduce the performance impact of server revocations. To manage and reduce the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements.
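How much periodic checkpointing helps depends on the interval chosen. One standard first-order rule, Young's approximation, sets the interval to √(2·C·M). Here C is the time needed to write one checkpoint and M is the mean time between failures, read in this setting as the mean time to revocation. The sketch below uses that rule as a stand-in, not necessarily the policy developed in the thesis. The function names are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mean_time_to_revocation: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M).

    Both arguments must use the same time unit (e.g. seconds).
    """
    if checkpoint_cost <= 0 or mean_time_to_revocation <= 0:
        raise ValueError("costs and times must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mean_time_to_revocation)


def expected_overhead(interval: float, checkpoint_cost: float,
                      mean_time_to_revocation: float) -> float:
    """Approximate fraction of run time lost to checkpointing plus recomputation.

    C / interval: time spent writing checkpoints.
    interval / (2 * M): on average, half an interval of work is redone
    after each revocation.
    """
    return checkpoint_cost / interval + interval / (2.0 * mean_time_to_revocation)
```

For instance, take a 50 s checkpoint and servers revoked on average once per hour (3600 s). Then `young_interval(50, 3600)` gives a 600 s interval, with an estimated overhead of about 17%. The approximation assumes C is much smaller than M.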
Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation, by looking into alternative resource reclamation techniques. I develop resource deflation, wherein a server's resources are fractionally reclaimed, allowing the application to continue execution, albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation.
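Resource deflation can be pictured with a simple proportional policy. The sketch below is a hypothetical policy, not the thesis's mechanism. To reclaim a given amount of a resource (e.g. CPU cores) from a server, each VM gives up a share proportional to how much it can spare above an assumed minimum fraction of its allocation. If those floors make the request impossible, the function raises an error instead of revoking anything.

```python
from typing import Dict


def deflate(allocations: Dict[str, float], reclaim: float,
            min_fraction: float = 0.5) -> Dict[str, float]:
    """Fractionally reclaim `reclaim` units from the VMs on one server.

    Each VM keeps at least `min_fraction` of its current allocation; the
    amount taken from a VM is proportional to what it can spare above that
    floor (a hypothetical policy).
    """
    if not 0.0 <= min_fraction <= 1.0:
        raise ValueError("min_fraction must be in [0, 1]")
    if reclaim < 0:
        raise ValueError("reclaim must be non-negative")
    spare = {vm: alloc * (1.0 - min_fraction) for vm, alloc in allocations.items()}
    total_spare = sum(spare.values())
    if reclaim > total_spare:
        raise ValueError("cannot reclaim that much without violating the floors")
    if total_spare == 0:
        return dict(allocations)  # nothing spare and nothing requested
    # reclaim <= total_spare, so no VM loses more than its own spare share
    return {vm: alloc - reclaim * spare[vm] / total_spare
            for vm, alloc in allocations.items()}
```

Suppose two VMs hold 4 and 2 cores. Deflating them by 1.5 cores with a 50% floor leaves them with 3 and 1.5 cores, so both keep running with fewer resources.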
An Energy-Efficient Multi-Cloud Service Broker for Green Cloud Computing Environment
The heavy demand for cloud computing resources has led to substantial growth in energy consumption. The growth comes both from transferring data between cloud computing parties (i.e., providers, datacentres, users, and services) and from datacentre services, owing to the increasing loads on those services. On the one hand, routing and transferring large amounts of data to a datacentre located far from the user's geographical location consumes more energy than just processing and storing the same data in the cloud datacentre. On the other hand, a cloud user submits a job (in the form of a set of functional and non-functional requirements) to a cloud service provider (aka datacentre) via a cloud services broker. The broker then becomes responsible for finding the best-fit service for the user request, based mainly on the user's requirements and Quality of Service (QoS) (e.g., response time, latency). Hence, it becomes essential to locate the route with the lowest energy consumption between the user and the designated datacentre, and the minimum possible number of the most energy-efficient services that satisfy the user request. In fact, finding the most energy-efficient route to the datacentre and the most energy-efficient service(s) for the user are the biggest challenges in a multi-cloud broker environment. This thesis presents and evaluates a novel multi-cloud broker solution that contains three innovative models and their associated algorithms. The first model aims to find the most energy-efficient route, among multiple possible routes, between the user and the cloud datacentre. The second model finds and provides the lowest possible number of the most energy-efficient services, in order to minimise data exchange, based on a bin-packing approach. The third model creates an energy-aware composition plan by integrating the most energy-efficient services in order to fulfil user requirements.
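The first model's goal, picking the lowest-energy route among several candidates, maps naturally onto a shortest-path search. The sketch below uses Dijkstra's algorithm over a hypothetical network graph. Its edge weights are assumed per-hop energy costs, which must be non-negative for Dijkstra to be correct. The thesis's own route model and energy estimates may differ.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbour, energy cost of the hop); weights must be >= 0
Graph = Dict[str, List[Tuple[str, float]]]


def min_energy_route(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's shortest path, using per-hop energy as the edge weight."""
    energy = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        e, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == dst:
            break
        for nxt, hop in graph.get(node, []):
            cand = e + hop
            if cand < energy.get(nxt, float("inf")):
                energy[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if dst not in energy:
        raise ValueError(f"{dst!r} is unreachable from {src!r}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return energy[dst], path[::-1]
```

Consider the graph `{"user": [("r1", 2.0), ("r2", 1.0)], "r1": [("dc", 1.0)], "r2": [("dc", 3.0)]}`. For it, `min_energy_route(graph, "user", "dc")` returns `(3.0, ["user", "r1", "dc"])`. The route through r1 wins even though its first hop costs more.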
The results demonstrated favourable performance of these models in terms of selecting the most energy-efficient route and reaching the least possible number of services for an optimal, energy-efficient composition.
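The second model relies on bin packing. A common heuristic for bin packing is first-fit decreasing, and the sketch below is an assumed adaptation of it, not the thesis's algorithm. Requests, with hypothetical sizes, are placed largest first into services that have already been selected. A new service is selected only when none of those has room, and the choice always falls on the most energy-efficient candidate (lowest assumed energy per unit) that can hold the request. Selecting fewer services means fewer parties exchanging data, which is the intuition behind minimising data exchange.

```python
from typing import Dict, List, Tuple

# (name, capacity, energy per unit of work); all values hypothetical
Service = Tuple[str, float, float]


def pack_requests(requests: Dict[str, float],
                  services: List[Service]) -> Dict[str, List[str]]:
    """First-fit decreasing over services ordered by energy efficiency.

    Returns a mapping from each selected service to the requests it hosts.
    """
    unopened = sorted(services, key=lambda s: s[2])  # most efficient first
    opened: List[list] = []  # entries are [name, remaining capacity]
    plan: Dict[str, List[str]] = {}
    for req, size in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        # First fit: reuse an already-selected service if it has room.
        slot = next((o for o in opened if size <= o[1]), None)
        if slot is None:
            # Otherwise select the most efficient unused service that can hold it.
            fit = next((s for s in unopened if s[1] >= size), None)
            if fit is None:
                raise ValueError(f"no service can host request {req!r}")
            unopened.remove(fit)
            slot = [fit[0], fit[1]]
            opened.append(slot)
            plan[fit[0]] = []
        slot[1] -= size
        plan[slot[0]].append(req)
    return plan
```

Take services `s_b` (capacity 8, 0.5 energy/unit), `s_a` (10, 0.8) and `s_c` (6, 0.9), and requests of sizes 5, 4, 3 and 2. The plan uses only `s_b` and `s_a`, leaving the least efficient service unused.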
Planning and Optimization During the Life-Cycle of Service Level Agreements for Cloud Computing
A Service Level Agreement (SLA) is an electronic contract between the customer and the provider of a service. The parties involved clarify their expectations and obligations regarding the service and its quality. SLAs are already used to describe cloud computing services. The service provider ensures that the quality of service is met and remains in line with the customer's requirements until the end of the agreed term.
Executing SLAs requires considerable effort to achieve autonomy, cost-effectiveness and efficiency. The current state of the art in SLA management faces challenges such as SLA representation for cloud services, business-driven SLA optimization, service outsourcing and resource management. These areas constitute central and current research topics. Managing SLAs in the different phases of their life cycle requires a methodology developed for that purpose, which simplifies the realization of cloud SLA management.
I present a wide-ranging model for SLA life-cycle management that addresses these challenges. This approach enables automatic service modeling as well as the negotiation, provisioning and monitoring of SLAs. For the creation phase, I outline how the modeling structures can be improved and simplified. A further goal of my approach is to minimize implementation and outsourcing costs in order to increase competitiveness. For the SLA monitoring phase, I develop strategies for selecting and allocating virtual cloud resources during migration phases. I then use monitoring to check whether a larger collection of SLAs complies with the agreed fault tolerances.
This work contributes to a design of the GWDG and its scientific communities. The research leading to this doctoral thesis was carried out as part of the SLA@SOI EU/FP7 integrated project (contract No. 216556).
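The monitoring step, checking whether a set of SLAs stays within its agreed fault tolerances, can be illustrated with a small compliance check. The sketch makes a hypothetical assumption: each SLA term bounds one monitored metric and tolerates a fixed fraction of samples above that bound. The names `SlaTerm`, `violation_ratio` and `check_slas` are illustrative and are not taken from the thesis or the SLA@SOI project.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class SlaTerm:
    metric: str       # monitored metric, e.g. "response_time_ms" (hypothetical)
    max_value: float  # agreed upper bound for each sample
    tolerance: float  # agreed fraction of samples allowed above max_value


def violation_ratio(samples: Sequence[float], bound: float) -> float:
    """Fraction of samples above `bound`; no samples counts as no violation."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > bound) / len(samples)


def check_slas(slas: Mapping[str, List[SlaTerm]],
               monitoring: Mapping[str, Mapping[str, Sequence[float]]]) -> Dict[str, bool]:
    """Return, for each SLA id, whether every term stayed within its tolerance."""
    report: Dict[str, bool] = {}
    for sla_id, terms in slas.items():
        data = monitoring.get(sla_id, {})
        report[sla_id] = all(
            violation_ratio(data.get(t.metric, []), t.max_value) <= t.tolerance
            for t in terms
        )
    return report
```

Treating a metric with no samples as compliant is a design choice of this sketch. A stricter monitor might instead flag missing data as a violation.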
Virtual machine scheduling in dedicated computing clusters
Time-critical applications process a continuous stream of input data and have to meet specific timing constraints. A common approach to ensure that such an application satisfies its constraints is over-provisioning: the application is deployed in a dedicated cluster environment with enough processing power to achieve the target performance for every specified data input rate. This approach has a drawback: at times of decreased data input rates, the cluster resources are not fully utilized. A typical use case is the HLT-Chain application that processes physics data at runtime of the ALICE experiment at CERN. From a cost and efficiency perspective, it is desirable to exploit temporarily unused cluster resources. Existing approaches pursue that goal by running additional applications. These approaches, however, a) lack the flexibility to dynamically grant the time-critical application the resources it needs, b) are insufficient for isolating the time-critical application from harmful side effects introduced by additional applications, or c) are not general because they use application-specific interfaces. In this thesis, a software framework is presented that makes it possible to exploit unused resources in a dedicated cluster without harming a time-critical application. Additional applications are hosted in Virtual Machines (VMs), and unused cluster resources are allocated to these VMs at runtime. To avoid resource bottlenecks, the resource usage of the VMs is dynamically modified according to the needs of the time-critical application. For this purpose, a set of methods that had not previously been combined is used. On a global level, appropriate VM manipulations such as hot migration, suspend/resume and start/stop are determined by an informed search heuristic and applied at runtime. Locally on cluster nodes, a feedback-controlled adaptation of VM resource usage is carried out in a decentralized manner.
Using this framework, a cluster's utilization can be increased by running additional applications while preventing any negative impact on the time-critical application. This capability is demonstrated for the HLT-Chain application: in an empirical evaluation, cluster CPU usage increased from 49% to 79%, additional results were computed, and no negative effect on the HLT-Chain application was observed.
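The local, feedback-controlled part of the framework can be illustrated with a proportional controller. In this hypothetical sketch, each node periodically measures the CPU used by the time-critical application. It then adjusts the CPU cap of the guest VMs so that an assumed reserve (10% of the node by default) stays free. The gain, the reserve and the function name are assumptions, and the thesis's actual controller may differ.

```python
def next_vm_cap(current_cap: float, node_capacity: float, critical_usage: float,
                reserve: float = 0.1, gain: float = 0.5,
                min_cap: float = 0.0) -> float:
    """One step of a proportional controller for the CPU granted to guest VMs.

    The set point keeps `reserve * node_capacity` free beyond what the
    time-critical application and the VMs currently use.
    """
    spare = node_capacity - critical_usage - current_cap
    error = spare - reserve * node_capacity  # > 0: room to grow, < 0: too tight
    new_cap = current_cap + gain * error
    # Never hand the VMs more than the time-critical application leaves over.
    upper = max(min_cap, node_capacity - critical_usage)
    return min(max(new_cap, min_cap), upper)
```

Take an 8-core node where the time-critical application uses 4 cores. Repeated calls starting from a cap of 0 converge towards 3.2 cores (8 - 4 - 0.8). If the application's usage jumps, the cap is immediately clamped to at most the capacity the application leaves over, and the controller then moves it towards the reserve set point.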
Anti load-balancing for energy-aware distributed scheduling of virtual machines
The proliferation of cloud computing has resulted in the establishment of large-scale data centers around the world containing thousands of compute nodes. However, clouds consume huge amounts of energy: the energy consumption of data centers worldwide is estimated at more than 1.5% of global electricity use and is expected to grow further. A problem usually studied in distributed systems is how to distribute load evenly. When the goal is to reduce energy consumption, however, this type of algorithm can leave machines largely under-loaded and therefore consuming energy unnecessarily. This thesis presents novel techniques, algorithms, and software for distributed dynamic consolidation of Virtual Machines (VMs) in the cloud. The main objective of this thesis is to provide energy-aware scheduling strategies in cloud computing for energy saving. To achieve this goal, we use centralized and decentralized approaches, and the contributions are presented along these two axes. The objective of our approach is to reduce a data center's total energy consumption by controlling cloud applications' overall energy consumption while ensuring their service level agreements. Energy consumption is reduced by dynamically deactivating and reactivating physical nodes to meet the current resource demand. The key contributions are:
- First, we present energy-aware cloud scheduling using an anti load-balancing algorithm, which addresses the opposite problem to load balancing: it concentrates the load on a minimum number of servers. The goal is to turn off the released machines and thereby minimize the energy consumption of the system.
- The second axis proposes a centralized algorithm that associates a credit value with each node. The credit of a node depends on its affinity to its jobs, its current workload and its communication behavior. Energy savings are achieved by continuous consolidation of VMs according to current resource utilization, the virtual network topologies established between VMs, and the thermal state of computing nodes. Experimental results obtained with EnerSim, a simulator that extends CloudSim, show that the energy consumption and energy efficiency of cloud applications are improved.
- The third axis is dedicated to a decentralized dynamic scheduling approach entitled Cooperative scheduling Anti load-balancing Algorithm for cloud, which allows cooperation between different sites. To validate this algorithm, we extended the MaGateSim simulator. From an extensive experimental evaluation with a real workload dataset, we conclude that both the centralized and the decentralized algorithms can reduce the energy consumed by data centers.
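The core idea of anti load-balancing is to concentrate load on as few nodes as possible so that the freed nodes can be switched off. It can be sketched as a greedy consolidation loop. This is a simplified illustration under assumed conditions, not the thesis's credit-based or cooperative algorithm. VM loads are single numbers in the same unit as a uniform node capacity, and migration cost, network topology and thermal state are ignored. The loop repeatedly tries to drain the least-loaded node into the most-loaded nodes that still have room, the opposite of what a load balancer would do.

```python
from typing import Dict, List


def anti_load_balance(placement: Dict[str, List[float]],
                      capacity: float) -> Dict[str, List[float]]:
    """Greedily empty nodes; returns a placement containing only active nodes.

    placement maps node -> list of VM loads; every node has the same capacity.
    """
    plan = {n: list(vms) for n, vms in placement.items() if vms}
    changed = True
    while changed:
        changed = False
        # Try to drain the least-loaded node first.
        for donor in sorted(plan, key=lambda n: sum(plan[n])):
            trial = {n: list(v) for n, v in plan.items() if n != donor}
            ok = True
            for vm in sorted(plan[donor], reverse=True):
                # Most-loaded receiver with room: concentrate, do not spread.
                receivers = sorted(trial, key=lambda n: sum(trial[n]), reverse=True)
                target = next((n for n in receivers
                               if sum(trial[n]) + vm <= capacity), None)
                if target is None:
                    ok = False
                    break
                trial[target].append(vm)
            if ok:
                plan = trial  # donor is now empty and can be switched off
                changed = True
                break
    return plan
```

Consider a capacity of 100 and nodes holding loads `{"n1": [20], "n2": [50, 10], "n3": [30]}`. The VM on n1 moves to n2, the most loaded node with room, so n1 can be switched off. The loop then stops because neither remaining node can be drained into the other.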