135 research outputs found

    Extending Demand Response to Tenants in Cloud Data Centers via Non-intrusive Workload Flexibility Pricing

    Full text link
    Participating in demand response programs is a promising tool for reducing energy costs in data centers by modulating energy consumption. Towards this end, data centers can employ a rich set of resource management knobs, such as workload shifting and dynamic server provisioning. Nonetheless, these knobs may not be readily available in a cloud data center (CDC) that serves cloud tenants/users, because workloads in CDCs are managed by the tenants themselves, who are typically charged under usage-based or flat-rate pricing and often have no incentive to cooperate with the CDC operator on demand response and cost saving. To break this "split incentive" hurdle, a few recent studies have tried market-based mechanisms, such as dynamic pricing, inside CDCs. However, such mechanisms often rely on complex designs that are hard to implement and difficult for tenants to work with. To address this limitation, we propose a novel incentive mechanism that is not dynamic, i.e., it keeps pricing for cloud resources unchanged for a long period. While it charges tenants under Usage-based Pricing (UP), as used by today's major cloud operators, it rewards tenants in proportion to the deadlines they set for completing their workloads. This new mechanism is called Usage-based Pricing with Monetary Reward (UPMR). We demonstrate the effectiveness of UPMR both analytically and empirically, showing that UPMR can reduce the CDC operator's energy cost by 12.9% while increasing its profit by 4.9%, compared to the state-of-the-art approaches used by today's CDC operators to charge their tenants.
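    A minimal sketch of the kind of charging rule UPMR describes: tenants pay standard usage-based prices but earn a reward that grows with the deadline they grant the operator. The unit price, reward rate, and the rule that the reward never exceeds the charge are illustrative assumptions, not figures from the paper.

        # Hypothetical sketch of Usage-based Pricing with Monetary Reward (UPMR).
        # All constants are illustrative, not taken from the paper.

        def upmr_bill(usage_hours: float, deadline_hours: float,
                      unit_price: float = 0.10, reward_rate: float = 0.05) -> float:
            """Return the net bill: a usage-based charge minus a reward that is
            proportional to the deadline the tenant sets for its workload."""
            charge = unit_price * usage_hours       # plain usage-based pricing (UP)
            reward = reward_rate * deadline_hours   # longer deadlines earn larger rewards
            return max(charge - reward, 0.0)        # assumption: reward never exceeds the charge

        if __name__ == "__main__":
            # Two tenants with the same usage; the one granting a longer deadline pays less.
            print(upmr_bill(usage_hours=100, deadline_hours=2))
            print(upmr_bill(usage_hours=100, deadline_hours=24))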

    Evaluating demand response opportunities for data centers

    Full text link
    Data center demand response is a solution to a problem that has only recently emerged: today's energy system is undergoing major transformations due to the increasing share of intermittent renewable power sources such as solar and wind. As the power grid physically requires balancing power feed-in and power draw at all times, power generation plants with short ramp-up times have traditionally been activated to avoid grid imbalances. Additionally, through demand response schemes power consumers can be incentivized to manipulate their planned power profile in order to activate hidden sources of flexibility. The data center industry has been identified as a suitable candidate for demand response as it is continuously growing and relies on highly automated processes. Technically, data centers can provide flexibility by, among other things, temporally or geographically shifting their workload or shutting down servers. There is a large body of work that analyses the potential of data center demand response. Most of it, however, deals with very specific data center set-ups in very specific power flexibility markets, so that its external validity is limited. The presented thesis goes beyond the related work by creating a framework for modeling data center demand response at a high level of abstraction that allows subsuming a great variety of specific models in the area: based on a generic architecture of demand-response-enabled data centers, this is formalized through a micro-economics-inspired optimization framework by generating technical power flex functions and an associated cost and market skeleton. As part of a two-step evaluation, an architectural framework for simulating demand response is created. Subsequently, a simulation instance of this high-level architecture is developed for a specific HPC data center in Germany, implementing two power management strategies, namely temporally shifting workload and manipulating CPU frequency. The flexibility extracted is then monetized on the secondary reserve market and on the EPEX day-ahead market in Germany. As a result, in 2014 this data center might have achieved the largest benefit gain by changing from static electricity pricing to dynamic EPEX prices without changing its power profile. Through demand response it might have created an additional gross benefit of 4% of the power bill on the secondary reserve market. In a sensitivity analysis, however, it could be shown that these results largely depend on specific parameters such as service level agreements and job heterogeneity. The results show that even though concrete simulations help in understanding demand response with individual data centers, the modeling framework is needed to understand their relevance from a system-wide viewpoint.
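    As a toy illustration of the "power flex function" idea, the sketch below reports how much of a planned power draw could be deferred in one time slot, given the share of delay-tolerant workload and an idle-power floor; the model and numbers are assumptions for demonstration, not the thesis's formal framework.

        # Illustrative toy "power flex function": how much of a data center's planned
        # power draw in one slot could be shed by deferring delay-tolerant workload,
        # without dropping below the idle power of servers that stay on.

        def flexible_power_kw(planned_kw: float, shiftable_fraction: float,
                              idle_floor_kw: float) -> float:
            """Upper bound on sheddable power for one slot (assumed model)."""
            sheddable = planned_kw * shiftable_fraction
            return min(sheddable, max(planned_kw - idle_floor_kw, 0.0))

        if __name__ == "__main__":
            # A 1 MW plan with 40% shiftable load and a 300 kW idle floor.
            print(flexible_power_kw(planned_kw=1000, shiftable_fraction=0.4, idle_floor_kw=300))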

    Market driven elastic secure infrastructure

    Full text link
    In today’s Data Centers, a combination of factors leads to the static allocation of physical servers and switches into dedicated clusters, such that it is difficult to add or remove hardware from these clusters for short periods of time. This silofication of the hardware leads to inefficient use of clusters. This dissertation proposes a novel architecture for improving the efficiency of clusters by enabling them to add or remove bare-metal servers for short periods of time. We demonstrate, by implementing a working prototype of the architecture, that such silos can be broken and that it is possible to share servers between clusters that are managed by different tools, have different security requirements, and are operated by tenants of the Data Center who may not trust each other. Physical servers and switches in a Data Center are grouped for a combination of reasons. They are used for different purposes (staging, production, research, etc.); host applications required for servicing specific workloads (HPC, Cloud, Big Data, etc.); and/or are configured to meet stringent security and compliance requirements. Additionally, different provisioning systems and tools such as Openstack-Ironic, MaaS, Foreman, etc., that are used to manage these clusters take control of the servers, making it difficult to add or remove the hardware from their control. Moreover, these clusters are typically stood up with sufficient capacity to meet anticipated peak workload. This leads to inefficient usage of the clusters: they are under-utilized during off-peak hours, and in cases where demand exceeds capacity they suffer from degraded quality of service (QoS) or may violate service level objectives (SLOs). Although today’s clouds offer huge benefits in terms of on-demand elasticity, economies of scale, and a pay-as-you-go model, many organizations are reluctant to move their workloads to the cloud. Organizations that (i) need total control of their hardware, (ii) have custom deployment practices, (iii) need to meet stringent security and compliance requirements, or (iv) do not want to pay the high costs incurred from running workloads in the cloud prefer to own their hardware and host it in a data center. This includes a large section of the economy, including financial companies, medical institutions, and government agencies, that continues to host its own clusters outside of the public cloud. Considering that all the clusters may not undergo peak demand at the same time, there is an opportunity to improve the efficiency of clusters by sharing resources between them. The dissertation describes the design and implementation of the Market Driven Elastic Secure Infrastructure (MESI) as an alternative to the public cloud and as an architecture for the lowest layer of the public cloud to improve its efficiency. It allows mutually non-trusting physically deployed services to share the physical servers of a data center efficiently. The approach proposed here is to build a system composed of a set of services, each fulfilling a specific functionality. A tenant of MESI has to trust only a minimal functionality of the tenant that offers the hardware resources; the rest of the services can be deployed by each tenant themselves. MESI is based on the idea of enabling tenants to share hardware they own with tenants they may not trust and between clusters with different security requirements.
The architecture provides control and freedom of choice to the tenants, whether they wish to deploy and manage these services themselves or use them from a trusted third party. MESI services fit into three layers that build on each other to provide: 1) Elastic Infrastructure, 2) Elastic Secure Infrastructure, and 3) Market-driven Elastic Secure Infrastructure. (1) The Hardware Isolation Layer (HIL), the bottommost layer of MESI, is designed for moving nodes between the multiple tools and schedulers used for managing the clusters. HIL controls the layer-2 switches and bare-metal servers such that tenants can elastically adjust the size of their clusters in response to the changing demand of the workload. It enables the movement of nodes between clusters with minimal to no modifications required to the tools and workflows used for managing these clusters. (2) The Elastic Secure Infrastructure (ESI) builds on HIL to enable sharing of servers between clusters with different security requirements and mutually non-trusting tenants of the Data Center. ESI enables the borrowing tenant to minimize its trust in the node provider and take control of the trade-offs between cost, performance, and security. This enables sharing of nodes between tenants that are not only part of the same organization but may also be tenants of different organizations in a co-located Data Center. (3) The Bare-metal Marketplace is an incentive-based system that uses economic principles of the marketplace to encourage tenants to share their servers with others, not just when they do not need them but also when others need them more. It provides tenants the ability to define their own cluster objectives and sharing constraints, and the freedom to decide the number of nodes they wish to share with others. MESI is evaluated using prototype implementations at each layer of the architecture. (i) The HIL prototype, implemented with only 3000 Lines of Code (LOC), is able to support many provisioning tools and schedulers with little to no modification, adds no overhead to the performance of the clusters, and is in active production use at MOC managing over 150 servers and 11 switches. (ii) The ESI prototype builds on the HIL prototype and adds to it an attestation service, a provisioning service, and a deterministically built open-source firmware. Results demonstrate that it is possible to build a cluster that is secure, elastic, and fairly quick to set up, with the tenant requiring only minimal trust in the provider for the availability of the node. (iii) The MESI prototype demonstrates the feasibility of a one-of-a-kind multi-provider marketplace for trading bare-metal servers where providers also use the nodes. The evaluation of the MESI prototype shows that all the clusters benefit from participating in the marketplace. It uses agents to trade bare-metal servers in a marketplace to meet the requirements of their clusters. Results show that, compared to operating as silos, individual clusters see a 50% improvement in the total work done, up to a 75% improvement (reduction) in queue waiting times, and up to a 60% improvement in the aggregate utilization of the test bed. This dissertation makes the following contributions: (i) It defines the MESI architecture, which allows mutually non-trusting tenants of the data center to share resources between clusters with different security requirements.
(ii) It demonstrates that it is possible to design a service that breaks the silos of static cluster allocation yet has a small Trusted Computing Base (TCB) and adds no overhead to the performance of the clusters. (iii) It provides a unique architecture that puts the tenant in control of its own security and minimizes the trust needed in the provider for sharing nodes. (iv) It provides a working prototype of a multi-provider marketplace for bare-metal servers, a first proof of concept demonstrating that it is possible to trade real bare-metal nodes at practical time scales, i.e., moving nodes between clusters is fast enough to get useful work done. (v) Finally, results show that it is possible to encourage even mutually non-trusting tenants to share their nodes with each other without any central authority making allocation decisions. Many smart, dedicated engineers and researchers have contributed to this work over the years. I have jointly led the efforts to design the HIL and ESI layers, and led the design and implementation of the bare-metal marketplace and the overall MESI architecture.
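    A hypothetical sketch of the kind of primitive the Hardware Isolation Layer enables, moving a bare-metal node between clusters by reassigning its switch-port VLAN; the classes and function names are invented for illustration and are not the real HIL API.

        # Hypothetical sketch of moving a bare-metal node between clusters by
        # changing its layer-2 network assignment. Illustrative only; not HIL's API.

        from dataclasses import dataclass, field

        @dataclass
        class Cluster:
            name: str
            vlan: int
            nodes: set = field(default_factory=set)

        def move_node(node: str, src: Cluster, dst: Cluster) -> None:
            """Reassign `node` by swapping its switch-port VLAN from src to dst."""
            if node not in src.nodes:
                raise ValueError(f"{node} is not allocated to {src.name}")
            src.nodes.remove(node)
            # In a real system this step would reprogram the top-of-rack switch port.
            print(f"switch: port of {node} moved from VLAN {src.vlan} to VLAN {dst.vlan}")
            dst.nodes.add(node)

        if __name__ == "__main__":
            hpc = Cluster("hpc", vlan=100, nodes={"node01", "node02"})
            cloud = Cluster("cloud", vlan=200)
            move_node("node02", hpc, cloud)   # elastically grow the cloud cluster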

    Machine Learning Algorithms for Provisioning Cloud/Edge Applications

    Get PDF
    Reinforcement Learning (RL), in which an agent is trained to make the most favourable decisions in the long run, is an established technique in artificial intelligence. Its popularity has increased in the recent past, largely due to the development of deep neural networks spawning deep reinforcement learning algorithms such as Deep Q-Learning. The latter have been used to solve previously insurmountable problems, such as playing the famed game of "Go", which previous algorithms could not. Many such problems suffer the curse of dimensionality, in which the sheer number of possible states is so overwhelming that it is impractical to explore every possible option. While these recent techniques have been successful, they may not be strictly necessary or practical for some applications such as cloud provisioning. In these situations, the action space is not as vast, and the workload data required to train such systems is not as widely shared, as it is considered commercially sensitive by the Application Service Provider (ASP). Given that provisioning decisions evolve over time in sympathy with incident workloads, they fit into the sequential decision process problem that legacy RL was designed to solve. However, because of the high correlation of time series data, states are not independent of each other and the legacy Markov Decision Processes (MDPs) have to be cleverly adapted to create robust provisioning algorithms. As the first contribution of this thesis, we exploit the knowledge of both the application and configuration to create an adaptive provisioning system leveraging stationary Markov distributions. We then develop algorithms that, with neither application nor configuration knowledge, solve the underlying Markov Decision Process (MDP) to create provisioning systems. Our Q-Learning algorithms factor in the correlation between states and the consequent transitions between them to create provisioning systems that not only adapt to workloads, but can also exploit similarities between them, thereby reducing the retraining overhead. Our algorithms also exhibit convergence in fewer learning steps, given that we restructure the state and action spaces to avoid the curse of dimensionality without the need for the function approximation approach taken by deep Q-Learning systems. A crucial use-case of future networks will be the support of low-latency applications involving highly mobile users. With these in mind, the European Telecommunications Standards Institute (ETSI) has proposed the Multi-access Edge Computing (MEC) architecture, in which computing capabilities can be located close to the network edge, where the data is generated. Provisioning for such applications therefore entails migrating them to the most suitable location on the network edge as the users move. In this thesis, we also tackle this type of provisioning by considering vehicle platooning or Cooperative Adaptive Cruise Control (CACC) on the edge. We show that our Q-Learning algorithm can be adapted to minimize the number of migrations required to effectively run such an application on MEC hosts, which may also be subject to traffic from other competing applications. This work has been supported by IMDEA Networks Institute.
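    A minimal tabular Q-Learning sketch in the spirit of the restructured state/action spaces described above, with a discretised load level as the state and scale-up/hold/scale-down as the actions; the state encoding and reward are illustrative assumptions, not the thesis's exact design.

        # Minimal tabular Q-learning for a provisioning agent (illustrative assumptions).

        import random
        from collections import defaultdict

        ACTIONS = ("scale_down", "hold", "scale_up")

        def choose(q, state, eps=0.1):
            if random.random() < eps:                       # epsilon-greedy exploration
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: q[(state, a)])

        def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

        if __name__ == "__main__":
            q = defaultdict(float)
            # One toy interaction: at load level 3 we scale up and latency improves.
            q_update(q, state=3, action="scale_up", reward=1.0, next_state=4)
            print(choose(q, 3, eps=0.0))   # greedy choice now prefers "scale_up"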

    New simulation techniques for energy aware cloud computing environments

    Get PDF
    In this thesis we propose a new simulation platform specifically designed for modelling cloud computing environments, their underlying architectures, and the energy consumed by hardware devices. The server models are divided into five basic subsystems: the processing system, memory system, network system, storage system, and power supply unit. Each of these subsystems has been built including new strategies to simulate its energy consumption. On top of these models, virtualization models are deployed to simulate the hypervisor and its scheduling policies. In addition, the cloud manager, the core of the simulation platform, is responsible for the resource provisioning and management policies. Its design offers APIs to researchers, allowing them to perform studies on scheduling policies of cloud computing systems. This simulation platform is aimed at modelling existing and new designs of cloud computing architectures, with a customizable environment to configure the energy consumption of the different components. The main characteristics of this platform are flexibility, allowing a wide range of designs; scalability, to study large environments; and a good compromise between accuracy and performance. The simulation platform has been validated by comparing results from real experiments with results from simulation runs obtained by modelling those real experiments. Then, to evaluate the possibility of foreseeing the energy consumption of a real cloud environment, an experiment deploying a model of a real application has been studied. Finally, scalability experiments have been performed to study the behaviour of the simulation platform with large-scale environments. The main aim of the scalability tests is to calculate both the amount of time and the memory needed to execute large simulations, depending on the size of the simulated environment and the availability of hardware resources to execute them.
This work has been partially funded under the grant TIN2013-41350-P of the Spanish Ministry of Economics and Competitiveness, the COST Action IC1305 "Network on Sustainable Ultrascale Computing (NESUS)", ESTuDIo (TIN2012-36812-C02-01), SICOMORo-CM (S2013/ICE-3006), the SEPE (Servicio Público de Empleo Estatal), commonly known as INEM, my entire savings, and part from my parents.
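    An illustrative sketch of the subsystem-level energy model described above, where a server's power is the sum of processing, memory, network, storage, and power-supply contributions; the linear utilisation coefficients are assumptions for demonstration, not the simulator's calibrated values.

        # Toy subsystem-level server power model (all coefficients are assumptions).

        def server_power_w(cpu_util: float, mem_util: float, net_util: float,
                           disk_util: float) -> float:
            cpu     = 40.0 + 60.0 * cpu_util    # idle + dynamic processing power
            memory  = 10.0 + 15.0 * mem_util
            network =  5.0 + 10.0 * net_util
            storage =  8.0 + 12.0 * disk_util
            subtotal = cpu + memory + network + storage
            psu_loss = subtotal * 0.10          # assumed 90%-efficient power supply unit
            return subtotal + psu_loss

        if __name__ == "__main__":
            print(server_power_w(cpu_util=0.7, mem_util=0.5, net_util=0.2, disk_util=0.1))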

    EXCLAIM framework: a monitoring and analysis framework to support self-governance in Cloud Application Platforms

    Get PDF
    The Platform-as-a-Service segment of Cloud Computing has been steadily growing over the past several years, with more and more software developers opting for cloud platforms as convenient ecosystems for developing, deploying, testing and maintaining their software. Such cloud platforms also play an important role in delivering an easily-accessible Internet of Services. They provide rich support for software development, and, following the principles of Service-Oriented Computing, offer their subscribers a wide selection of pre-existing, reliable and reusable basic services, available through a common platform marketplace and ready to be seamlessly integrated into users' applications. Such cloud ecosystems are becoming increasingly dynamic and complex, and one of the major challenges faced by cloud providers is to develop appropriate scalable and extensible mechanisms for governance and control based on run-time monitoring and analysis of (extreme amounts of) raw heterogeneous data. In this thesis we address this important research question: how can we support self-governance in cloud platforms delivering the Internet of Services in the presence of large amounts of heterogeneous and rapidly changing data? To address this research question and demonstrate our approach, we have created the Extensible Cloud Monitoring and Analysis (EXCLAIM) framework for service-based cloud platforms. The main idea underpinning our approach is to encode monitored heterogeneous data using Semantic Web languages, which then enables us to integrate these semantically enriched observation streams with static ontological knowledge and to apply intelligent reasoning. This has allowed us to create an extensible, modular, and declaratively defined architecture for performing run-time data monitoring and analysis with a view to detecting critical situations within cloud platforms. By addressing the main research question, our approach contributes to the domain of Cloud Computing, and in particular to the area of autonomic and self-managing capabilities of service-based cloud platforms. Our main contributions include the approach itself, which allows monitoring and analysing heterogeneous data in an extensible and scalable manner, the prototype of the EXCLAIM framework, and the Cloud Sensor Ontology. Our research also contributes to the state of the art in Software Engineering by demonstrating how existing techniques from several fields (i.e., Autonomic Computing, Service-Oriented Computing, Stream Processing, Semantic Sensor Web, and Big Data) can be combined in a novel way to create an extensible, scalable, modular, and declaratively defined monitoring and analysis solution.
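    A toy sketch of the general idea behind the approach: raw observations are lifted into semantic statements and matched against declaratively defined rules to flag critical situations; the vocabulary and rule below are invented for illustration and are not the Cloud Sensor Ontology or the EXCLAIM implementation.

        # Toy semantic-enrichment and rule-matching sketch (invented vocabulary).

        def enrich(service: str, metric: str, value: float):
            """Turn a raw observation into simple (subject, predicate, object) triples."""
            return [(service, "hasMetric", metric), (f"{service}/{metric}", "hasValue", value)]

        def critical_situations(triples, threshold=0.9):
            """Declarative-style rule: any cpuLoad value above the threshold is critical."""
            return [(s, v) for (s, p, v) in triples
                    if p == "hasValue" and s.endswith("/cpuLoad") and v > threshold]

        if __name__ == "__main__":
            stream = enrich("billing-service", "cpuLoad", 0.95) + enrich("auth-service", "cpuLoad", 0.42)
            print(critical_situations(stream))   # -> [('billing-service/cpuLoad', 0.95)]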

    Automating the support of highly-configurable services

    Get PDF
    The growing customisation capabilities of services, especially in Cloud scenarios, have led to the so-called Highly-configurable Services (HCSs).
Such capabilities are boosting the demand for and complexity of HCS-based applications and of the infrastructure needed to support them, HCS-driven solutions from now on. After a study of the existing literature, we conclude that these HCS-driven solutions can be significantly enhanced in both 1) the languages to describe the configurations, a.k.a. the decision space of HCSs, and 2) the techniques to extract useful information from the decision space, analysis techniques from now on. On the one hand, there are no Domain Specific Languages (DSLs) to describe the decision space, although there exist some very close approaches. We suggest that it is possible to improve the current landscape by designing a DSL that is: i) compliant with the main HCS vendors, ii) multi-item aware, iii) expressive enough to ease the description of arithmetic-logical relationships on configurable description terms, and iv) domain-independent. In addition, this DSL must define validity criteria for checking that the decision space satisfies some basic properties such as consistency and configurability. Furthermore, explanations must be provided when the decision space does not satisfy such basic properties. On the other hand, most of the current analysis techniques, such as those related to finding optimal configurations or reconfiguring a multi-tenant service, carry some drawbacks typical of emerging techniques. To overcome such drawbacks, the following must be developed: a) fully-functional reference implementations, b) techniques with a reuse-oriented design, c) effective extension mechanisms, and d) user-friendly interfaces. The main goal of this dissertation is to enhance the existing support for developing HCS-driven solutions, considering the aforementioned suggestions for improvement. The main thesis contributions are a DSL to specify the decision space of HCSs, called SYNOPSIS, and a novel catalogue of analysis operations to check and explain validity criteria as well as to find optimal configurations for one or many users. As minor contributions, two solutions have been developed to improve the existing tooling support to migrate on-premise infrastructure to Amazon EC2 and to reconfigure multi-tenant services. The cornerstone of our proposal to improve the specification techniques has been to define a DSL, SYNOPSIS, and endow it with a formal semantics based on Stateful Feature Models. Regarding our proposal to improve the analysis techniques, the key has been the organization of such techniques in a catalogue of basic analysis operations that can be combined to support more advanced HCS-driven solutions. The applicability of our results is limited to those decision spaces that can be translated to a Stateful Feature Model, which in our experience is enough to support real-world HCSs.
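    A small sketch of the two validity criteria mentioned above (consistency and configurability) on a toy decision space of boolean options with one constraint; this brute-force enumeration is only an illustration, not SYNOPSIS syntax or its Stateful Feature Model semantics.

        # Toy decision-space validity check by brute-force enumeration (illustrative only).

        from itertools import product

        OPTIONS = ("ssd_storage", "backup", "multi_region")

        def valid(cfg: dict) -> bool:
            # Example constraint: multi_region deployments require backup.
            return not cfg["multi_region"] or cfg["backup"]

        def configurations():
            for values in product([False, True], repeat=len(OPTIONS)):
                cfg = dict(zip(OPTIONS, values))
                if valid(cfg):
                    yield cfg

        if __name__ == "__main__":
            configs = list(configurations())
            print("consistent:", len(configs) >= 1)      # at least one valid configuration
            print("configurable:", len(configs) > 1)     # more than one valid configuration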