10 research outputs found
Oversubscribing micro-clouds with energy-aware containers scheduling
Cloud computation is being pushed to the edge of the network, towards Micro-clouds, to promote greater energy efficiency and lower latency than heavily resourced centralized datacenters. This trend will enable new markets and providers to fill the current gap. There are, however, challenges in this design: (i) devices have fewer resources, leading to frequent use of oversubscription; and (ii) there is a lack of economic incentives for both provider and application owner to cope with requests that are less than fully fulfilled. To support this trend, the virtualization layer of Micro-clouds is currently dominated by containers, which have a small memory footprint and strong isolation properties. We propose an extension to Docker Swarm, a widely used container orchestrator, with an oversubscribing scheduling algorithm based on raising resource utilization to levels where energy efficiency is maximized. This solution improves CPU and memory utilization over Spread and Binpack (the Docker Swarm strategies). Although we introduce a small overhead in scheduling times, our solution manages to allocate more requests, with a successful allocation rate of 83% against 57% for current solutions, measured on the scheduling of real CPU- and memory-intensive workloads (e.g., video encoding, key-value stores and a deep-learning algorithm).
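To make the contrast between the Spread and Binpack strategies and an oversubscribing placement concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It considers CPU only, and the `Node` class, the `pick_node` function and the `oversubscription` factor are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float        # cores physically available (assumed unit)
    cpu_reserved: float = 0.0  # cores already promised to containers


def pick_node(nodes, cpu_request, strategy="spread", oversubscription=1.0):
    """Return the node chosen for a request, or None if no node can take it.

    An oversubscription factor above 1.0 lets total reservations exceed
    the physical capacity of a node by that factor.
    """
    candidates = [
        n for n in nodes
        if n.cpu_reserved + cpu_request <= n.cpu_capacity * oversubscription
    ]
    if not candidates:
        return None

    def load(n):
        return n.cpu_reserved / n.cpu_capacity

    if strategy == "spread":
        chosen = min(candidates, key=load)   # least loaded node
    elif strategy == "binpack":
        chosen = max(candidates, key=load)   # most loaded node that still fits
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    chosen.cpu_reserved += cpu_request
    return chosen


if __name__ == "__main__":
    cluster = [Node("a", 4, 1), Node("b", 4, 3)]
    print(pick_node(cluster, 2, "binpack", oversubscription=1.5))
```

Under these assumptions, a request that Binpack would have to place on a lightly loaded node can instead land on an already busy node once reservations are allowed to exceed capacity, which is the basic effect oversubscription has on utilization.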
Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters
It is a long-standing challenge to achieve a high degree of resource utilization in cluster scheduling. Resource oversubscription has become a common practice for improving resource utilization and reducing cost. However, current centralized approaches to oversubscription suffer from resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this article we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks on specific machines according to their suitability for oversubscription. Node agents of those machines can, however, avoid excessive resource oversubscription by means of an admission control mechanism that uses multi-resource threshold control and performance-aware resource throttling. Experiments show that with mixed co-location of batch jobs and latency-sensitive LRAs, CPU utilization and disk utilization can reach 56.34 and 43.49 percent, respectively, while the 95th percentile of read latency in YCSB workloads increases by only 5.4 percent compared with executing the LRAs alone.
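The multi-resource threshold check performed by a node agent can be illustrated with a small sketch. This is an assumption-laden illustration, not ROSE's implementation: the `admit` function, the resource names and the threshold values are all hypothetical, and utilization is expressed as a fraction of capacity.

```python
def admit(usage, request, thresholds):
    """Admit a speculative task only if every resource stays under its threshold.

    usage, request and thresholds are dicts keyed by resource name;
    values are utilization fractions between 0.0 and 1.0.
    """
    return all(
        usage.get(resource, 0.0) + request.get(resource, 0.0) <= limit
        for resource, limit in thresholds.items()
    )


if __name__ == "__main__":
    # Hypothetical per-node thresholds, chosen only for illustration.
    thresholds = {"cpu": 0.8, "disk": 0.6}
    usage = {"cpu": 0.5, "disk": 0.3}
    print(admit(usage, {"cpu": 0.2, "disk": 0.1}, thresholds))
    print(admit(usage, {"cpu": 0.4, "disk": 0.1}, thresholds))
```

A single resource exceeding its limit is enough to reject the task, which is what makes the control multi-resource rather than CPU-only. The performance-aware throttling described in the abstract is not modelled here.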
Energy and performance-aware scheduling and shut-down models for efficient cloud-computing data centers.
This Doctoral Dissertation, presented as a set of research contributions, focuses on resource efficiency in data centers. This topic has been addressed mainly through the development of several energy-efficiency, resource-management and scheduling policies, as well as the simulation tools required to test them in realistic cloud computing environments. Several models have been implemented in order to minimize energy consumption in Cloud Computing environments. Among them: a) Fifteen probabilistic and deterministic energy policies which shut down idle machines; b) Five energy-aware scheduling algorithms, including several genetic algorithm models; c) A Stackelberg game-based strategy which models the competition between opposing requirements of Cloud Computing systems in order to dynamically apply the most suitable scheduling algorithms and energy-efficiency policies depending on the environment; and d) A productive analysis of the resource efficiency of several realistic cloud computing environments. A novel simulation tool called SCORE, able to simulate several data-center sizes,
machine heterogeneity, security levels, workload composition and patterns, scheduling strategies and energy-efficiency strategies, was developed in order to test these strategies in large-scale cloud-computing clusters. The results, covering more than fifty Key Performance Indicators (KPIs), show that energy consumption can be reduced by more than 20% in realistic high-utilization environments when proper policies are employed.
(Translated from Spanish) This Doctoral Thesis, presented as a compendium of research articles, focuses on efficient resource utilization in internet data centers. This problem has been addressed mainly by developing different strategies for energy efficiency and for resource management and allocation, as well as all the simulation and analysis tools needed to validate them in realistic Cloud Computing environments.
Numerous strategies have been developed to minimize energy consumption in Cloud Computing environments. Among them:
1. Fifteen energy-efficiency policies, both probabilistic and deterministic, which shut down idle machines whenever possible;
2. Five task-scheduling algorithms that take energy consumption into account, including several genetic algorithm models;
3. A strategy based on Stackelberg game theory that models the competition between different parts of the data center that have opposing objectives. This model dynamically applies the task-scheduling strategies and energy-efficiency policies depending on the characteristics of the environment; and
4. A productive analysis of resource-utilization efficiency in numerous Cloud Computing scenarios.
A new simulation tool called SCORE has been developed to analyze the aforementioned strategies in large-scale Cloud Computing clusters. The results obtained show that energy savings of more than 20% can be achieved in realistic high-utilization environments if the appropriate energy-efficiency strategies are employed. SCORE is open source and can simulate different data centers with, among many others, the following parameters: data-center size; server heterogeneity; workload type, composition and patterns; task-scheduling strategies and energy-efficiency policies; as well as three centralized resource managers: Monolithic, Two-level and Shared-state. As results, this simulation tool produces more than 50 Key Performance Indicators (KPIs) for general performance, task scheduling and energy.
Premio Extraordinario de Doctorado U
Thermal Energy Storage for Datacenters with Phase Change Materials
Datacenters, vast warehouses containing millions of servers that run the internet and the cloud, have experienced double-digit growth for almost two decades. Datacenters cost hundreds of millions of dollars, with the largest now exceeding a billion dollars each, and consume enormous amounts of power: over 2% of all electricity in the US, projected to increase to as much as 10% by 2030.
The impact of such high compute density, with thousands of individual compute nodes packed together in a small space, is heat: every watt of power used by servers must be removed from the datacenter. This requires active cooling. Air cooling is by far the most common approach, with an air conditioner or other form of heat exchanger cooling the air in the datacenter room and then transporting the heat outside the facility to a heat exchanger or similar fixture. Such a system is simple, common, and functional, but inherently inefficient due to the nature of datacenter workloads.
Datacenters primarily serve user-facing workloads: the user requests a search or sends an email, and that query prompts load in the datacenter. The query is handled locally, on a relative geographic scale, to provide a low response time and a positive user experience. This necessitates globally distributed datacenter capacity, but it also creates a diurnal load pattern whereby datacenters are most heavily loaded during the peak hours, when users in their region of service are awake and active online, and lightly loaded during the off hours, when users are offline or asleep and query requests are low. Because datacenter infrastructure must be provisioned for peak load, servers, power distribution, and cooling infrastructure are significantly underutilized most of the time.
This dissertation investigates the cooling needs of datacenters, and proposes to decouple the work and cooling needs. Specifically, we hypothesize that by storing thermal energy we can reshape the thermal profile of a datacenter to better balance cooling load throughout the day. We call this technique Thermal Time Shifting (TTS). First, we discuss how phase change materials (PCMs) enable TTS and evaluate the potential use scenarios of placing a small amount of PCM inside of servers for thermal energy storage. Next, we dive deeper into the potential of thermal energy storage and propose Virtual Melting Temperatures (VMT), a technique that uses active job placement to control the melting and cooling of PCM to enable a much greater degree of control over the behavior of the thermal profile. Finally, we propose and evaluate Thermal Gradient Transfer (TGT), a technique that uses direct water cooling to move heat straight from CPUs and GPUs to the wax for wider applicability and greater peak cooling load reduction.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/147726/1/skachm_1.pdf (skachm_1.pdf: restricted to UM users only)
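The idea behind Thermal Time Shifting can be illustrated with a toy energy balance. This is only a sketch under strong assumptions, not the dissertation's model: heat is counted in arbitrary integer units per time step, the PCM absorbs heat only when the load exceeds a single hypothetical `melt_threshold`, and the function name `shift_cooling_load` is introduced here for illustration.

```python
def shift_cooling_load(heat_load, melt_threshold, capacity):
    """Return the cooling load per time step after thermal storage is applied.

    heat_load: heat produced by the servers at each time step.
    melt_threshold: load level above which the PCM melts and absorbs heat.
    capacity: maximum amount of heat the PCM can hold.
    """
    stored = 0
    cooling = []
    for heat in heat_load:
        if heat > melt_threshold:
            # Peak period: the PCM melts and absorbs part of the excess heat.
            absorbed = min(heat - melt_threshold, capacity - stored)
            stored += absorbed
            cooling.append(heat - absorbed)
        else:
            # Off-peak period: the PCM refreezes and releases stored heat.
            released = min(melt_threshold - heat, stored)
            stored -= released
            cooling.append(heat + released)
    return cooling


if __name__ == "__main__":
    load = [2, 2, 6, 6, 2, 2]
    shifted = shift_cooling_load(load, melt_threshold=4, capacity=3)
    print(max(load), max(shifted), sum(load) == sum(shifted))
```

In this toy model the total heat removed is unchanged whenever the PCM fully refreezes by the end of the trace; only the peak that the cooling system must handle is lowered, which is the effect the abstract describes as reshaping the thermal profile.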
Towards a novel biologically-inspired cloud elasticity framework
With the widespread use of the Internet, the popularity of web applications has
significantly increased. Such applications are subject to unpredictable workload
conditions that vary from time to time. For example, an e-commerce website may
face higher workloads than normal during festivals or promotional schemes. Such
applications are critical, and performance-related issues or service disruptions can
result in financial losses. Cloud computing with its attractive feature of dynamic
resource provisioning (elasticity) is a perfect match to host such applications.
The rapid growth in the usage of the cloud computing model, as well as the rise in
complexity of web applications, poses new challenges regarding the effective
monitoring and management of the underlying cloud computational resources.
This thesis investigates the state-of-the-art elastic methods including the models
and techniques for the dynamic management and provisioning of cloud resources
from a service provider perspective.
An elastic controller is responsible for determining the optimal number of cloud resources
required at a particular time to achieve the desired performance demands.
Researchers and practitioners have proposed many elastic controllers using versatile
techniques ranging from simple if-then-else based rules to sophisticated
optimisation, control theory and machine learning based methods. However,
despite an extensive range of existing elasticity research, the aim of implementing
an efficient scaling technique that satisfies the actual demands is still a challenge
to achieve. There exist many issues that have not received much attention from
a holistic point of view. Some of these issues include: 1) the lack of adaptability
and static scaling behaviour whilst considering completely fixed approaches; 2)
the burden of additional computational overhead, the inability to cope with the
sudden changes in the workload behaviour and the preference of adaptability
over reliability at runtime whilst considering the fully dynamic approaches; and 3)
the failure to consider uncertainty aspects when designing auto-scaling solutions.
This thesis seeks solutions to address these issues altogether using an integrated
approach. Moreover, this thesis aims at the provision of qualitative elasticity rules.
This thesis proposes a novel biologically-inspired switched feedback control
methodology to address the horizontal elasticity problem. The switched methodology
utilises multiple controllers simultaneously, whereas the selection of a
suitable controller is realised using an intelligent switching mechanism. Each
controller itself depicts a different elasticity policy that can be designed using the
principles of fixed gain feedback controller approach. The switching mechanism
is implemented using a fuzzy system that determines a suitable controller/policy
at runtime based on the current behaviour of the system. Furthermore,
to improve the possibility of bumpless transitions and to avoid the oscillatory
behaviour, which is a problem commonly associated with switching based control
methodologies, this thesis proposes an alternative soft switching approach. This
soft switching approach incorporates a biologically-inspired Basal Ganglia based
computational model of action selection.
In addition, this thesis formulates the problem of designing the membership functions
of the switching mechanism as a multi-objective optimisation problem. The
key purpose behind this formulation is to obtain the near optimal (or to fine tune)
parameter settings for the membership functions of the fuzzy control system in
the absence of domain experts’ knowledge. This problem is addressed by using
two different techniques: the commonly used Genetic Algorithm and
a less well-known, economical alternative called the Taguchi method. Lastly,
we identify seven different kinds of real workload patterns, each of which reflects
a different set of applications. Six real and one synthetic HTTP traces, one for
each pattern, are further identified and utilised to evaluate the performance of
the proposed methods against the state-of-the-art approaches.
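The combination of several fixed-gain controllers with a fuzzy, soft switching mechanism can be sketched as follows. This is a simplified illustration and not the thesis's method: it blends the gains by fuzzy membership weights instead of using the Basal Ganglia based action-selection model, and the policy names, gain values, membership shapes and the function `scaling_decision` are all assumptions made here.

```python
def triangular(x, left, peak, right):
    """Triangular fuzzy membership function with values in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical elasticity policies, each a fixed-gain feedback controller.
GAINS = {"lazy": 0.5, "moderate": 2.0, "aggressive": 4.0}


def memberships(error):
    """Degree to which each policy suits the current error magnitude."""
    e = min(abs(error), 1.0)
    return {
        "lazy": triangular(e, -0.5, 0.0, 0.5),
        "moderate": triangular(e, 0.0, 0.5, 1.0),
        "aggressive": triangular(e, 0.5, 1.0, 1.5),
    }


def scaling_decision(measured_util, target_util, current_instances):
    """Return the new instance count using a membership-weighted gain."""
    error = measured_util - target_util
    weights = memberships(error)
    total = sum(weights.values())
    gain = sum(GAINS[name] * weights[name] for name in GAINS) / total
    delta = round(gain * error * current_instances)
    return max(1, current_instances + delta)  # never scale below one instance


if __name__ == "__main__":
    print(scaling_decision(0.9, 0.6, 10))
```

Because the effective gain changes continuously with the error instead of jumping from one controller to another, a sketch like this avoids abrupt transitions between policies, which is the motivation the abstract gives for soft switching.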
Methodology for malleable applications on distributed memory systems
(English) The dominant programming approach for scientific and industrial computing on clusters is MPI+X. While there are a variety of approaches within the node, denoted by the "X", Message Passing Interface (MPI) is the standard for programming multiple nodes with distributed memory. This thesis argues that the OmpSs-2 tasking model can be extended beyond the node to naturally support distributed memory, with three benefits:
First, at small to medium scale the tasking model is a simpler and more productive alternative to MPI. It eliminates the need to distribute the data explicitly and convert all dependencies into explicit message passing. It also avoids the complexity of hybrid programming using MPI+X.
Second, the ability to offload parts of the computation among the nodes enables the runtime to automatically balance the loads in a full-scale MPI+X program. This approach does not require a cost model, and it is able to transparently balance the computational loads across the whole program, on all its nodes.
Third, because the runtime handles all low-level aspects of data distribution and communication, it can change the resource allocation dynamically, in a way that is transparent to the application.
This thesis describes the design, development and evaluation of OmpSs-2@Cluster, a programming model and runtime system that extends the OmpSs-2 model to allow a virtually unmodified OmpSs-2 program to run across multiple distributed memory nodes. For well-balanced applications it provides similar performance to MPI+OpenMP on up to 16 nodes, and it improves performance by up to 2x for irregular and unbalanced applications like Cholesky factorization.
This work also extended OmpSs-2@Cluster for interoperability with MPI and the Barcelona Supercomputing Center (BSC)'s state-of-the-art Dynamic Load Balance (DLB) library in order to dynamically balance MPI+OmpSs-2 applications by transparently offloading tasks among nodes. This approach reduces the execution time of a microscale solid mechanics application by 46% on 64 nodes, and on a synthetic benchmark it is within 10% of perfect load balancing on up to 8 nodes.
Finally, the runtime was extended to transparently support malleability for pure OmpSs-2@Cluster programs and interoperate with the Resources Management System (RMS). The only change to the application is to explicitly call an API function to control the addition or removal of nodes. In this regard we additionally provide the runtime with the ability to semi-transparently save and recover part of the application status to perform checkpoint and restart. Such a feature hides the complexity of data redistribution and parallel IO from the user while allowing the program to recover and continue previous executions. Our work is a starting point for future research on fault tolerance.
In summary, OmpSs-2@Cluster expands the OmpSs-2 programming model to encompass distributed memory clusters. It allows an existing OmpSs-2 program, with few if any changes, to run across multiple nodes. OmpSs-2@Cluster supports transparent multi-node dynamic load balancing for MPI+OmpSs-2 programs, and enables semi-transparent malleability for OmpSs-2@Cluster programs. The runtime system has a high level of stability and performance, and it opens several avenues for future work.
(Translated from Spanish) The dominant programming model for clusters in both science and industry is currently MPI+X. Although there is some variety of alternatives for programming within a node (denoted by the "X"), the standard for programming multiple nodes with distributed memory remains the Message Passing Interface (MPI). This thesis proposes extending the task-based programming model OmpSs-2 to work on distributed memory systems, highlighting three main benefits: First, at small and medium scale, a task-based model is simpler and more productive than MPI and eliminates the need to distribute the data explicitly and convert all dependencies into messages. It also avoids the complexity of hybrid MPI+X programming. Second, the ability to send parts of the computation between nodes allows the library to balance the workload in large-scale MPI+X programs. This approach does not need a cost model and can balance loads across the whole program and all nodes. Third, given that the library handles all aspects of data distribution and transfer, the resources used by the application can be modified dynamically and transparently.
This thesis describes the design, development and evaluation of OmpSs-2@Cluster, a programming model and library that extends OmpSs-2 to allow existing OmpSs-2 programs to run on multiple nodes with practically no modification. For balanced applications, this model provides performance similar to MPI+OpenMP on up to 16 nodes and doubles performance for irregular or unbalanced applications such as Cholesky factorization. This work includes the extension of OmpSs-2@Cluster to interoperate with MPI and the Dynamic Load Balancing (DLB) library developed at the Barcelona Supercomputing Center (BSC). This makes it possible to balance MPI+OmpSs-2 applications through the transparent transfer of tasks between nodes. This approach reduces the execution time of a micro-scale solid mechanics application by 46% on 64 nodes; in some experiments on up to 8 nodes, the load was balanced to within 10% of perfect balance. Finally, another extension of the library was implemented to perform malleability operations in OmpSs-2@Cluster programs and interact with the Resource Management System (RMS). The only change required in the application is an explicit call to an interface function that controls the addition or removal of nodes. In addition, functionality was added to save and recover part of the application state semi-transparently in order to perform checkpoint and restart operations. This functionality hides from the user the complexity of data redistribution and parallel read/write operations, while allowing the program to recover and continue previous executions. This is a starting point for future research on fault tolerance. In summary, OmpSs-2@Cluster extends the OmpSs-2 programming model to cover distributed memory systems.
The model allows OmpSs-2 programs to run on multiple nodes with practically no modification. OmpSs-2@Cluster also enables dynamic load balancing for hybrid MPI+OmpSs-2 applications running on several nodes and can perform semi-transparent malleability in pure OmpSs-2@Cluster programs. The library has high levels of performance and stability and opens several paths for future work.
Arquitectura de computador
The Design, Implementation, and Evaluation of Software and Architectural Support for ARM Virtualization
The ARM architecture is dominating in the mobile and embedded markets and is making an upwards push into the server and networking markets where virtualization is a key technology. Similar to x86, ARM has added hardware support for virtualization, but there are important differences between the ARM and x86 architectural designs. Given two widely deployed computer architectures with different approaches to hardware virtualization support, we can evaluate, in practice, benefits and drawbacks of different approaches to architectural support for virtualization.
This dissertation explores new approaches to combining software and architectural support for virtualization with a focus on the ARM architecture and shows that it is possible to provide virtualization services an order of magnitude more efficiently than traditional implementations.
First, we investigate why the ARM architecture does not meet the classical requirements for virtualizable architectures and present an early prototype of KVM for ARM, a hypervisor using lightweight paravirtualization to run VMs on ARM systems without hardware virtualization support. Lightweight paravirtualization is a fully automated approach which replaces sensitive instructions with privileged instructions and requires no understanding of the guest OS code.
Second, we introduce split-mode virtualization to support hosted hypervisor designs using ARM's architectural support for virtualization. Different from x86, the ARM virtualization extensions are based on a new hypervisor CPU mode, separate from existing CPU modes. This separate hypervisor CPU mode does not support running existing unmodified OSes, and therefore hosted hypervisor designs, in which the hypervisor runs as part of a host OS, do not work on ARM. Split-mode virtualization splits the execution of the hypervisor such that the host OS with core hypervisor functionality runs in the existing kernel CPU mode, but a small runtime runs in the hypervisor CPU mode and supports switching between the VM and the host OS. Split-mode virtualization was used in KVM/ARM, which was designed from the ground up as an open source project and merged in the mainline Linux kernel, resulting in interesting lessons about translating research ideas into practice.
Third, we present an in-depth performance study of 64-bit ARMv8 virtualization using server hardware and compare against x86. We measure the performance of both standalone and hosted hypervisors on both ARM and x86 and compare their results. We find that ARM hardware support for virtualization can enable faster transitions between the VM and the hypervisor for standalone hypervisors compared to x86, but results in high switching overheads for hosted hypervisors compared to both x86 and to standalone hypervisors on ARM. We identify a key reason for the high switching overhead of hosted hypervisors: the need to save and restore kernel mode state between the host OS kernel and the VM kernel. However, standalone hypervisors such as Xen cannot leverage their performance benefit in practice for real application workloads. Other factors related to hypervisor software design and I/O emulation play a larger role in overall hypervisor performance than low-level interactions between the hypervisor and the hardware.
Fourth, realizing that modern hypervisors rely on running a full OS kernel, the hypervisor OS kernel, to support their hypervisor functionality, we present a new hypervisor design which runs the hypervisor and its hypervisor OS kernel in ARM's separate hypervisor CPU mode and avoids the need to multiplex kernel mode CPU state between the VM and the hypervisor. Our design benefits from new architectural features, the virtualization host extensions (VHE), in ARMv8.1 to avoid modifying the hypervisor OS kernel to run in the hypervisor CPU mode. We show that the hypervisor must be co-designed with the hardware features to take advantage of running in a separate CPU mode and implement our changes to KVM/ARM. We show that running the hypervisor OS kernel in a separate CPU mode from the VM and taking advantage of ARM's ability to quickly switch between the VM and hypervisor results in an order of magnitude reduction in overhead for important virtualization microbenchmarks and reduces the overhead of real application workloads by more than 50%.