
    Energy-aware simulation of workflow execution in High Throughput Computing systems

    Workflows offer great potential for enacting correlated jobs in an automated manner. This is especially desirable when workflows are large or there is a need to run a workflow multiple times. Much research has been conducted into reducing the makespan of workflows and maximising the utilisation of the resources they run on, while some existing research investigates how to reduce the energy consumption of workflows on dedicated resources. We extend the HTC-Sim simulation framework to support workflows, allowing us to evaluate the impact of different scheduling strategies on the overheads and energy consumption of workflows run on non-dedicated systems. We evaluate a number of scheduling strategies from the literature in an environment where (workflow) jobs can be evicted by higher-priority users.
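    The abstract above describes the approach at a high level only; as a hint of what accounting for evictions might look like, below is a minimal, hypothetical sketch (in Python, not HTC-Sim code) of tallying the energy wasted when workflow jobs on non-dedicated resources are evicted by higher-priority users. The class, the function name and the 150 W power figure are assumptions for illustration.

    # Hypothetical sketch: wasted energy from evictions of workflow jobs on
    # non-dedicated resources. Not HTC-Sim code; names and numbers are illustrative.
    from dataclasses import dataclass

    NODE_POWER_WATTS = 150.0  # assumed average power draw of a busy node

    @dataclass
    class Attempt:
        runtime_s: float   # seconds of execution in this attempt
        evicted: bool      # True if a higher-priority user reclaimed the node

    def wasted_energy_joules(attempts):
        """Energy spent on attempts whose work was lost to eviction."""
        return sum(a.runtime_s * NODE_POWER_WATTS for a in attempts if a.evicted)

    # Example: two evicted attempts and one successful completion.
    history = [Attempt(1800, True), Attempt(900, True), Attempt(3600, False)]
    print(wasted_energy_joules(history) / 3.6e6, "kWh wasted")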

    Energy-efficient checkpointing in high-throughput cycle-stealing distributed systems

    Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware or software failures, as well as interruptions from resource owners and more important tasks. Until recently many researchers have focused on the performance gains achieved through checkpointing, but with growing scrutiny of the energy consumption of IT infrastructures it is increasingly important to understand the energy impact of checkpointing within an HTC environment. In this paper we demonstrate through trace-driven simulation of real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst preserving the performance gains expected of checkpointing. Furthermore, we identify factors important in deciding whether to exploit checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches whilst maintaining their performance benefits.
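    The paper's own energy-aware strategies are not reproduced in the abstract; as a point of reference only, the sketch below uses the well-known Young/Daly first-order approximation of the checkpoint interval together with a simple estimate of the energy overhead of writing checkpoints. The function names, the 150 W power figure and the example parameters are illustrative assumptions.

    # Illustrative baseline only: Young/Daly first-order approximation of the
    # checkpoint interval that minimises expected lost work. The paper's own
    # energy-aware strategies are not reproduced here.
    import math

    def young_daly_interval(checkpoint_cost_s, mtbf_s):
        """Time between checkpoints (seconds), first-order approximation."""
        return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

    def checkpoint_energy_per_hour(checkpoint_cost_s, interval_s, power_watts):
        """Extra energy spent writing checkpoints per hour of execution (joules)."""
        checkpoints_per_hour = 3600.0 / interval_s
        return checkpoints_per_hour * checkpoint_cost_s * power_watts

    interval = young_daly_interval(checkpoint_cost_s=60, mtbf_s=6 * 3600)
    print(round(interval), "s between checkpoints")
    print(checkpoint_energy_per_hour(60, interval, power_watts=150), "J/h overhead")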

    Trace-Driven Simulation for Energy Consumption in High Throughput Computing Systems

    High Throughput Computing (HTC) is a powerful paradigm allowing vast quantities of independent work to be performed simultaneously. However, until recently little evaluation had been performed of the energy impact of HTC. Many organisations now seek to minimise energy consumption across their IT infrastructure, though it is unclear how this will affect the usability of HTC systems. We present HTC-Sim, a simulation system which allows the evaluation of different energy reduction policies across an HTC system comprising a collection of computational resources dedicated to HTC work and resources provided through cycle scavenging (a Desktop Grid). We demonstrate that our simulation software scales linearly with increasing HTC workload.
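    HTC-Sim itself is not shown here; the following is a minimal, hypothetical sketch of the kind of trace-driven energy accounting the abstract describes: replaying busy intervals from a trace and charging active power for busy time and idle power otherwise. The power figures, the function name and the example trace are assumptions for illustration.

    # Hypothetical trace-driven energy accounting sketch (not HTC-Sim code):
    # replay (start, end) intervals of work on a machine and charge active power
    # for busy time and idle power for the rest of the observed period.
    ACTIVE_W, IDLE_W = 120.0, 60.0   # assumed power draws

    def machine_energy_kwh(trace, period_start, period_end):
        """trace: list of (start_s, end_s) busy intervals within the period."""
        busy = sum(min(e, period_end) - max(s, period_start)
                   for s, e in trace if e > period_start and s < period_end)
        idle = (period_end - period_start) - busy
        return (busy * ACTIVE_W + idle * IDLE_W) / 3.6e6

    # One machine observed for a day, busy with two jobs.
    print(machine_energy_kwh([(0, 7200), (20000, 30000)], 0, 86400), "kWh")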

    Performance evaluation of virtual machine live migration for energy efficient large-scale computing

    PhD Thesis (Integrated). Large-scale computing systems must overcome a number of difficulties before they can be considered a long-term solution to information technology (IT) demands, including issues with power use and its environmental impact. Increasing the energy efficiency of large-scale computing systems has long posed a challenge to researchers. Innovations in efficient energy use are needed that can lower energy costs and reduce the CO2 emissions associated with information and communications technology (ICT) equipment. For the purpose of facilitating energy efficiency in large-scale computing systems, virtual machine (VM) consolidation is among the key strategic approaches that can be employed. VM live migration has become an established technology used to consolidate virtualised workload onto a smaller number of physical machines, as a mechanism to reduce overall energy consumption. Nevertheless, the costs associated with VM live migration are not taken into account by certain VM consolidation techniques. Organisations often exploit idle time on existing local computing infrastructure through High Throughput Computing (HTC) to perform computation, and more recently the same approach has been employed to make use of cloud resources in large-scale computation. To date, the impact of HTC scheduling policies within such environments, and the trade-off between energy consumption and performance, have received limited attention in the literature. Moreover, the benefits of virtualisation and live migration are not commonly applied in HTC environments. In this thesis, we illustrate through trace-driven simulation the trade-off between energy consumption and system performance for a number of HTC scheduling policies. Furthermore, the thesis demonstrates the way in which various workloads can affect the time of VM live migration. We use a real experiment to explore the relation between various workload characteristics and the time of VM live migration. In order to understand what factors influence live migration, we investigate three machine learning models to predict successful live migration using different training and evaluation sets drawn from our experimental data. Through this thesis, we explore how virtualisation and live migration can be employed in an HTC environment and used as a fault-tolerance mechanism to reduce energy consumption and increase the utilisation of a single computer in a large computing infrastructure. We propose various migration policies and evaluate them through the use of our extensions to the HTC-Sim simulation framework. Moreover, we compare the results between the policies as well as with a system where migration is not considered. We demonstrate that our responsive migration could save approximately 75% of the energy wasted due to job evictions by user interruptions in a system where migration is not employed as a fault-tolerance mechanism.
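    The thesis investigates three machine learning models, but the abstract does not name the models or features; the sketch below therefore shows only one plausible formulation of the prediction task, using scikit-learn's LogisticRegression on made-up workload features. The feature names, data and model choice are all assumptions, and scikit-learn is assumed to be available.

    # Hypothetical sketch of the prediction task described in the thesis:
    # classify whether a VM live migration will complete successfully from
    # workload features. Feature names and data are illustrative, and this
    # uses one of many possible models.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # columns: [memory_dirty_rate_MBps, vm_memory_GB, cpu_utilisation, network_bw_MBps]
    X = np.array([[5, 2, 0.2, 100], [80, 8, 0.9, 100],
                  [10, 4, 0.3, 1000], [120, 16, 0.95, 100]], dtype=float)
    y = np.array([1, 0, 1, 0])  # 1 = migration completed successfully

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0, stratify=y)
    model = LogisticRegression().fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))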

    Operating policies for energy efficient large scale computing

    PhD Thesis. Energy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than on hardware infrastructure in the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improving the energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and strict environmental legislation governing organisations. One example of such large IT systems is high-throughput cycle-stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available for high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low-power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are as follows: i) evaluation of novel energy-efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment; ii) introduction of a novel simulation environment for the evaluation of energy efficiency of large-scale high-throughput computing systems, and a generalisable model of energy consumption in high-throughput computing systems; iii) proposal and evaluation of resource allocation strategies for energy consumption in high-throughput computing systems for a real workload; iv) proposal and evaluation, for a real workload, of mechanisms to reduce wasted task execution within high-throughput computing systems to reduce energy consumption; v) evaluation of the impact of fault-tolerance mechanisms on energy consumption.
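    The aggressive power management policies mentioned above are typically idle-timeout policies; as an illustration only (not taken from the thesis), the sketch below estimates the energy of a machine that drops to a low-power state after a fixed idle timeout. The power figures, function name and example trace are assumptions.

    # Illustrative sketch (not from the thesis): energy under an aggressive
    # idle-timeout policy that sends a machine to sleep after `timeout_s`
    # of inactivity. Power figures are assumptions.
    ACTIVE_W, IDLE_W, SLEEP_W = 120.0, 60.0, 4.0

    def energy_with_timeout(idle_gaps_s, busy_s, timeout_s):
        """idle_gaps_s: durations of each idle gap between jobs (seconds)."""
        idle_energy = 0.0
        for gap in idle_gaps_s:
            awake = min(gap, timeout_s)            # machine idles until the timeout fires
            idle_energy += awake * IDLE_W + (gap - awake) * SLEEP_W
        return (busy_s * ACTIVE_W + idle_energy) / 3.6e6   # kWh

    # A day with 4 hours of work, three idle gaps and a 5-minute timeout.
    print(energy_with_timeout([3600, 14400, 54000], busy_s=4 * 3600, timeout_s=300), "kWh")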

    Dynamic scheduling based on particle swarm optimization for cloud-based scientific experiments

    Parameter Sweep Experiments (PSEs) allow scientists to perform simulations by running the same code with different input data, which results in many CPU-intensive jobs, and hence parallel computing environments must be used. Within these, Infrastructure as a Service (IaaS) Clouds offer custom Virtual Machines (VMs) that are launched in appropriate hosts available in a Cloud to handle such jobs. Correctly scheduling Cloud hosts is therefore very important, and efficient scheduling strategies to appropriately allocate VMs to physical resources must be developed. Scheduling is, however, challenging due to its inherent NP-completeness. We describe and evaluate a Cloud scheduler based on Particle Swarm Optimization (PSO). The main performance metrics studied are the number of Cloud users that the scheduler is able to successfully serve and the total number of created VMs, in online (non-batch) scheduling scenarios. In addition, the number of intra-Cloud network messages sent is evaluated. Simulated experiments performed using CloudSim and job data from real scientific problems show that our scheduler achieves better performance than schedulers based on Random assignment and Genetic Algorithms. We also study the performance when supplying or not supplying job information to the schedulers, namely a qualitative indication of job length.
    Fil: Pacini Naumovich, Elina Rocío. Universidad Nacional de Cuyo. Instituto de Tecnologías de la Información y las Comunicaciones; Argentina. Fil: Mateos Diaz, Cristian Maximiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Tandil. Instituto Superior de Ingenieria del Software; Argentina. Fil: Garcia Garino, Carlos Gabriel. Universidad Nacional de Cuyo. Instituto de Tecnologías de la Información y las Comunicaciones; Argentina.
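    The abstract does not give the scheduler's encoding or fitness function, so the sketch below is only one plausible, simplified way to apply PSO to VM-to-host assignment: each particle encodes a host index per VM, and the fitness penalises overloaded hosts and the number of hosts switched on. The VM demands, host capacities, PSO parameters and fitness are all assumptions, not the paper's scheduler.

    # Minimal, illustrative PSO sketch for assigning VMs to hosts (not the
    # paper's scheduler). Positions are continuous host indices rounded to
    # the nearest host when evaluating fitness.
    import random

    VM_CORES = [2, 4, 2, 8, 4, 2]        # cores demanded by each VM (assumed)
    HOST_CAP = [8, 8, 8, 8]              # core capacity of each host (assumed)

    def fitness(position):
        assign = [min(len(HOST_CAP) - 1, max(0, round(p))) for p in position]
        load = [0] * len(HOST_CAP)
        for vm, host in enumerate(assign):
            load[host] += VM_CORES[vm]
        overload = sum(max(0, l - c) for l, c in zip(load, HOST_CAP))
        hosts_on = sum(1 for l in load if l > 0)
        return 10 * overload + hosts_on   # overload penalised far more than spread

    def pso(n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
        dim, hi = len(VM_CORES), len(HOST_CAP) - 1
        pos = [[random.uniform(0, hi) for _ in range(dim)] for _ in range(n_particles)]
        vel = [[0.0] * dim for _ in range(n_particles)]
        pbest = [p[:] for p in pos]
        gbest = min(pbest, key=fitness)
        for _ in range(iters):
            for i in range(n_particles):
                for d in range(dim):
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * random.random() * (pbest[i][d] - pos[i][d])
                                 + c2 * random.random() * (gbest[d] - pos[i][d]))
                    pos[i][d] = min(hi, max(0, pos[i][d] + vel[i][d]))
                if fitness(pos[i]) < fitness(pbest[i]):
                    pbest[i] = pos[i][:]
            gbest = min(pbest + [gbest], key=fitness)
        return [round(p) for p in gbest]

    print("VM -> host assignment:", pso())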