    Changement de contexte pour tâches virtualisées à l'échelle des grappes

    Today, cluster resources are managed by allocating time slices to applications, with the amounts specified statically by the users. For a user, the requested resources are either over-estimated, in which case the cluster is under-used, or under-estimated, in which case the computations are in most cases lost. The advent of virtualization has brought some flexibility to the management of applications and cluster resources. However, to optimize the use of these resources and free users from hazardous estimates, it becomes necessary to allocate resources dynamically according to the applications' real needs: to be able to start an application dynamically when a resource becomes free, or to suspend it when the resource must be reassigned. In other words, to build a mechanism comparable to the context switch of standard computers, but for applications running on a cluster. By relying on virtualization, developing such a mechanism in a generic way becomes feasible. In this paper we propose an infrastructure that provides the notion of a context switch for virtualized applications on clusters. This solution allowed us to build a scheduler that runs as many virtualized applications simultaneously as possible. We show that such a solution increases the occupancy rate of our cluster and reduces application turnaround time.
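
    The abstract above describes a cluster-wide analogue of the operating-system context switch for virtualized jobs. The sketch below is a minimal Python illustration of that idea, not the infrastructure proposed in the paper: the VirtualJob and ContextSwitchScheduler classes and their methods are invented for the example, and a real suspend/resume would snapshot and restore virtual machines rather than flip a state flag.

```python
# Minimal sketch (illustrative only) of a cluster-level "context switch":
# suspend a running virtualized job when its nodes must be reclaimed, and
# resume it later when capacity frees up.
from collections import deque


class VirtualJob:
    def __init__(self, name, nodes_needed):
        self.name = name
        self.nodes_needed = nodes_needed
        self.state = "pending"  # pending | running | suspended

    def suspend(self):
        # A real implementation would snapshot the job's VMs to shared storage.
        self.state = "suspended"

    def resume(self):
        # A real implementation would restore the VMs from their snapshots.
        self.state = "running"


class ContextSwitchScheduler:
    """Keeps as many virtualized jobs running as the cluster can hold."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.running = []
        self.waiting = deque()

    def submit(self, job):
        self.waiting.append(job)
        self.schedule()

    def schedule(self):
        # Start waiting jobs while capacity allows; when the head of the queue
        # does not fit, suspend the most recently started job to reclaim nodes.
        while self.waiting:
            job = self.waiting[0]
            if job.nodes_needed <= self.free_nodes:
                self.waiting.popleft()
                if job.state == "suspended":
                    job.resume()           # restore previously suspended VMs
                else:
                    job.state = "running"  # first start of the job's VMs
                self.running.append(job)
                self.free_nodes -= job.nodes_needed
            elif self.running:
                victim = self.running.pop()
                victim.suspend()
                self.free_nodes += victim.nodes_needed
                self.waiting.append(victim)
            else:
                break  # nothing left to preempt; the job must wait
```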

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.

    This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
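
    As a rough illustration of topology-aware placement (not the paper's algorithm), the sketch below scores candidate GPU subsets by their aggregate pairwise link bandwidth and picks the best-connected set among the free GPUs. The bandwidth matrix, link values, and function names are invented for the example.

```python
# Illustrative topology-aware GPU placement: prefer GPU sets whose pairwise
# connectivity (e.g. NVLink > same-socket PCIe > cross-socket) is highest.
from itertools import combinations


def pairwise_bandwidth(gpu_set, bw):
    """Sum of link bandwidths between all GPU pairs in the candidate set."""
    return sum(bw[a][b] for a, b in combinations(sorted(gpu_set), 2))


def place_job(num_gpus, free_gpus, bw):
    """Return the free-GPU subset with the highest aggregate bandwidth."""
    best = max(combinations(free_gpus, num_gpus),
               key=lambda cand: pairwise_bandwidth(cand, bw),
               default=None)
    return list(best) if best else None


# Toy 4-GPU topology: GPUs 0-1 and 2-3 share a fast link (80 GB/s); all other
# pairs only reach each other over a slower path (10 GB/s).
bw = [[0, 80, 10, 10],
      [80, 0, 10, 10],
      [10, 10, 0, 80],
      [10, 10, 80, 0]]

print(place_job(2, [0, 1, 2, 3], bw))  # -> [0, 1], a well-connected pair
```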

    DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems

    As we approach the exascale era, the size and complexity of HPC systems continue to increase, raising concerns about their manageability and sustainability. For this reason, more and more HPC centers are experimenting with fine-grained monitoring coupled with Operational Data Analytics (ODA) to optimize the efficiency and effectiveness of system operations. However, while monitoring is a common reality in HPC, there is no well-stated and comprehensive list of requirements, nor matching frameworks, to support holistic and online ODA. This leads to insular ad-hoc solutions, each addressing only specific aspects of the problem. In this paper we propose Wintermute, a novel generic framework to enable online ODA on large-scale HPC installations. Its design is based on the results of a literature survey of common operational requirements. We implement Wintermute on top of the holistic DCDB monitoring system, offering a large variety of configuration options to accommodate the varying requirements of ODA applications. Moreover, Wintermute is based on a set of logical abstractions to ease the configuration of models at a large scale and maximize code re-use. We highlight Wintermute's flexibility through a series of practical case studies, each targeting a different aspect of the management of HPC systems, and then demonstrate the small resource footprint of our implementation.

    Comment: Accepted for publication at the 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020).
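
    The following sketch illustrates, in generic Python and without reference to Wintermute's or DCDB's actual APIs, the kind of online operator abstraction such an ODA framework provides: an operator consumes a stream of sensor readings and publishes a derived metric over a bounded window. All class and sensor names are hypothetical.

```python
# Illustrative online analytics operator: keep a bounded window of samples per
# sensor and publish a sliding-window average that downstream consumers can use.
from collections import deque, defaultdict


class MovingAverageOperator:
    """Publishes a sliding-window average for each input sensor."""

    def __init__(self, window=60):
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def push(self, sensor, value):
        # Called by the monitoring layer for every new reading.
        self.samples[sensor].append(value)

    def output(self, sensor):
        buf = self.samples[sensor]
        return sum(buf) / len(buf) if buf else None


op = MovingAverageOperator(window=3)
for v in (210.0, 230.0, 250.0, 400.0):
    op.push("node1.power_watts", v)
print(op.output("node1.power_watts"))  # average of the last 3 samples
```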

    VSCM: a Virtual Server Consolidation Manager for Cluster

    Virtual server consolidation uses virtual machines to encapsulate applications running on multiple physical servers in a cluster and then consolidates them onto a small number of servers. Nowadays, with the expansion of enterprise-class data centers, virtual server consolidation can eliminate a large number of servers, helping enterprises reduce hardware and operating costs significantly and improve server utilization greatly. In this paper, we propose the VSCM manager for virtual clusters, which solves the consolidation problem from a globally optimal view and also takes migration overhead into account. Experimental results in a virtual cluster demonstrate that VSCM can greatly reduce both the number of servers and the migration overhead.
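
    For illustration only, the sketch below shows a consolidation pass in the same spirit: a simple first-fit-decreasing packing of VM loads onto servers that reports both how many servers remain in use and how many VMs would migrate. VSCM itself solves the problem from a globally optimal view, so this greedy heuristic is only a stand-in, and the data layout is invented for the example.

```python
# Illustrative consolidation: first-fit-decreasing bin packing of VM loads,
# counting migrations under the assumption that consolidated server i
# corresponds to original host i.
def consolidate(vms, capacity):
    """vms: list of (vm_id, load, current_host); returns plan, servers, migrations."""
    servers = []  # each server is [used_load, [vm_ids]]
    plan = {}
    for vm_id, load, _host in sorted(vms, key=lambda v: v[1], reverse=True):
        for idx, srv in enumerate(servers):
            if srv[0] + load <= capacity:
                srv[0] += load
                srv[1].append(vm_id)
                plan[vm_id] = idx
                break
        else:
            servers.append([load, [vm_id]])
            plan[vm_id] = len(servers) - 1
    migrations = sum(1 for vm_id, _load, host in vms if plan[vm_id] != host)
    return plan, len(servers), migrations


vms = [("vm1", 0.6, 0), ("vm2", 0.3, 1), ("vm3", 0.3, 2), ("vm4", 0.5, 3)]
plan, n_servers, n_migrations = consolidate(vms, capacity=1.0)
print(n_servers, n_migrations)  # 4 original hosts packed onto 2 servers, 3 migrations
```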

    PIASA: A power and interference aware resource management strategy for heterogeneous workloads in cloud data centers

    Cloud data centers have been progressively adopted in different scenarios, as reflected in the execution of heterogeneous applications with diverse workloads and diverse quality of service (QoS) requirements. Virtual machine (VM) technology eases resource management in physical servers and helps cloud providers achieve goals such as optimization of energy consumption. However, the performance of an application running inside a VM is not guaranteed due to the interference among co-hosted workloads sharing the same physical resources. Moreover, the different types of co-hosted applications with diverse QoS requirements, as well as the dynamic behavior of the cloud, make efficient resource provisioning an even more challenging problem in cloud data centers. In this paper, we address the problem of resource allocation within a data center that runs different types of application workloads, particularly CPU- and network-intensive applications. To address these challenges, we propose an interference- and power-aware management mechanism that combines a performance deviation estimator and a scheduling algorithm to guide resource allocation in virtualized environments. We conduct simulations by injecting synthetic workloads whose characteristics follow the latest version of the Google Cloud tracelogs. The results indicate that our performance-enforcing strategy is able to fulfill contracted SLAs of real-world environments while reducing energy costs by as much as 21%.
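
    A hypothetical sketch of the kind of scoring an interference- and power-aware scheduler might perform is shown below. The interference table, weights, and power model are invented for the example and are not PIASA's performance deviation estimator; they only illustrate the trade-off between co-location penalties and the cost of powering on additional hosts.

```python
# Illustrative interference- and power-aware placement: pick the host with the
# lowest combined score of estimated co-location slowdown and marginal power.

# Estimated slowdown when a workload of one class shares a host with another.
INTERFERENCE = {
    ("cpu", "cpu"): 0.05,
    ("cpu", "net"): 0.02,
    ("net", "cpu"): 0.02,
    ("net", "net"): 0.15,
}


def placement_score(host, new_class, idle_power=100.0, power_per_load=2.0,
                    alpha=1.0, beta=0.01):
    """Lower is better: weighted interference penalty plus marginal power."""
    penalty = sum(INTERFERENCE[(new_class, c)] for c in host["classes"])
    # A host that is already active only pays the load-proportional power; an
    # empty host must also pay its idle power to be switched on.
    marginal_power = power_per_load + (0.0 if host["classes"] else idle_power)
    return alpha * penalty + beta * marginal_power


hosts = [
    {"name": "h1", "classes": ["cpu", "cpu"]},
    {"name": "h2", "classes": ["net"]},
    {"name": "h3", "classes": []},
]
best = min(hosts, key=lambda h: placement_score(h, "net"))
print(best["name"])  # -> "h1": co-locating with CPU-bound jobs scores best
```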

    A Survey of Research on Power Management Techniques for High Performance Systems

    This paper surveys the research on power management techniques for high performance systems, including both commercial high performance clusters and scientific high performance computing (HPC) systems. Power consumption has rapidly risen to an intolerable scale, resulting in both high operating costs and high failure rates, so it is now a major cause for concern and has imposed new challenges on the development of high performance systems. In this paper, we first review the basic mechanisms that underlie power management techniques. Then we survey two fundamental techniques for power management: metrics and profiling. After that, we review the research for the two major types of high performance systems: commercial clusters and supercomputers. Based on this, we discuss the new opportunities and problems presented by the recent adoption of virtualization techniques, and present the most recent research on them. Finally, we summarise and discuss future research directions.
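
    As a small worked example of the "metrics" side of power management (a general concept, not a result from this survey), the snippet below computes energy and the energy-delay product (EDP) for a frequency-scaling trade-off, showing how a setting can save energy yet score worse once the longer runtime is penalized. The numbers are illustrative.

```python
# Two common power-management metrics: energy consumed and the energy-delay
# product (EDP), which penalizes settings that save power by running longer.
def energy_joules(avg_power_watts, runtime_seconds):
    return avg_power_watts * runtime_seconds


def energy_delay_product(avg_power_watts, runtime_seconds):
    return energy_joules(avg_power_watts, runtime_seconds) * runtime_seconds


# DVFS-style trade-off: the reduced frequency uses less energy overall, but its
# EDP is worse because the job runs 30% longer.
full = (200.0, 100.0)      # average watts, runtime seconds at full frequency
scaled = (140.0, 130.0)    # average watts, runtime seconds at reduced frequency
for label, (p, t) in (("full", full), ("scaled", scaled)):
    print(label, energy_joules(p, t), energy_delay_product(p, t))
```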