22 research outputs found
Understanding the Energy Consumption of HPC Scale Artificial Intelligence
This paper contributes towards a better understanding of the energy consumption trade-offs of HPC-scale Artificial Intelligence (AI), and more specifically of Deep Learning (DL) algorithms. For this task we developed benchmark-tracker, a benchmark tool to evaluate the speed and energy consumption of DL algorithms in HPC environments. We exploited hardware counters and Python libraries to collect energy information through software, which enabled us to instrument a well-known AI benchmark tool and to evaluate the energy consumption of numerous DL algorithms and models. Through an experimental campaign, we show a case example of the potential of benchmark-tracker to measure computing speed and energy consumption for training and inference of DL algorithms, and of its potential to help better understand the energy behavior of DL algorithms on HPC platforms. This work is a step towards a better understanding of the energy consumption of Deep Learning in HPC, and it also contributes a new tool to help HPC DL developers better balance HPC infrastructure in terms of speed and energy consumption.
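The abstract does not spell out how the software-based energy readings are taken; as a rough, hypothetical illustration of the kind of measurement it describes (hardware energy counters exposed to Python), the sketch below reads the Intel RAPL counter from sysfs before and after a workload. The sysfs path and the workload placeholder are assumptions, not details of benchmark-tracker.

    # Minimal sketch of software-based energy measurement via the Intel RAPL
    # sysfs counter (an assumption; benchmark-tracker's actual mechanism may differ).
    import time
    from pathlib import Path

    RAPL_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # CPU package 0

    def read_energy_uj() -> int:
        """Return the cumulative package energy counter in microjoules."""
        return int(RAPL_FILE.read_text())

    def measure(workload):
        """Run a workload and report elapsed time and energy consumed."""
        e0, t0 = read_energy_uj(), time.time()
        workload()                      # e.g. one training or inference step
        e1, t1 = read_energy_uj(), time.time()
        # The counter wraps around periodically; a real tool must handle overflow.
        energy_j = (e1 - e0) / 1e6
        print(f"elapsed: {t1 - t0:.2f} s, energy: {energy_j:.2f} J")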
Obtaining Dynamic Scheduling Policies with Simulation and Machine Learning
Dynamic scheduling of tasks in large-scale HPC platforms is normally accomplished using ad-hoc heuristics, based on task characteristics, combined with some backfilling strategy. Defining heuristics that work efficiently in different scenarios is a difficult task, especially when considering the large variety of task types and platform architectures. In this work, we present a methodology based on simulation and machine learning to obtain dynamic scheduling policies. Using simulations and a workload generation model, we can determine the characteristics of tasks that lead to a reduction in the mean slowdown of tasks in an execution queue. Modeling these characteristics using a nonlinear function and applying this function to select the next task to execute in a queue dramatically improved the mean task slowdown on synthetic workloads. When applied to real workload traces from highly different machines, these functions still resulted in important performance improvements, attesting to the generalization capability of the obtained heuristics.
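The learned nonlinear selection function itself is not reproduced in the abstract; the following is a purely hypothetical sketch of how such a scoring function could be applied to pick the next task from a waiting queue. The functional form, coefficients, and job fields are placeholders, not the function obtained in the paper.

    # Hypothetical illustration of a nonlinear scoring heuristic over a waiting
    # queue; the coefficients and functional form are placeholders.
    import math
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        procs: int           # requested processors
        est_runtime: float   # user runtime estimate (seconds)
        wait_time: float     # time already spent in the queue (seconds)

    def score(job: Job) -> float:
        """Lower is better: favors small-'area' jobs, tempered by waiting time."""
        area = job.procs * job.est_runtime
        return math.log1p(area) - 0.1 * math.log1p(job.wait_time)

    def next_job(queue: list[Job]) -> Job:
        return min(queue, key=score)

    queue = [Job("a", 64, 3600, 120), Job("b", 8, 600, 900), Job("c", 256, 7200, 30)]
    print(next_job(queue).name)   # dispatches the job with the lowest score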
Short-Term Ambient Temperature Forecasting for Smart Heaters
Maintaining Cloud data centers is a pressing challenge in terms of energy efficiency. This challenge leads to solutions such as deploying Edge nodes that operate inside buildings without massive cooling systems. Edge nodes can act as smart heaters by recycling their consumed energy to heat these buildings. We propose a novel technique to perform temperature forecasting for Edge Computing smart-heater environments. Our approach uses time series algorithms to exploit historical air temperature data together with the smart heaters' power consumption and heat-sink temperatures to create models that predict short-term ambient temperatures. We implemented our approach on top of Facebook's Prophet time series forecasting framework, and we used real-time logs from Qarnot Computing as a use case of a smart-heater Edge platform. Our best trained model yields ambient temperature forecasts with less than 2.66% Mean Absolute Percentage Error, showing the feasibility of near real-time forecasting.
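As a minimal sketch of the forecasting setup described (Prophet with power consumption and heat-sink temperature as extra regressors), assuming hypothetical column names and a hypothetical log export; the actual feature set and model configuration used with the Qarnot logs may differ.

    # Minimal sketch of short-term ambient temperature forecasting with Prophet,
    # using power and heat-sink temperature as extra regressors. The CSV file and
    # column names are assumptions for illustration (the package is published as
    # "prophet"; older releases used "fbprophet").
    import pandas as pd
    from prophet import Prophet

    df = pd.read_csv("smart_heater_logs.csv")     # hypothetical log export
    df = df.rename(columns={"timestamp": "ds", "ambient_temp": "y"})

    m = Prophet()
    m.add_regressor("power_w")        # heater power consumption
    m.add_regressor("heatsink_temp")  # heat-sink temperature
    m.fit(df)

    # Forecast the next 30 minutes at 1-minute resolution; regressor values are
    # simply held at their last observed reading here, a deliberate simplification.
    future = m.make_future_dataframe(periods=30, freq="min")
    future["power_w"] = df["power_w"].iloc[-1]
    future["heatsink_temp"] = df["heatsink_temp"].iloc[-1]
    forecast = m.predict(future)
    print(forecast[["ds", "yhat"]].tail())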
Learning about simple heuristics for online parallel job scheduling
High-Performance Computing (HPC) platforms are growing in size and complexity. At the same time, the power demand of such platforms has grown rapidly, and current top supercomputers require power at the scale of an entire power plant. In an effort to make more responsible use of that power, researchers are devoting a great deal of effort to devising algorithms and techniques that improve different aspects of performance, such as scheduling and resource management. HPC platform maintainers, however, are still reluctant to deploy state-of-the-art scheduling methods, and most of them fall back on simple heuristics such as EASY Backfilling, which is based on a naive First-Come-First-Served (FCFS) ordering. Newer methods are often complex and obscure, and the simplicity and transparency of EASY Backfilling are too important to sacrifice.
In a first step, we explored Machine Learning (ML) techniques to learn online parallel job scheduling heuristics. Using simulations and a workload generation model, we determined the characteristics of HPC applications (jobs) that lead to a reduction in the mean slowdown of jobs in an execution queue. Modeling these characteristics with a nonlinear function and applying this function to select the next job to execute in a queue improved the mean job slowdown on synthetic workloads. When applied to real workload traces from highly different machines, these functions still resulted in performance improvements, attesting to the generalization capability of the obtained heuristics.
In a second step, using simulations and workload traces from several real HPC platforms, we performed a thorough analysis of the cumulative results of four simple scheduling heuristics (including EASY Backfilling). We also evaluated effects such as the relationship between job size and slowdown, the distribution of slowdown values, and the number of backfilled jobs, for each HPC platform and scheduling policy. We show experimental evidence that one can only gain by replacing EASY Backfilling with the Smallest estimated Area First (SAF) policy with backfilling, as it offers performance improvements of up to 80% in the slowdown metric while maintaining the simplicity and transparency of EASY. SAF reduces the number of jobs with large slowdowns, and the inclusion of a simple thresholding mechanism guarantees that no starvation occurs.
Overall, we draw the following conclusions: (i) simple and efficient scheduling heuristics in the form of a nonlinear function of the job characteristics can be learned automatically, though whether the reasoning behind their scheduling decisions is clear remains open to debate; (ii) the area of a job (its runtime estimate multiplied by its number of processors) appears to be a quite important property for good parallel job scheduling heuristics, since many of the heuristics that achieved good performance (notably SAF) take the job's area as input; (iii) the backfilling mechanism seems to always help in increasing performance, though it is no substitute for a better ordering of the waiting queue, such as that performed by SAF.
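The SAF policy and the anti-starvation threshold lend themselves to a short illustration. The hypothetical sketch below orders a waiting queue by smallest estimated area (runtime estimate times requested processors), letting jobs past an assumed waiting-time threshold keep their priority; the threshold value and job fields are illustrative, not the thesis's exact mechanism.

    # Hypothetical sketch of SAF (Smallest estimated Area First) queue ordering
    # with a waiting-time threshold to avoid starvation; all values illustrative.
    from dataclasses import dataclass

    @dataclass
    class Job:
        procs: int
        est_runtime: float   # user-provided runtime estimate (seconds)
        wait_time: float     # current waiting time (seconds)

    STARVATION_THRESHOLD = 24 * 3600  # assumed: jobs waiting a day jump ahead

    def saf_order(queue: list[Job]) -> list[Job]:
        """Jobs past the threshold keep their place; the rest go smallest area first."""
        starving = [j for j in queue if j.wait_time >= STARVATION_THRESHOLD]
        others = [j for j in queue if j.wait_time < STARVATION_THRESHOLD]
        return starving + sorted(others, key=lambda j: j.procs * j.est_runtime)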
Run your HPC jobs in Eco-Mode: revealing the potential of user-assisted power capping in supercomputing systems
The energy consumption of an exascale High-Performance Computing (HPC) supercomputer rivals the electricity demand of tens of thousands of people. Given the substantial energy footprint of exascale HPC systems and the increasing strain on power grids due to climate-related events, electricity providers are starting to impose power caps on their users during critical periods. In this context, it becomes crucial to implement strategies that manage the power consumption of supercomputers while ensuring their uninterrupted operation.
This paper investigates the proposition that HPC users can willingly sacrifice some processing performance to contribute to a global energy-saving initiative. With the objective of offering an efficient energy-saving strategy that involves users, we introduce a user-assisted supercomputer power-capping methodology. In this approach, users have the option to voluntarily allow their applications to operate in a power-capped mode, denoted "Eco-Mode", as necessary. Leveraging HPC simulations, along with energy traces and application metadata derived from a recent Top500 HPC supercomputer, we conducted an experimental campaign to quantify the effects of Eco-Mode on energy conservation and on user experience. Specifically, our study aimed to show that, with a sufficient number of users choosing Eco-Mode, the supercomputer maintains good performance within the specified power cap. Furthermore, we sought to determine the optimal conditions regarding the number of users embracing Eco-Mode and the magnitude of power capping required for applications (i.e., the intensity of Eco-Mode). Our findings indicate that decreasing the speed of jobs can significantly decrease the number of jobs that must be killed. Moreover, as the adoption of Eco-Mode increases among users, the likelihood of any job being killed also decreases.
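As a toy sketch of the trade-off the abstract quantifies, assuming a simple model in which Eco-Mode jobs draw a fixed fraction of their nominal power and the largest consumers are killed only if the cap is still exceeded; the power figures and reduction factor are invented for illustration and are not the paper's model.

    # Toy sketch of user-assisted power capping: opted-in ("Eco-Mode") jobs run
    # power-capped, others at full power; jobs are killed only if the total draw
    # still exceeds the cap. All numbers are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Job:
        power_w: float       # nominal power draw
        eco_mode: bool       # user opted into Eco-Mode

    def jobs_to_kill(jobs: list[Job], cap_w: float, eco_factor: float = 0.6) -> int:
        """Return how many jobs must be killed to respect the cap.
        eco_factor is the assumed power reduction applied to Eco-Mode jobs."""
        draws = sorted(j.power_w * (eco_factor if j.eco_mode else 1.0) for j in jobs)
        total, killed = sum(draws), 0
        # Remove the largest consumers first until the cap is respected.
        for d in reversed(draws):
            if total <= cap_w:
                break
            total -= d
            killed += 1
        return killed

    jobs = [Job(300, True), Job(500, False), Job(450, True), Job(800, False)]
    print(jobs_to_kill(jobs, cap_w=1500))   # fewer kills as more jobs opt in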
An experimental analysis of regression-obtained HPC scheduling heuristics
Scheduling jobs in High-Performance Computing (HPC) platforms typically involves heuristics consisting of job sorting functions, such as First-Come-First-Served or custom (hand-engineered) ones. Linear regression methods are promising for exploiting scheduling data to create simple and transparent heuristics with a lower computational overhead than state-of-the-art learning methods; the drawback is lower scheduling performance. We experimentally investigated the hypothesis that we could increase the scheduling performance of regression-obtained heuristics by increasing the complexity of the sorting functions and exploiting derived job features. We used multiple linear regression to develop a factory of scheduling heuristics based on scheduling data. This factory uses general polynomials of the jobs' characteristics as templates for the scheduling heuristics. We defined a set of polynomials of increasing complexity, and we used our factory to create scheduling heuristics based on these polynomials. We evaluated the performance of the obtained heuristics with wide-ranging simulation experiments using real-world traces from 1997 to 2016. Our results show that large polynomials led to unstable scheduling heuristics due to multicollinearity effects in the regression, while small polynomials led to stable and efficient scheduling performance. From these results we conclude that (i) multicollinearity imposes a constraint when one wants to derive new features (i.e., feature engineering) for creating scheduling heuristics with regression, and (ii) regression-obtained scheduling heuristics can be resilient to the long-term evolution of HPC platforms and workloads.
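A minimal sketch of the regression pipeline the abstract describes, assuming hypothetical job features and a stand-in training target: polynomial combinations of the features are fed to a multiple linear regression, and the fitted model then serves as a queue-sorting function. Higher polynomial degrees produce strongly correlated columns, which is where the reported multicollinearity instability comes from.

    # Minimal sketch of deriving a scheduling heuristic by linear regression over
    # polynomial job features; feature and target names are assumptions. Large
    # polynomial degrees tend to yield correlated columns (multicollinearity).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    # Hypothetical training set: (requested processors, runtime estimate, wait time)
    X = rng.uniform([1, 60, 0], [512, 86400, 3600], size=(1000, 3))
    y = rng.uniform(0, 1, size=1000)   # stand-in for a "run this next" score

    poly = PolynomialFeatures(degree=2, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X), y)

    def score(job_features):
        """Score a waiting job; the queue is then sorted by this value."""
        return model.predict(poly.transform([job_features]))[0]

    print(score([64, 3600, 120]))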