
79 research outputs found

Adjoint computation and Backpropagation

Author: Pallez Guillaume
Publication venue: HAL CCSD
Publication date: 08/04/2019

In this talk, Dr Pallez will discuss the impact of memory on automatic differentiation computations and on the backpropagation step of machine learning algorithms. He will show different strategies based on the amount of memory available. In particular, he will discuss optimal strategies when one can reuse memory slots, and when considering a hierarchical memory platfor

INRIA a CCSD electronic archive server

Le non-sens écologique des voitures autonomes

Author: Pallez Guillaume
Publication venue: HAL CCSD
Publication date: 22/07/2019

Article published on the Binaire blog: https://www.lemonde.fr/blog/binaire/2019/07/22/le-non-sens-ecologique-des-voitures-autonomes/ In this popular-science article, I discuss whether the promised advent of autonomous vehicles would really be a way to reduce pollution, particularly in cities (probably not).

INRIA a CCSD electronic archive server

H-Revolve: A Framework for Adjoint Computation on Synchronous Hierarchical Platforms

Author: Herrmann Julien
Pallez Guillaume
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

We study the problem of checkpointing strategies for adjoint computation on synchronous hierarchical platforms, specifically computational platforms with several levels of storage with different writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal algorithm for this problem. We make it available in a public Python library along with the implementation of several state-of-the-art algorithms for the variant of the problem with two levels of storage. We provide a detailed description of how one can use this library in an adjoint computation software in the field of automatic differentiation or backpropagation. Finally, we evaluate the performance of H-Revolve and other checkpointing heuristics through an extensive campaign of simulation

INRIA a CCSD electronic archive server
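
The store-versus-recompute trade-off described in the H-Revolve abstract above can be illustrated with a much simpler scheme. The sketch below is not H-Revolve and does not use its library: it assumes a single storage level, a hypothetical one-step model `forward_step` (here `sin`), and a naive policy that stores every `k`-th state and recomputes the rest during the reverse sweep. All function names are illustrative.

```python
import math


def forward_step(x):
    # Hypothetical one-step model: x_{i+1} = sin(x_i).
    return math.sin(x)


def adjoint_step(x, xbar):
    # Adjoint of forward_step evaluated at state x: d sin(x)/dx = cos(x).
    return math.cos(x) * xbar


def reverse_store_all(x0, n):
    """Reverse a chain of n steps while keeping every intermediate state."""
    states = [x0]
    for _ in range(n):
        states.append(forward_step(states[-1]))
    xbar = 1.0
    for i in reversed(range(n)):
        xbar = adjoint_step(states[i], xbar)
    return xbar, n  # gradient d x_n / d x_0, forward steps executed


def reverse_periodic(x0, n, k):
    """Store only every k-th state; recompute the others when reversing."""
    checkpoints = {}
    x = x0
    forward_steps = 0
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = x
        x = forward_step(x)
        forward_steps += 1
    xbar = 1.0
    for i in reversed(range(n)):
        base = (i // k) * k          # nearest checkpoint at or before step i
        x = checkpoints[base]
        for _ in range(i - base):    # recompute x_i from the checkpoint
            x = forward_step(x)
            forward_steps += 1
        xbar = adjoint_step(x, xbar)
    return xbar, forward_steps, len(checkpoints)
```

With `n = 12` and `k = 4`, the periodic variant keeps 3 states instead of 13 and executes 30 forward steps instead of 12, while returning the same gradient. Optimal algorithms choose the checkpoint positions (and, on hierarchical platforms, the storage level) far more carefully than this fixed period does.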

Making Speculative Scheduling Robust to Incomplete Data

Author: Gainaru Ana
Pallez Guillaume
Publication venue: HAL CCSD
Publication date: 18/11/2019

In this work, we study the robustness of Speculative Scheduling to data incompleteness. Speculative scheduling has made it possible to incorporate future types of applications into the design of HPC schedulers, specifically applications whose runtime is not perfectly known but can be modeled with probability distributions. Preliminary studies show the importance of speculative scheduling in dealing with stochastic applications when the application runtime model is completely known. In this work we show how one can extract enough information even from incomplete behavioral data for a given HPC application so that speculative scheduling still performs well. Specifically, we show that for synthetic runtimes that follow usual probability distributions such as truncated normal or exponential, we can extract enough data from as little as 10 previous runs to be within 5% of the solution which has exact information. For real traces of applications, the performance with 10 data points varies with the applications (within 20% of the full-knowledge solution), but converges fast (5% with 100 previous samples). Finally, a side effect of this study is to show the importance of the theoretical results obtained on continuous probability distributions for speculative scheduling. Indeed, we observe that the solutions for such distributions are more robust to incomplete data than the solutions for discrete distributions

INRIA a CCSD electronic archive server

Ordonnancement d'entrées/sorties périodiques avec des chaînes bicolores: modèles et algorithmes

Author: Jeannot Emmanuel
Pallez Guillaume
Vidal Nicolas
Publication venue: Springer Verlag
Publication date: 01/01/2021

Observations show that some HPC applications periodically alternate between (i) operations (computations, local data accesses) executed on the compute nodes, and (ii) I/O transfers of data, and that this behavior can be predicted beforehand. While the compute nodes are allocated separately to each application, the storage is shared, and thus I/O access can be a bottleneck leading to contention. To tackle this issue, we design new static I/O scheduling algorithms that prescribe when each application can access the storage. To design a static algorithm, we rely on the periodic behavior of most applications. Scheduling the I/O volume of the different applications is repeated over time. This is critical since often the number of application runs is very high. In the following report, we develop a formal background for I/O scheduling. First, we define a model, bi-colored chain scheduling, then we go through related results existing in the literature and explore the complexity of this problem's variants. Finally, to match the HPC context, we perform experiments based on use-cases matching highly parallel applications or distributed learning framewor

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

The role of storage target allocation in applications' I/O performance with BeeGFS

Author: Boito Francieli
Pallez Guillaume
Teylo Luan
Publication venue: HAL CCSD
Publication date: 06/09/2022

Parallel file systems are at the core of HPC I/O infrastructures. Those systems minimize the I/O time of applications by separating files into fixed-size chunks and distributing them across multiple storage targets. Therefore, the I/O performance experienced with a PFS is directly linked to the capacity to retrieve these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes and storage targets. Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments and also provide valuable information for future work on storage target allocation and stripe count tuning

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
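
The striping mechanism mentioned in the BeeGFS abstract above (fixed-size chunks spread over a number of storage targets) can be sketched as follows. This is a minimal illustration that assumes plain round-robin placement of chunks; it does not model how BeeGFS actually selects targets, and the names `target_of` and `chunk_layout` are hypothetical.

```python
def target_of(offset, chunk_size, stripe_count):
    """Index of the target holding the byte at `offset` (round-robin assumption)."""
    return (offset // chunk_size) % stripe_count


def chunk_layout(file_size, chunk_size, stripe_count):
    """Bytes stored on each target for a file of `file_size` bytes."""
    if chunk_size <= 0 or stripe_count <= 0:
        raise ValueError("chunk_size and stripe_count must be positive")
    per_target = [0] * stripe_count
    offset = 0
    chunk_index = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        per_target[chunk_index % stripe_count] += length
        offset += length
        chunk_index += 1
    return per_target
```

Under this assumption, a larger stripe count spreads the same file over more targets, so more chunks can be written or read in parallel, which is the parameter whose effect the paper evaluates.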

Profiles of upcoming HPC Applications and their Impact on Reservation Strategies

Author: Gainaru Ana
Goglin Brice
Honoré Valentin
Pallez Guillaume
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2021

With the expected convergence between HPC, BigData and AI, new applications with different profiles are coming to HPC infrastructures. We aim at better understanding the features and needs of these applications in order to be able to run them efficiently on HPC platforms. The approach followed is bottom-up: we thoroughly study an emerging application, Spatially Localized Atlas Network Tiles (SLANT, originating from the neuroscience community), to understand its behavior. Based on these observations, we derive a generic, yet simple, application model (namely, a linear sequence of stochastic jobs). We expect this model to be representative of a large set of upcoming applications from emerging fields that start to require the computational power of HPC clusters without fitting the typical behavior of large-scale traditional applications. In a second step, we show how one can use this generic model in a scheduling framework. Specifically, we consider the problem of making reservations (both time and memory) for an execution on an HPC platform based on the application's expected resource requirements. We derive solutions using the model provided by the first step of this work. We experimentally show the robustness of the model, even when very few data points, or another application, are used to generate it, and provide performance gains with regard to standard and more recent approaches used in the neuroscience community

INRIA a CCSD electronic archive server

checkpoint_schedules: schedules for incremental checkpointing of adjoint simulations

Author: Ham David A.
Herrmann Julien
I. Dolci Daiane
Maddison James R
Pallez Guillaume
Publication venue
Publication date: 22/03/2024

checkpoint_schedules provides schedules for step-based incremental checkpointing of the adjoints to computer models. The schedules contain instructions indicating the sequence of forward and adjoint steps to be executed, and the data storage and retrieval to be performed. These instructions are independent of the model implementation, which enables the model authors to switch between checkpointing algorithms without recoding. Conversely, checkpoint_schedules provides developers of checkpointing algorithms a direct mechanism to convey their work to model authors. checkpoint_schedules has been integrated into tlm_adjoint (James R. Maddison et al., 2019), a Python library designed for the automated derivation of higher-order tangent-linear and adjoint models, and work is ongoing to integrate it with pyadjoint (Mitusch et al., 2019). This package can be incorporated into other gradient solvers based on adjoint methods, regardless of the specific approach taken to generate the adjoint model

Edinburgh Research Explorer
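
The idea in the checkpoint_schedules abstract above, that a schedule is a sequence of instructions independent of the model, can be sketched in a few lines. The code below does not use the checkpoint_schedules API: the action names (`"store"`, `"restore"`, `"forward"`, `"adjoint"`), the generator `periodic_schedule` and the executor `run` are all hypothetical, chosen only to show how a schedule producer and a model can be decoupled.

```python
import math


def periodic_schedule(n, k):
    """Yield (action, step) pairs for a chain of n steps, storing every k-th state.

    Hypothetical instruction set, not the one defined by checkpoint_schedules.
    """
    for i in range(n):
        if i % k == 0:
            yield ("store", i)       # keep state x_i
        yield ("forward", i)         # advance x_i -> x_{i+1}
    for i in reversed(range(n)):
        base = (i // k) * k
        yield ("restore", base)      # reload state x_base
        for j in range(base, i):
            yield ("forward", j)     # recompute up to x_i
        yield ("adjoint", i)         # apply the adjoint of step i


def run(schedule, x0, forward, adjoint):
    """Execute a schedule against any model given as two callables."""
    storage, x, xbar = {}, x0, 1.0
    for action, i in schedule:
        if action == "store":
            storage[i] = x
        elif action == "restore":
            x = storage[i]
        elif action == "forward":
            x = forward(x)
        elif action == "adjoint":
            xbar = adjoint(x, xbar)
        else:
            raise ValueError(f"unknown action: {action}")
    return xbar


# Two unrelated toy models driven by the same schedule generator.
grad_linear = run(periodic_schedule(5, 2), 1.0,
                  lambda x: 2.0 * x, lambda x, xbar: 2.0 * xbar)
grad_sin = run(periodic_schedule(8, 3), 0.3,
               math.sin, lambda x, xbar: math.cos(x) * xbar)
```

Because `run` only interprets instructions, swapping `periodic_schedule` for another generator changes the memory and recomputation profile without touching the model code, which is the separation of concerns the abstract describes.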

Optimal Memory-aware Backpropagation of Deep Join Networks

Author: Beaumont Olivier
Herrmann Julien
Pallez Guillaume
Shilova Alena
Publication venue: Royal Society, The
Publication date: 01/01/2019

The memory needs of Deep Learning training can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and Automatic Differentiation (AD) to execute a backpropagation graph with a bounded memory requirement at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of Deep Learning are much more diverse, both in terms of shape and heterogeneity. In this work, we define the class of backpropagation graphs, and extend those on which one can compute in polynomial time a solution that minimizes the total number of recomputations. In particular we consider join graphs which correspond to models such as Siamese or Cross Modal Networks

INRIA a CCSD electronic archive server

Stratégies d’ordonnancement pour un système en temps-réel surchargé

Author: Gao Yiqin
Pallez Guillaume
Robert Yves
Vivien Frédéric
Publication venue: HAL CCSD
Publication date: 01/02/2022

This paper introduces and assesses novel strategies to schedule firm real-time jobs on an overloaded server. The jobs are released periodically and have the same relative deadline. Job execution times obey an arbitrary probability distribution and can take unbounded values (no WCET). Some jobs can be interrupted at their admission into the system or during execution. We introduce three control parameters to decide when to start or interrupt a job. We couple this dynamic scheduling with several admission policies and investigate several optimization criteria, the most prominent being the Deadline Miss Ratio (DMR). Then we derive a Markov model and use its stationary distribution to determine the best value of each control parameter. Finally, we conduct an extensive simulation campaign with 14 different probability distributions; the results demonstrate how the new control parameters help improve system performance compared with traditional approaches. In particular, we show that (i) the best admission policy is to admit all jobs; (ii) the key control parameter is the upper bound on the start time of each job after its admission; (iii) the best scheduling strategy decreases the DMR by up to 0.35 over traditional competitors.

INRIA a CCSD electronic archive server