79 research outputs found

    Adjoint computation and Backpropagation

    In this talk, Dr Pallez will discuss the impact of memory on the computation of automatic differentiation and on the backpropagation step of machine learning algorithms. He will show different strategies depending on the amount of memory available. In particular, he will discuss optimal strategies when memory slots can be reused, and when considering a hierarchical memory platform

    Le non-sens écologique des voitures autonomes

    Article published on the Binaire blog: https://www.lemonde.fr/blog/binaire/2019/07/22/le-non-sens-ecologique-des-voitures-autonomes/ In this popular-science article, I discuss whether the promised advent of autonomous vehicles would really be a way to reduce pollution, particularly in cities (probably not)

    H-Revolve: A Framework for Adjoint Computation on Synchronous Hierarchical Platforms

    We study the problem of checkpointing strategies for adjoint computation on synchronous hierarchical platforms, specifically computational platforms with several levels of storage with different writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal algorithm for this problem. We make it available in a public Python library along with the implementation of several state-of-the-art algorithms for the variant of the problem with two levels of storage. We provide a detailed description of how one can use this library in an adjoint computation software in the field of automatic differentiation or backpropagation. Finally, we evaluate the performance of H-Revolve and other checkpointing heuristics through an extensive simulation campaign

    Making Speculative Scheduling Robust to Incomplete Data

    In this work, we study the robustness of speculative scheduling to data incompleteness. Speculative scheduling has made it possible to incorporate future types of applications into the design of HPC schedulers, specifically applications whose runtime is not perfectly known but can be modeled with probability distributions. Preliminary studies show the importance of speculative scheduling in dealing with stochastic applications when the application runtime model is completely known. In this work, we show how one can extract enough information even from incomplete behavioral data for a given HPC application so that speculative scheduling still performs well. Specifically, we show that for synthetic runtimes that follow usual probability distributions such as truncated normal or exponential, we can extract enough data from as few as 10 previous runs to be within 5% of the solution that has exact information. For real traces of applications, the performance with 10 data points varies with the application (within 20% of the full-knowledge solution), but converges fast (5% with 100 previous samples). Finally, a side effect of this study is to show the importance of the theoretical results obtained on continuous probability distributions for speculative scheduling. Indeed, we observe that the solutions for such distributions are more robust to incomplete data than the solutions for discrete distributions

    Ordonnancement d'entrées/sorties périodiques avec des chaînes bicolores: modèles et algorithmes

    Observations show that some HPC applications periodically alternate between (i) operations (computations, local data accesses) executed on the compute nodes, and (ii) I/O transfers of data, and that this behavior can be predicted beforehand. While the compute nodes are allocated separately to each application, the storage is shared, and thus I/O access can become a bottleneck leading to contention. To tackle this issue, we design new static I/O scheduling algorithms that prescribe when each application can access the storage. To design a static algorithm, we rely on the periodic behavior of most applications: the schedule of the I/O volume of the different applications is repeated over time. This is critical since the number of application runs is often very high. In this report, we develop a formal background for I/O scheduling. First, we define a model, bi-colored chain scheduling; then we go through related results in the literature and explore the complexity of the variants of this problem. Finally, to match the HPC context, we perform experiments based on use cases matching highly parallel applications or distributed learning frameworks

    The role of storage target allocation in applications' I/O performance with BeeGFS

    Parallel file systems are at the core of HPC I/O infrastructures. Those systems minimize the I/O time of applications by separating files into fixed-size chunks and distributing them across multiple storage targets. Therefore, the I/O performance experienced with a PFS is directly linked to the capacity to retrieve these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes and storage targets. Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments and also provide valuable information for future work on storage target allocation and stripe count tuning

    Profiles of upcoming HPC Applications and their Impact on Reservation Strategies

    With the expected convergence between HPC, Big Data and AI, new applications with different profiles are coming to HPC infrastructures. We aim to better understand the features and needs of these applications in order to run them efficiently on HPC platforms. The approach followed is bottom-up: we thoroughly study an emerging application, Spatially Localized Atlas Network Tiles (SLANT, originating from the neuroscience community), to understand its behavior. Based on these observations, we derive a generic yet simple application model (namely, a linear sequence of stochastic jobs). We expect this model to be representative of a large set of upcoming applications from emerging fields that start to require the computational power of HPC clusters without fitting the typical behavior of large-scale traditional applications. In a second step, we show how one can use this generic model in a scheduling framework. Specifically, we consider the problem of making reservations (both time and memory) for an execution on an HPC platform based on the application's expected resource requirements. We derive solutions using the model provided by the first step of this work. We experimentally show the robustness of the model, even when it is generated from very few data points or from another application, and show performance gains with regard to standard and more recent approaches used in the neuroscience community

    checkpoint_schedules: schedules for incremental checkpointing of adjoint simulations

    checkpoint_schedules provides schedules for step-based incremental checkpointing of the adjoints to computer models. The schedules contain instructions indicating the sequence of forward and adjoint steps to be executed, and the data storage and retrieval to be performed. These instructions are independent of the model implementation, which enables model authors to switch between checkpointing algorithms without recoding. Conversely, checkpoint_schedules provides developers of checkpointing algorithms with a direct mechanism to convey their work to model authors. checkpoint_schedules has been integrated into tlm_adjoint (James R. Maddison et al., 2019), a Python library designed for the automated derivation of higher-order tangent-linear and adjoint models, and work is ongoing to integrate it with pyadjoint (Mitusch et al., 2019). This package can be incorporated into other gradient solvers based on adjoint methods, regardless of the specific approach taken to generate the adjoint model

    Optimal Memory-aware Backpropagation of Deep Join Networks

    The memory needs of Deep Learning training can prevent users from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and Automatic Differentiation (AD) to execute a backpropagation graph with a bounded memory requirement at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood, and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of Deep Learning are much more diverse, both in terms of shape and heterogeneity. In this work, we define the class of backpropagation graphs, and extend the set of graphs for which one can compute in polynomial time a solution that minimizes the total number of recomputations. In particular, we consider join graphs, which correspond to models such as Siamese or Cross Modal Networks

    Stratégies d’ordonnancement pour un système en temps-réel surchargé

    This paper introduces and assesses novel strategies to schedule firm real-time jobs on an overloaded server. The jobs are released periodically and have the same relative deadline. Job execution times obey an arbitrary probability distribution and can take unbounded values (no WCET). Jobs can be interrupted either upon admission to the system or during execution. We introduce three control parameters to decide when to start or interrupt a job. We couple this dynamic scheduling with several admission policies and investigate several optimization criteria, the most prominent being the Deadline Miss Ratio (DMR). Then we derive a Markov model and use its stationary distribution to determine the best value of each control parameter. Finally, we conduct an extensive simulation campaign with 14 different probability distributions; the results demonstrate how the new control parameters help improve system performance compared with traditional approaches. In particular, we show that (i) the best admission policy is to admit all jobs; (ii) the key control parameter is an upper bound on the start time of each job after its admission; (iii) the best scheduling strategy decreases the DMR by up to 0.35 over traditional competitors