19 research outputs found
Simulation-Based Parallel Training
Numerical simulations are ubiquitous in science and engineering. Machine learning for science investigates how artificial neural architectures can learn from these simulations to speed up scientific discovery and engineering processes. Most of these architectures are trained in a supervised manner and require tremendous amounts of simulation data that are slow to generate and memory-intensive to store. In this article, we present our ongoing work to design a training framework that alleviates these bottlenecks by generating data in parallel with the training process. Such simultaneity induces a bias in the data available during training, and we present a strategy to mitigate this bias with a memory buffer. We test our framework on the multi-parametric Lorenz attractor and show the benefit of our framework compared to offline training, as well as the success of our data bias mitigation strategy in capturing the complex chaotic dynamics of the system.
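As an illustration of how such a memory buffer can mitigate the bias of data generated alongside training, here is a minimal Python sketch; it is not the paper's implementation, and the names (ReplayBuffer, push, sample) are hypothetical. It uses reservoir sampling so that the retained samples approximate a uniform draw over everything the simulations have produced so far.

```python
import random

class ReplayBuffer:
    """Keep a bounded, approximately unbiased subset of the simulation stream."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of stored (input, target) pairs
        self.data = []            # retained samples
        self.seen = 0             # total number of samples offered so far

    def push(self, sample):
        """Insert a new sample using reservoir sampling."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, batch_size):
        """Draw a training batch uniformly from the retained history."""
        return random.sample(self.data, min(batch_size, len(self.data)))
```

In an online setting, each state produced by the Lorenz integrator would be pushed into the buffer while the training loop draws its batches with sample(), decoupling the order in which data are generated from the order in which they are learned.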
The Challenges of In Situ Analysis for Multiple Simulations
In situ analysis and visualization have mainly been applied to the output of a single large-scale simulation. However, topics involving the execution of multiple simulations on supercomputers have so far received only minimal attention. Some important examples are uncertainty quantification, data assimilation, and complex optimization. In this position article, beyond highlighting the strengths and limitations of the tools that we have developed over the past few years, we share lessons learned from using them on large-scale platforms and from interacting with end users. We then discuss the forthcoming challenges that future in situ analysis and visualization frameworks will face when dealing with the exascale execution of multiple simulations.
Training Deep Surrogate Models with Large Scale Online Learning
The spatiotemporal resolution of Partial Differential Equations (PDEs) plays an important role in the mathematical description of the world's physical phenomena. In general, scientists and engineers solve PDEs numerically with computationally demanding solvers. Recently, deep learning algorithms have emerged as a viable alternative for obtaining fast solutions to PDEs. Models are usually trained on synthetic data generated by solvers, stored on disk, and read back for training. This paper argues that relying on a traditional static dataset to train these models does not allow the solver to be fully exploited as a data generator. It proposes an open-source online training framework for deep surrogate models. The framework implements several levels of parallelism focused on simultaneously generating numerical simulations and training deep neural networks. This approach suppresses the I/O and storage bottleneck associated with disk-loaded datasets and opens the way to training on significantly larger datasets. Experiments compare the offline and online training of four surrogate models, including state-of-the-art architectures. Results indicate that exposing deep surrogate models to more dataset diversity, up to hundreds of GB, can increase model generalization capabilities. Prediction accuracy improves by 68% for fully connected neural networks, 16% for the Fourier Neural Operator (FNO), and 7% for the Message Passing PDE Solver.
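The pattern described above, generating simulations and training on them simultaneously instead of staging data on disk, can be sketched as a simple producer/consumer loop. The following Python sketch only illustrates that pattern under simplified assumptions; run_solver and train_step are hypothetical placeholders, not the framework's API.

```python
import multiprocessing as mp
import numpy as np

def run_solver(param, queue):
    """Stand-in PDE solver: streams (parameter, field) pairs into a shared queue."""
    rng = np.random.default_rng(int(param * 1e6) % 2**32)
    for _ in range(100):                      # 100 time steps per simulation
        field = rng.standard_normal(64)       # stand-in for a solution snapshot
        queue.put((param, field))
    queue.put(None)                           # signal that this run is finished

def train_step(batch):
    """Placeholder for one optimizer step on a batch of (param, field) pairs."""
    params, fields = zip(*batch)
    return float(np.mean([f.mean() for f in fields]))  # dummy "loss"

if __name__ == "__main__":
    queue = mp.Queue(maxsize=1024)
    design = [0.1, 0.5, 0.9]                  # design of experiments
    workers = [mp.Process(target=run_solver, args=(p, queue)) for p in design]
    for w in workers:
        w.start()

    finished, batch = 0, []
    while finished < len(workers):
        item = queue.get()
        if item is None:
            finished += 1
            continue
        batch.append(item)
        if len(batch) == 32:                  # train as soon as a batch is ready
            loss = train_step(batch)
            batch.clear()
    for w in workers:
        w.join()
```

In a real framework the consumer would feed these batches to a deep learning training step, and several solver and trainer processes could share the stream across nodes.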
Unlocking Large Scale Uncertainty Quantification with In Transit Iterative Statistics
Multi-run numerical simulations on supercomputers are increasingly used by physicists and engineers to deal with input data and model uncertainties. Most of the time, the input parameters of a simulation are modeled as random variables, and simulations are then run a (possibly large) number of times with input parameters varied according to a specific design of experiments. Uncertainty quantification for numerical simulations is a hard computational problem, currently bounded by the large size of the produced results. This book chapter is about using in situ techniques to enable large-scale uncertainty quantification studies. We provide a comprehensive description of Melissa, a file-avoiding, adaptive, fault-tolerant, and elastic framework that computes statistical quantities of interest in transit. Melissa currently implements the on-the-fly computation of the statistics necessary for the realization of large-scale uncertainty quantification studies: moment-based statistics (mean, standard deviation, higher orders), quantiles, Sobol' indices, and threshold exceedance.
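To give a concrete picture of what on-the-fly, iterative statistics means here, the sketch below updates a running mean and standard deviation field with Welford's one-pass algorithm as simulation results arrive; it is a schematic example, not Melissa's code, and the class name is illustrative.

```python
import numpy as np

class IterativeMoments:
    """One-pass mean and variance of a field, updated as each run arrives."""

    def __init__(self, field_size):
        self.count = 0
        self.mean = np.zeros(field_size)
        self.m2 = np.zeros(field_size)    # running sum of squared deviations

    def update(self, field):
        """Fold one new simulation result into the running statistics."""
        self.count += 1
        delta = field - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (field - self.mean)

    def std(self):
        """Sample standard deviation of the fields seen so far."""
        return np.sqrt(self.m2 / max(self.count - 1, 1))
```

Each simulation sends its result field to update() as soon as it is produced, so no run ever needs to be written to disk to compute the final statistics.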
Analysis of social communication campaigns
Sisenes Jornades de Foment de la Investigació de la FCHS (Any 2000-2001)
A hybrid Reduced Basis and Machine-Learning algorithm for building Surrogate Models: a first application to electromagnetism
A surrogate model approximates the outputs of a Partial Differential Equation (PDE) solver at a low computational cost. In this article, we propose a method to build learning-based surrogates in the context of parameterized PDEs, that is, PDEs that depend on a set of parameters while also being temporal and spatial processes. Our contribution is a method hybridizing Proper Orthogonal Decomposition with several Support Vector Regression machines. We present promising results on a first electromagnetic use case (a primitive single-phase transformer).
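A schematic view of this POD/SVR hybridization, written as a short Python sketch with illustrative names and shapes (not the paper's code): project solver snapshots onto a Proper Orthogonal Decomposition basis, then train one Support Vector Regression machine per retained mode to map input parameters to POD coefficients.

```python
import numpy as np
from sklearn.svm import SVR

def fit_pod_svr(params, snapshots, n_modes=5):
    """params: (n_runs, n_params) inputs, snapshots: (n_runs, n_dofs) solver outputs."""
    mean_field = snapshots.mean(axis=0)
    U, s, Vt = np.linalg.svd(snapshots - mean_field, full_matrices=False)
    basis = Vt[:n_modes]                               # POD modes, shape (n_modes, n_dofs)
    coeffs = (snapshots - mean_field) @ basis.T        # reduced coordinates per run
    models = [SVR(kernel="rbf").fit(params, coeffs[:, k]) for k in range(n_modes)]
    return mean_field, basis, models

def predict_pod_svr(mean_field, basis, models, new_params):
    """Reconstruct an approximate field for unseen parameter values."""
    coeffs = np.column_stack([m.predict(new_params) for m in models])
    return mean_field + coeffs @ basis
```

At prediction time, the SVR outputs give the reduced coordinates for an unseen parameter set, and the field is reconstructed as the mean snapshot plus their combination with the POD modes.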
Calibration and Spectral Reconstruction for CRISATEL: an Art Painting Multispectral Acquisition System
The CRISATEL multispectral acquisition system is dedicated to the digital archiving of fine art paintings. It is composed of a dynamic lighting system and a high-resolution camera equipped with a CCD linear array, 13 interference filters, and several built-in, electronically controlled mechanisms. A custom calibration procedure has been designed and implemented. It allows us to select the parameters to be used for the raw image acquisition and to collect experimental data, which will be used in the post-processing stage to correct the obtained multispectral images. Various techniques have been tested and compared in order to reconstruct the spectral reflectance curve of the painting surface imaged in each pixel. Realistic colour rendering under any illuminant can then be obtained from this spectral reconstruction. The results obtained with the CRISATEL acquisition system and the associated multispectral image processing are shown on two art painting examples.
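As a rough illustration of what per-pixel spectral reconstruction from a 13-channel acquisition can look like, the sketch below learns a regularized linear operator from calibration patches with known reflectances and applies it to pixel responses; it illustrates the general principle only, not the CRISATEL processing chain, and all names are hypothetical.

```python
import numpy as np

def learn_reconstruction_matrix(responses, reflectances, ridge=1e-3):
    """responses: (n_patches, n_channels) camera values for calibration patches,
    reflectances: (n_patches, n_wavelengths) measured spectra of the same patches."""
    R, S = responses, reflectances
    # Ridge-regularized least squares: W maps camera responses to spectra.
    W = np.linalg.solve(R.T @ R + ridge * np.eye(R.shape[1]), R.T @ S)
    return W                                   # shape (n_channels, n_wavelengths)

def reconstruct_spectrum(W, pixel_response):
    """Estimate the reflectance curve of one pixel from its channel responses."""
    return pixel_response @ W
```

Colour rendering under a chosen illuminant can then be simulated by integrating the reconstructed reflectance curve against that illuminant's spectrum and the colour-matching functions.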