Parametric estimation of complex mixed models based on meta-model approach
Complex biological processes are usually observed over time in a collection of
individuals. Longitudinal data are then available, and the statistical
challenge is to better understand the underlying biological mechanisms. The
standard statistical approach is the mixed-effects model, with regression
functions that are now highly developed to describe the biological processes
precisely (solutions of multi-dimensional ordinary differential equations or of
partial differential equations). When there is no analytical solution, a
classical estimation approach relies on coupling a stochastic version of the EM
algorithm (SAEM) with an MCMC algorithm. This procedure requires many
evaluations of the regression function, which is clearly prohibitive when a
time-consuming solver is used to compute it. In this work a meta-model relying
on a Gaussian process emulator is proposed to replace this regression function.
The new source of uncertainty due to this approximation can be incorporated in
the model, which leads to what is called a mixed meta-model. A control on the
distance between the maximum likelihood estimates in this mixed meta-model and
the maximum likelihood estimates obtained with the exact mixed model is
guaranteed. Finally, numerical simulations are performed to illustrate the
efficiency of this approach.
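The approach above hinges on Gaussian process regression as a cheap surrogate for an expensive regression function. As a minimal, hypothetical sketch (not the authors' implementation), a zero-mean GP with a squared-exponential kernel can emulate a costly solver and also report the extra uncertainty that the mixed meta-model must absorb:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two 1-D input vectors."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_emulator(x_train, y_train, x_new, jitter=1e-6):
    """Posterior mean and variance of a zero-mean GP fitted to (x_train, y_train)."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = rbf_kernel(x_new, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                        # cheap surrogate for the expensive solver
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf_kernel(x_new, x_new)) - np.sum(v ** 2, axis=0)
    return mean, var                         # var is the emulation uncertainty
```

The posterior variance returned here is precisely the "new source of uncertainty" that the mixed meta-model incorporates alongside the inter-individual variability.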
Variational Inference for Stochastic Block Models from Sampled Data
This paper deals with dyads that are not observed during the sampling of a
network and with the resulting issues in the inference of the Stochastic Block
Model (SBM). We review sampling designs and recover Missing At Random (MAR) and
Not Missing At Random (NMAR) conditions for the SBM. We introduce variants of
the variational EM algorithm for inferring the SBM under various sampling
designs (MAR and NMAR), all available in an R package. Model selection criteria
based on the Integrated Classification Likelihood are derived for selecting
both the number of blocks and the sampling design. We investigate the accuracy
and the range of applicability of these algorithms with simulations. We explore
two real-world networks from ethnology (a seed circulation network) and biology
(a protein-protein interaction network), where the interpretations depend
considerably on the sampling design considered.
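The variational EM machinery referenced above is implemented in the authors' R package, which is not reproduced here. As a hypothetical illustration only, the fully observed binary SBM (the baseline before any missingness is introduced) can be fitted with the classical fixed-point updates:

```python
import numpy as np

def sbm_vem(A, Q=2, n_iter=50, seed=0):
    """Variational EM for a binary SBM on a fully observed adjacency matrix A
    (symmetric, zero diagonal). Returns memberships tau, proportions alpha,
    and connection probabilities pi."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    tau = rng.dirichlet(np.ones(Q), size=n)      # variational block memberships
    off = 1.0 - np.eye(n)                        # mask excluding self-loops
    for _ in range(n_iter):
        # M-step: block proportions alpha and connection probabilities pi
        alpha = tau.mean(axis=0)
        pi = (tau.T @ A @ tau) / (tau.T @ off @ tau)
        pi = np.clip(pi, 1e-6, 1 - 1e-6)
        # E-step: fixed-point update of the variational posteriors tau
        log_tau = (np.log(alpha)[None, :]
                   + A @ tau @ np.log(pi).T
                   + (off - A) @ tau @ np.log(1.0 - pi).T)
        tau = np.exp(log_tau - log_tau.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
    return tau, alpha, pi
```

The paper's contribution lies in modifying these updates for the various MAR and NMAR sampling designs; the sketch above is only the common starting point.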
Maximin design on non hypercube domain and kernel interpolation
In the paradigm of computer experiments, the choice of an experimental design
is an important issue. When no information is available about the black-box
function to be approximated, an exploratory design has to be used. In this
context, two dispersion criteria are usually considered: the minimax and the
maximin ones. In the case of a hypercube domain, a standard strategy consists
of taking the maximin design within the class of Latin hypercube designs.
However, in a non-hypercube context, it does not make sense to use the Latin
hypercube strategy. Moreover, whatever the design is, the black-box function is
typically approximated by kernel interpolation. Here, we first provide a
theoretical justification of the maximin criterion with respect to kernel
interpolation. Then, we propose simulated annealing algorithms to determine
maximin designs in any bounded connected domain. We prove the convergence of
the different schemes.
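A simulated annealing search for a maximin design on an arbitrary bounded domain can be sketched as follows. This is an illustrative toy version (linear cooling, Gaussian perturbations, rejection of moves leaving the domain), not the specific schemes whose convergence the paper proves:

```python
import numpy as np

def min_pairwise_dist(X):
    """Smallest pairwise distance in a design: the maximin objective."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return D[~np.eye(len(X), dtype=bool)].min()

def maximin_sa(sampler, in_domain, n_pts=10, n_iter=2000, T0=0.05, step=0.1, seed=0):
    """Simulated annealing for a maximin design on a bounded domain described
    by an indicator function in_domain and a point sampler."""
    rng = np.random.default_rng(seed)
    X = np.array([sampler(rng) for _ in range(n_pts)])
    val = min_pairwise_dist(X)
    best, best_val = X.copy(), val
    for t in range(n_iter):
        T = T0 * (1.0 - t / n_iter) + 1e-9       # linear cooling schedule
        i = rng.integers(n_pts)
        cand = X.copy()
        cand[i] = X[i] + step * rng.normal(size=X.shape[1])
        if not in_domain(cand[i]):               # reject moves leaving the domain
            continue
        new_val = min_pairwise_dist(cand)
        # accept improvements always, deteriorations with Boltzmann probability
        if new_val >= val or rng.random() < np.exp((new_val - val) / T):
            X, val = cand, new_val
            if val > best_val:
                best, best_val = X.copy(), val
    return best, best_val
```

Because the domain only enters through the indicator function, the same loop runs unchanged on a disk, an annulus, or any other non-hypercube region, which is exactly the setting where Latin hypercube designs do not apply.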
Bounding rare event probabilities in computer experiments
We are interested in bounding probabilities of rare events in the context of
computer experiments. These rare events depend on the output of a physical
model with random input variables. Since the model is only known through an
expensive black-box function, standard efficient Monte Carlo methods designed
for rare events cannot be used. We therefore propose a strategy to deal with
this difficulty based on importance sampling methods. This proposal relies on
Kriging metamodeling and is able to achieve sharp upper confidence bounds on
the rare event probabilities. The variability due to the Kriging metamodeling
step is properly taken into account. The proposed methodology is applied to a
toy example and compared to more standard Bayesian bounds. Finally, a
challenging real case study is analyzed. It consists of finding an upper bound
on the probability that the trajectory of an airborne load will collide with
the aircraft that has released it.
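The Kriging step is beyond a short sketch, but the importance sampling ingredient can be illustrated in isolation. The example below is hypothetical: a one-dimensional standard Gaussian input and a mean-shifted Gaussian proposal that concentrates samples in the rare region:

```python
import numpy as np

def is_rare_event(f, threshold, shift, n=100_000, seed=0):
    """Importance-sampling estimate of p = P(f(X) > threshold) for X ~ N(0, 1),
    using the mean-shifted proposal N(shift, 1)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(loc=shift, size=n)            # draws from the proposal
    # likelihood ratio phi(y) / phi(y - shift) of target over proposal densities
    log_w = 0.5 * (y - shift) ** 2 - 0.5 * y ** 2
    return float(np.mean(np.exp(log_w) * (f(y) > threshold)))
```

With `f` the identity and the proposal shifted to the threshold, this estimates a tail probability that crude Monte Carlo would need orders of magnitude more samples to resolve; the paper's contribution is to make such a scheme work when `f` is an expensive black box, by routing it through a Kriging metamodel whose own uncertainty is folded into the final upper bound.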
Adaptive numerical designs for the calibration of computer codes
Making good predictions of a physical system using a computer code requires
the inputs to be carefully specified. Some of these inputs, called control
variables, have to reproduce physical conditions, whereas other inputs, called
parameters, are specific to the computer code and most often uncertain. The
goal of statistical calibration is to estimate these parameters with the help
of a statistical model which links the code outputs with the field
measurements. In a Bayesian setting, the posterior distribution of these
parameters is normally sampled using MCMC methods. However, these methods are
impractical when the code runs are highly time-consuming. A way to circumvent
this issue consists of replacing the computer code with a Gaussian process
emulator, then sampling a cheap-to-evaluate posterior distribution based on it.
In doing so, calibration is subject to an error which strongly depends on the
numerical design of experiments used to fit the emulator. We aim to reduce
this error by building a proper sequential design by means of the Expected
Improvement criterion. Numerical illustrations in several dimensions assess the
efficiency of such sequential strategies.
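The Expected Improvement criterion is classically defined for minimisation from a GP posterior; the abstract above adapts it to drive a sequential design for calibration. A minimal sketch of the classical criterion, assuming the GP posterior mean and standard deviation at a candidate point are already available:

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected Improvement (minimisation) at a point with GP posterior mean mu
    and standard deviation sigma, given the current best observed value."""
    if sigma <= 0.0:
        return 0.0
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))    # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # trade-off between exploitation (best - mu) and exploration (sigma)
    return (best - mu) * Phi + sigma * phi
```

The next design point is the candidate maximising this quantity; its code run then updates the emulator, which is what makes the design sequential.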