Functional approach for excess mass estimation in the density model
We consider a multivariate density model where we estimate the excess mass of
the unknown probability density at a given level from i.i.d.
observed random variables. This problem has several applications such as
multimodality testing, density contour clustering, anomaly detection,
classification and so on. For the first time in the literature we estimate the
excess mass as an integrated functional of the unknown density. We suggest
an estimator and evaluate its rate of convergence, when the density belongs to general
Besov smoothness classes, for several risk measures. Particular care is
devoted to the implementation and numerical study of the proposed procedure. It
appears that our procedure improves the plug-in estimator of the excess mass.
Comment: Published at http://dx.doi.org/10.1214/07-EJS079 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
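For orientation: the excess mass of a density f at level λ is E(λ) = ∫ (f(x) − λ)₊ dx. Below is a minimal sketch of the plug-in baseline that the paper improves upon, assuming a one-dimensional sample and a Gaussian kernel density estimate; all names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.stats import gaussian_kde

def plugin_excess_mass(sample, level, grid_size=512):
    """Plug-in estimate of E(level) = integral of max(f - level, 0).

    Sketch of the baseline only: a kernel density estimate stands in for
    the unknown density f, and the positive part is integrated on a grid.
    """
    kde = gaussian_kde(sample)
    grid = np.linspace(sample.min() - 1.0, sample.max() + 1.0, grid_size)
    excess = np.clip(kde(grid) - level, 0.0, None)
    return excess.sum() * (grid[1] - grid[0])  # Riemann-sum integration

# Example: bimodal sample, excess mass at level 0.05
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
print(plugin_excess_mass(x, level=0.05))
```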
Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis
In the liquefied natural gas (LNG) shipping industry, the phenomenon of
sloshing can lead to the occurrence of very high pressures in the tanks of the
vessel. The issue of modelling or estimating the probability of the
simultaneous occurrence of such extremal pressures is now crucial from the risk
assessment point of view. In this paper, heavy-tail modelling, widely used as a
conservative approach to risk assessment and corresponding to a worst-case risk
analysis, is applied to the study of sloshing. Multivariate heavy-tailed
distributions are considered, with sloshing pressures investigated by means of
small-scale replica tanks instrumented with d > 1 sensors. When attempting to
fit such nonparametric statistical models, one naturally faces computational
issues inherent in the curse of dimensionality. The primary purpose of
this article is to overcome this barrier by introducing a novel methodology.
For d-dimensional heavy-tailed distributions, the structure of extremal
dependence is entirely characterised by the angular measure, a positive measure
on the intersection of a sphere with the positive orthant in R^d. As d
increases, the mutual extremal dependence between variables becomes difficult
to assess. Based on a spectral clustering approach, we show here how a
low-dimensional approximation to the angular measure may be found. The
nonparametric method proposed for modelling sloshing has been successfully applied
to pressure data. The parsimonious representation thus obtained proves to be
very convenient for the simulation of multivariate heavy-tailed distributions,
allowing for the implementation of Monte-Carlo simulation schemes in estimating
the probability of failure. Besides confirming its performance on artificial
data, the methodology has been implemented on a real data set specifically
collected for risk assessment of sloshing in the LNG shipping industry.
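As an illustration of the angular-measure step, the sketch below follows standard practice in multivariate extremes (rank-transform the margins to unit Pareto, keep the largest observations, project them onto the simplex) and then applies off-the-shelf spectral clustering; it is a simplified stand-in for the authors' methodology, with all names illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def angular_points(X, k):
    """Project the k most extreme observations onto the positive simplex.

    Margins are standardized to unit Pareto via ranks; the points with the
    largest L1 radius are kept and normalized to angles.
    """
    n, _ = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    pareto = 1.0 / (1.0 - ranks / (n + 1.0))          # unit-Pareto margins
    radius = pareto.sum(axis=1)
    extremes = pareto[np.argsort(radius)[-k:]]
    return extremes / extremes.sum(axis=1, keepdims=True)

# Toy use: cluster the angular points to reveal low-dimensional structure.
rng = np.random.default_rng(0)
X = rng.pareto(2.0, size=(2000, 5))
theta = angular_points(X, k=100)
labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            random_state=0).fit_predict(theta)
```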
Statistical learning for wind power : a modeling and stability study towards forecasting
We focus on wind power modeling using machine learning techniques. We show, on
real data provided by the wind energy company Maïa Eolis, that parametric
models, even those closely following the physical equation relating wind
production to wind speed, are outperformed by intelligent learning algorithms. In
particular, the CART-Bagging algorithm gives very stable and promising results.
Besides, as a step towards forecasting, we quantify the impact of using
degraded wind measurements on performance. We also show on this
application that the default methodology for selecting a subset of predictors
provided in the standard random forest package can be refined, especially when
there exists among the predictors one variable with a major impact.
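To make the comparison concrete, here is a hedged sketch contrasting a parametric cubic power law with CART-Bagging (bootstrap-aggregated regression trees) on synthetic data; the real Maïa Eolis data are not public, so the data and constants below are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (wind speed, power) measurements.
rng = np.random.default_rng(0)
speed = rng.uniform(0.0, 25.0, 2000)
power = np.clip(0.5 * speed**3, None, 2000.0) + rng.normal(0.0, 80.0, 2000)

# Parametric baseline: cubic law P = a * v^3, fitted by least squares.
a = np.sum(power * speed**3) / np.sum(speed**6)
pred_param = a * speed**3

# CART-Bagging: bootstrap-aggregated regression trees.
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                       random_state=0)
bag.fit(speed.reshape(-1, 1), power)
pred_bag = bag.predict(speed.reshape(-1, 1))
```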
A clusterwise supervised learning procedure based on aggregation of distances
Nowadays, many machine learning procedures are available off the shelf and may easily be used to calibrate predictive models on supervised data. However, when the input data consist of more than one unknown cluster, and when different underlying predictive models exist, fitting a model is a more challenging task. We propose, in this paper, a three-step procedure to solve this problem automatically. The KFC procedure aggregates different models adaptively on the data. The first step of the procedure aims at capturing the clustering structure of the input data, which may be characterized by several statistical distributions; it provides several partitions, given the assumptions on the distributions. For each partition, the second step fits a specific predictive model based on the data in each cluster. The overall model is computed by a consensual aggregation of the models corresponding to the different partitions. A comparison of performances on different simulated and real data confirms the excellent behavior of our method in a large variety of prediction problems.
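A minimal sketch of the three steps, assuming two stand-in partitioners (K-means and a Gaussian mixture) for the different distributional assumptions, per-cluster linear models, and a plain average in place of the consensual aggregation used in the paper; all names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

def kfc_like(X, y, X_new, n_clusters=2):
    """Three-step sketch in the spirit of the KFC procedure.

    Step 1: several partitions of the inputs (different distributional
    assumptions). Step 2: one predictive model per cluster. Step 3:
    aggregation, simplified here to an average over partitions.
    """
    partitioners = [KMeans(n_clusters, n_init=10, random_state=0),
                    GaussianMixture(n_clusters, random_state=0)]
    preds = []
    for p in partitioners:
        labels = p.fit_predict(X)
        models = {c: LinearRegression().fit(X[labels == c], y[labels == c])
                  for c in np.unique(labels)}
        new_labels = p.predict(X_new)
        preds.append(np.array([models[c].predict(x.reshape(1, -1))[0]
                               for c, x in zip(new_labels, X_new)]))
    return np.mean(preds, axis=0)

# Toy data with two clusters carrying different linear models.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.where(X[:, 0] < 0, X @ np.array([1.0, 2.0]), X @ np.array([-2.0, 0.5]))
print(kfc_like(X, y, X[:5]))
```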
Grouping Strategies and Thresholding for High Dimensional Linear Models
The estimation problem in a high-dimensional regression model with structured
sparsity is investigated. An algorithm using a two-step block thresholding
procedure called GR-LOL is provided. Convergence rates are produced: they
depend on simple coherence-type indices of the Gram matrix (easily checkable
on the data) as well as sparsity assumptions on the model parameters, measured
by a combination of within-block and between-block norms. The
simplicity of the coherence indicator suggests ways to optimize the rates of
convergence when the group structure is not naturally given by the problem and
is unknown. In such a case, an auto-driven procedure is provided to determine
the regressors groups (number and contents). An intensive practical study
compares our grouping methods with the standard LOL algorithm. We prove that
the grouping rarely deteriorates the results but can improve them very
significantly. GR-LOL is also compared with group-Lasso procedures and exhibits
a very encouraging behavior. The results are quite impressive, especially when
the GR-LOL algorithm is combined with a grouping pre-processing step.
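For intuition, block thresholding keeps or kills whole groups of coefficients according to the size of a group norm. The sketch below shows the elementary operation only, not the full two-step GR-LOL procedure with its coherence-dependent thresholds; all names are illustrative.

```python
import numpy as np

def group_threshold(beta, groups, tau):
    """Hard-threshold whole groups of regression coefficients: a group is
    kept only if its Euclidean norm exceeds tau, otherwise zeroed out."""
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        if np.linalg.norm(beta[idx]) > tau:
            out[idx] = beta[idx]
    return out

# Example: the middle group falls below the threshold and is removed.
beta = np.array([2.0, 1.5, 0.05, 0.02, -3.0, 0.5])
groups = np.array([0, 0, 1, 1, 2, 2])
print(group_threshold(beta, groups, tau=0.5))  # [ 2.  1.5  0.  0. -3.  0.5]
```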
To tree or not to tree? Assessing the impact of smoothing the decision boundaries
When analyzing a dataset, it can be useful to assess how smooth the decision
boundaries need to be for a model to better fit the data. This paper addresses
this question by quantifying how much the 'rigid' decision boundaries,
produced by an algorithm that naturally finds such solutions, should be
relaxed to obtain a performance improvement. The approach we
propose starts with the rigid decision boundaries of a seed Decision Tree (seed
DT), which is used to initialize a Neural DT (NDT). The initial boundaries are
challenged by relaxing them progressively through training the NDT. During this
process, we measure the NDT's performance and its decision agreement with its
seed DT. We show how these two measures can help the user figure out how
expressive their model should be, before exploring it further via model
selection. The validity of our approach is demonstrated with experiments on
simulated and benchmark datasets.
Comment: 12 pages, 3 figures, 3 tables. arXiv admin note: text overlap with
arXiv:2006.1145
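The core relaxation can be pictured on a single tree node: a hard split is replaced by a sigmoid whose temperature controls the smoothness of the boundary. This is a generic soft-split sketch under that assumption, not the authors' exact NDT construction.

```python
import numpy as np

def hard_split(x, threshold):
    """Rigid decision boundary of a tree node: 0 (left) or 1 (right)."""
    return (x > threshold).astype(float)

def soft_split(x, threshold, temperature):
    """Sigmoid relaxation of the same split: a low temperature recovers
    the hard split, a higher one smooths the decision boundary."""
    return 1.0 / (1.0 + np.exp(-(x - threshold) / temperature))

x = np.linspace(-1.0, 1.0, 5)
print(hard_split(x, 0.0))                    # [0. 0. 0. 1. 1.]
print(soft_split(x, 0.0, temperature=0.25))  # smooth values in (0, 1)
```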
A Meta-Generation framework for Industrial System Generation
Generative design is an increasingly important tool in the industrial world.
It allows designers and engineers to easily explore vast ranges of design
options, providing a cheaper and faster alternative to trial-and-error
approaches. Thanks to the flexibility they offer, Deep Generative Models are
gaining popularity amongst Generative Design technologies. However, developing
and evaluating these models can be challenging. The field lacks accessible
benchmarks with which to objectively evaluate and compare different Deep
Generative Model architectures. Moreover, vanilla Deep Generative Models
appear to be unable to accurately generate multi-component industrial systems
that are controlled by latent design constraints. To address these challenges,
we propose an industry-inspired use case that incorporates actual industrial
system characteristics. This use case can be quickly generated and used as a
benchmark. We propose a Meta-VAE capable of producing multi-component
industrial systems and showcase its application on the proposed use case.
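For reference, the vanilla building block that the Meta-VAE is contrasted with can be sketched as a standard variational autoencoder; the sketch below (PyTorch, all dimensions illustrative) is not the authors' Meta-VAE, which composes such models to handle multi-component systems.

```python
import torch
import torch.nn as nn

class VanillaVAE(nn.Module):
    """Standard VAE: encoder -> (mu, logvar), reparameterized sample,
    decoder back to input space."""
    def __init__(self, x_dim=16, z_dim=4, h_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    """Negative ELBO: reconstruction error plus KL divergence to N(0, I)."""
    rec = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```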