Optimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that
sequentially produces a model in the form of linear combinations of simple
predictors---typically decision trees---by solving an infinite-dimensional
convex optimization problem. We provide in the present paper a thorough
analysis of two widespread versions of gradient boosting, and introduce a
general framework for studying these algorithms from the point of view of
functional optimization. We prove their convergence as the number of iterations
tends to infinity and highlight the importance of having a strongly convex risk
functional to minimize. We also present a reasonable statistical context
ensuring consistency properties of the boosting predictors as the sample size
grows. In our approach, the optimization procedures are run forever (that is,
without resorting to an early stopping strategy), and statistical
regularization is basically achieved via an appropriate penalization of
the loss and strong convexity arguments.
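As an illustration only, the sketch below implements plain functional gradient descent for L2 boosting with shallow scikit-learn regression trees in the role of the simple predictors. The ridge-style penalty and the parameter names (n_iter, learning_rate, lam, max_depth) are assumptions made for this example, not the paper's notation or its exact algorithm.

    # Minimal functional-gradient-descent sketch of L2 gradient boosting.
    # Base learners are shallow regression trees; the penalized risk and the
    # parameter names are illustrative assumptions, not the paper's method.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boost(X, y, n_iter=200, learning_rate=0.1, lam=0.01, max_depth=2):
        """Build F(x) = sum_t nu * h_t(x) by descending a penalized squared loss."""
        F = np.zeros(len(y))          # current predictions F_t(x_i)
        learners = []
        for _ in range(n_iter):
            # Negative functional gradient of (1/n) sum (y_i - F(x_i))^2 + lam*||F||^2
            # evaluated at the current iterate F.
            residual = (y - F) - lam * F
            h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
            F += learning_rate * h.predict(X)
            learners.append(h)
        return learners

    def predict(learners, X, learning_rate=0.1):
        # Use the same learning_rate as in boost() when aggregating.
        return learning_rate * sum(h.predict(X) for h in learners)

The penalty term in the residual is what makes the descended functional strongly convex in this toy version; dropping lam recovers ordinary L2 boosting with squared-error loss.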
An optimal path to transition in a duct
This paper is concerned with the transition of the laminar flow in a duct of
square cross-section. As in the similar case of pipe flow, the motion is
linearly stable for all Reynolds numbers, rendering this flow a suitable
candidate for a study of the 'bypass' path to turbulence. It has already been
shown \citep{Biau_JFM_2008} that the classical linear optimal perturbation
problem, yielding optimal disturbances in the form of longitudinal vortices,
fails to provide an 'optimal' path to turbulence, i.e. optimal perturbations do
not elicit a significant nonlinear response from the flow. Previous simulations
have also indicated that a pair of travelling waves immediately generates,
through nonlinear quadratic interactions, an unstable mean-flow distortion
responsible for rapid breakdown. Using functions that quantify the sensitivity
of the motion to deviations in the base flow, the 'optimal' travelling wave
associated with its specific defect is found by a variational approach. This optimal
solution is then integrated in time and shown to display a qualitative
similarity to the so-called 'minimal defect', for the same parameters. Finally,
numerical simulations of an 'edge state' are conducted to identify an unstable
solution that mediates laminar-turbulent transition and to relate it to the
results of the optimisation procedure.
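For intuition about the classical linear optimal perturbation problem mentioned above, the toy sketch below computes the optimal initial condition and the corresponding energy gain for a small, linearly stable but non-normal operator via the singular value decomposition of its propagator. The operator, time horizon, and norms are invented for illustration and are unrelated to the actual duct-flow computation of the paper.

    # Toy sketch of the classical linear optimal perturbation (transient growth)
    # problem: for a stable but non-normal operator A, the initial condition that
    # maximizes the energy gain ||exp(A*T) u0||^2 / ||u0||^2 is the leading right
    # singular vector of the propagator exp(A*T).  Generic illustration only.
    import numpy as np
    from scipy.linalg import expm, svd

    A = np.array([[-0.01, 1.0],      # stable eigenvalues, strongly non-normal
                  [ 0.00, -0.02]])
    T = 50.0                         # target time for the gain
    P = expm(A * T)                  # propagator over [0, T]
    U, s, Vt = svd(P)

    gain = s[0] ** 2                 # optimal energy amplification G(T)
    u0_opt = Vt[0]                   # optimal initial perturbation (unit energy)
    print(f"G(T) = {gain:.1f}, optimal perturbation = {u0_opt}")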
Analysis of a Random Forests Model
Random forests are a scheme proposed by Leo Breiman in the 2000s for
building a predictor ensemble with a set of decision trees that grow in
randomly selected subspaces of data. Despite growing interest and practical
use, there has been little exploration of the statistical properties of random
forests, and little is known about the mathematical forces driving the
algorithm. In this paper, we offer an in-depth analysis of a random forests
model suggested by Breiman in \cite{Bre04}, which is very close to the original
algorithm. We show in particular that the procedure is consistent and adapts to
sparsity, in the sense that its rate of convergence depends only on the number
of strong features and not on how many noise variables are present.
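As a rough illustration of trees grown in randomly selected subspaces of the data, the sketch below builds a small random-subspace forest with scikit-learn regression trees and averages their predictions. It is not the simplified Breiman model analysed in the paper, and the parameters (n_trees, n_sub_features) are purely illustrative.

    # Minimal random-subspace forest sketch: each tree is grown on a randomly
    # selected subset of the features, and predictions are averaged.  This is a
    # generic illustration of the "trees in random subspaces" idea, not the exact
    # simplified model analysed in the paper.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_forest(X, y, n_trees=100, n_sub_features=2, rng=None):
        rng = np.random.default_rng(rng)
        forest = []
        for _ in range(n_trees):
            feats = rng.choice(X.shape[1], size=n_sub_features, replace=False)
            tree = DecisionTreeRegressor().fit(X[:, feats], y)
            forest.append((feats, tree))
        return forest

    def predict_forest(forest, X):
        # Aggregate by averaging the individual tree predictions.
        return np.mean([tree.predict(X[:, feats]) for feats, tree in forest], axis=0)

Because every tree only ever sees its own feature subset, trees built on noise-only subsets contribute roughly constant predictions, which hints at why the aggregate can adapt to the number of strong features.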
The Statistical Performance of Collaborative Inference
The statistical analysis of massive and complex data sets will require the
development of algorithms that depend on distributed computing and
collaborative inference. Inspired by this, we propose a collaborative framework
that aims to estimate the unknown mean of a random variable. In
the model we present, a certain number of calculation units, distributed across
a communication network represented by a graph, participate in the estimation
of this mean by sequentially receiving independent data from the underlying random variable while
exchanging messages via a stochastic matrix defined over the graph. We give
precise conditions on the matrix under which the statistical precision of
the individual units is comparable to that of a (gold standard) virtual
centralized estimate, even though each unit does not have access to all of the
data. We show in particular the fundamental role played by both the non-trivial
eigenvalues of this matrix and the Ramanujan class of expander graphs, which
provide remarkable performance for moderate algorithmic cost.
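A minimal sketch of the kind of collaborative scheme described, under assumptions chosen only for illustration: the units sit on a ring network, mix their running aggregates through a doubly stochastic matrix W, add one fresh independent observation per step, and divide by the step count to form their estimates. Neither the ring graph, the matrix, nor the update rule below is claimed to be the paper's exact algorithm.

    # Illustrative collaborative mean estimation on a graph: each unit mixes its
    # running aggregate with its neighbours' through a doubly stochastic matrix W
    # and receives a fresh independent observation at every step.  The ring graph
    # and update rule are assumptions for this sketch, not the paper's algorithm.
    import numpy as np

    def collaborative_mean(n_units=10, n_steps=5000, true_mean=2.0, seed=0):
        rng = np.random.default_rng(seed)
        # Doubly stochastic mixing matrix on a ring: average self and two neighbours.
        W = np.zeros((n_units, n_units))
        for i in range(n_units):
            W[i, i] = W[i, (i - 1) % n_units] = W[i, (i + 1) % n_units] = 1 / 3
        S = np.zeros(n_units)                   # running aggregates, one per unit
        for t in range(1, n_steps + 1):
            data = true_mean + rng.standard_normal(n_units)  # one observation per unit
            S = W @ S + data                    # mix with neighbours, add new data
            estimates = S / t                   # each unit's current estimate
        return estimates

    print(collaborative_mean())   # every unit should be close to the true mean 2.0

How fast the individual estimates approach the centralized accuracy is governed by how quickly powers of W spread information across the graph, which is where the non-trivial eigenvalues of the mixing matrix enter.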