619 research outputs found
Sharp Oracle Inequalities for Aggregation of Affine Estimators
We consider the problem of combining a (possibly uncountably infinite) set of
affine estimators in a non-parametric regression model with heteroscedastic
Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a
PAC-Bayesian type inequality that leads to sharp oracle inequalities in
discrete but also in continuous settings. The framework is general enough to
cover the combinations of various procedures such as least-squares regression,
kernel ridge regression, shrinking estimators and many other estimators used in
the literature on statistical inverse problems. As a consequence, we show that
the proposed aggregate provides an adaptive estimator in the exact minimax
sense without either discretizing the range of tuning parameters or splitting
the set of observations. We also illustrate numerically the good performance
achieved by the exponentially weighted aggregate.
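As a rough illustration of exponential weighting in the discrete setting, the sketch below assigns each candidate estimator a weight proportional to its prior mass times exp(-risk / beta) and then forms the weighted combination of the candidates' predictions. It assumes a finite family and that a risk estimate for each candidate is already available; the names `risks`, `beta`, `prior` and the uniform default prior are illustrative choices, not taken from the paper.

```python
import math


def exponential_weights(risks, beta, prior=None):
    """Return weights w_i proportional to prior_i * exp(-risks_i / beta)."""
    n = len(risks)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over candidates
    # Shift by the smallest risk so the exponentials cannot overflow.
    r_min = min(risks)
    unnormalized = [p * math.exp(-(r - r_min) / beta)
                    for p, r in zip(prior, risks)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def aggregate(predictions, weights):
    """Convex combination of candidate predictions, coordinate by coordinate."""
    m = len(predictions[0])
    return [sum(w * pred[j] for w, pred in zip(weights, predictions))
            for j in range(m)]
```

A small `beta` concentrates the weights on the candidate with the lowest estimated risk, while a large `beta` moves them toward the prior.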
Optimal Two-Step Prediction in Regression
High-dimensional prediction typically comprises two steps: variable selection
and subsequent least-squares refitting on the selected variables. However, the
standard variable selection procedures, such as the lasso, hinge on tuning
parameters that need to be calibrated. Cross-validation, the most popular
calibration scheme, is computationally costly and lacks finite sample
guarantees. In this paper, we introduce an alternative scheme, easy to
implement and both computationally and theoretically efficient.
On Lasso refitting strategies
A well-known drawback of l_1-penalized estimators is the systematic shrinkage
of the large coefficients towards zero. A simple remedy is to treat Lasso as a
model-selection procedure and to perform a second refitting step on the
selected support. In this work we formalize the notion of refitting and provide
oracle bounds for arbitrary refitting procedures of the Lasso solution. One of
the most widely used refitting techniques, which is based on Least-Squares, may
raise a problem of interpretability, since the signs of the refitted estimator
might be flipped with respect to the original estimator. This problem arises
from the fact that the Least-Squares refitting considers only the support of
the Lasso solution, ignoring any information about signs or amplitudes. To
address this, we define a sign-consistent refitting as an arbitrary refitting
procedure that preserves the signs of the first-step Lasso solution, and we
provide oracle inequalities for such estimators. We then consider two special
refitting strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has the
useful property of converging to the Sign-Least-Squares refitting
(Least-Squares with sign constraints), which provides greater
interpretability. We additionally study the Bregman Lasso refitting in the case
of orthogonal design, providing simple intuition behind the proposed method.
Boosted Lasso, in contrast, uses information about the magnitudes of the first
Lasso step and allows us to derive better oracle rates for prediction. Finally,
we conduct an extensive numerical study to show the advantages of one approach
over others in different synthetic and semi-real scenarios. Comment: revised versio
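The shrinkage that motivates refitting is easiest to see in the orthogonal-design case. The sketch below assumes the objective (1/2)||y - Xb||^2 + lam * ||b||_1 with X^T X = I and writes z = X^T y; under that assumption the Lasso solution is the soft-thresholding of z, and the Least-Squares refit on the selected support simply restores z on that support. The function names are illustrative, and this covers only plain Least-Squares refitting, not the Bregman or Boosted variants.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthogonal(z, lam):
    """Lasso coefficients when X^T X = I, with z = X^T y (assumed setting)."""
    return [soft_threshold(zj, lam) for zj in z]


def least_squares_refit_orthogonal(z, lasso_coef):
    """Least-Squares refit on the Lasso support when X^T X = I.

    On the support the refit equals z_j; off the support it stays at zero.
    """
    return [zj if bj != 0.0 else 0.0 for zj, bj in zip(z, lasso_coef)]


z = [3.0, -0.5, -2.0]
lasso = lasso_orthogonal(z, lam=1.0)              # every kept entry shrunk by lam
refit = least_squares_refit_orthogonal(z, lasso)  # shrinkage removed on support
```

In this orthogonal setting the refitted coefficients keep the signs of the Lasso solution; the sign flips discussed in the abstract can only occur when the columns of the design are correlated.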
Extending Gossip Algorithms to Distributed Estimation of U-Statistics
Efficient and robust algorithms for decentralized estimation in networks are
essential to many distributed systems. Whereas distributed estimation of sample
mean statistics has been the subject of a good deal of attention, computation
of U-statistics, relying on more expensive averaging over pairs of
observations, is a less investigated area. Yet, such data functionals are
essential to describe global properties of a statistical population, with
important examples including Area Under the Curve, empirical variance, Gini
mean difference and within-cluster point scatter. This paper proposes new
synchronous and asynchronous randomized gossip algorithms which simultaneously
propagate data across the network and maintain local estimates of the
U-statistic of interest. We establish convergence rate bounds of O(1/t) and
O(log t / t) for the synchronous and asynchronous cases respectively, where
t is the number of iterations, with explicit data and network dependent
terms. Beyond favorable comparisons in terms of rate analysis, numerical
experiments provide empirical evidence that the proposed algorithms surpass the
previously introduced approach. Comment: to be presented at NIPS 201
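To make the target quantity concrete, here is a minimal centralized sketch of a degree-two U-statistic: the average of a symmetric kernel over all unordered pairs of observations. It is not the gossip algorithm from the paper, only the quantity that such an algorithm would estimate; the function and kernel names are illustrative.

```python
from itertools import combinations


def u_statistic(xs, kernel):
    """Average of kernel(a, b) over all unordered pairs of distinct observations."""
    pairs = list(combinations(xs, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def variance_kernel(a, b):
    """Kernel whose U-statistic is the unbiased empirical variance."""
    return 0.5 * (a - b) ** 2


def gini_kernel(a, b):
    """Kernel whose U-statistic is the Gini mean difference."""
    return abs(a - b)


data = [1.0, 2.0, 4.0, 7.0]
empirical_variance = u_statistic(data, variance_kernel)
gini_mean_difference = u_statistic(data, gini_kernel)
```

The number of pairs grows quadratically with the sample size, which is the extra cost over sample-mean statistics that the abstract refers to.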
Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions
In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging that aim to solve such problems in both
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach.
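The AUC is a typical pairwise function of the data: it is the fraction of (positive, negative) pairs that a scoring rule ranks in the right order. The sketch below computes this empirical AUC in a centralized way, counting ties as one half (a common convention, assumed here). It illustrates only the objective, not the decentralized dual averaging procedure; the function name and the 0/1 label encoding are illustrative.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as an average over (positive, negative) pairs.

    labels are assumed to be 1 for positives and 0 for negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                total += 1.0   # pair ranked correctly
            elif sp == sn:
                total += 0.5   # tie counted as half (assumed convention)
    return total / (len(positives) * len(negatives))


auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because every term couples two data points, evaluating or optimizing such an objective in a network requires data held at different nodes to interact, which is the difficulty the abstract describes.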