3 research outputs found
Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics
In a wide range of statistical learning problems such as ranking, clustering
or metric learning among others, the risk is accurately estimated by
$U$-statistics of degree $d \geq 1$, i.e. functionals of the training data with
low variance that take the form of averages over $d$-tuples. From a
computational perspective, the calculation of such statistics is highly
expensive even for a moderate sample size $n$, as it requires averaging
$O(n^d)$ terms. This makes learning procedures relying on the optimization of
such data functionals hardly feasible in practice. It is the major goal of this
paper to show that, strikingly, such empirical risks can be replaced by
computationally much simpler Monte-Carlo estimates based on $O(n)$ terms
only, usually referred to as incomplete $U$-statistics, without damaging the
$O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM)
procedures. For this purpose, we establish uniform deviation results describing
the error made when approximating a $U$-process by its incomplete version under
appropriate complexity assumptions. Extensions to model selection, fast rate
situations and various sampling techniques are also considered, as well as an
application to stochastic gradient descent for ERM. Finally, numerical examples
are displayed in order to provide strong empirical evidence that the approach
we promote largely surpasses more naive subsampling techniques.
Comment: To appear in Journal of Machine Learning Research. 34 pages. v2:
minor correction to Theorem 4 and its proof, added 1 reference. v3: typo
corrected in Proposition 3. v4: improved presentation, added experiments on
model selection for clustering, fixed minor typo
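
As a rough illustration of the idea in the abstract above, the following sketch compares a complete degree-2 U-statistic with a Monte-Carlo incomplete version. It is not taken from the paper: the kernel (half the squared difference, whose expectation is the variance), the function names, and the choice of sampling pairs with replacement are all assumptions made for this example.

```python
import itertools
import random


def kernel(x, y):
    # Assumed example kernel: h(x, y) = (x - y)^2 / 2, an unbiased
    # pairwise estimator of the variance.
    return 0.5 * (x - y) ** 2


def complete_u_statistic(sample):
    # Average the kernel over all n(n-1)/2 unordered pairs.
    total = 0.0
    count = 0
    for x, y in itertools.combinations(sample, 2):
        total += kernel(x, y)
        count += 1
    return total / count


def incomplete_u_statistic(sample, num_pairs, rng):
    # Average the kernel over num_pairs pairs of distinct points,
    # each pair drawn uniformly at random (with replacement across draws).
    n = len(sample)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)
        total += kernel(sample[i], sample[j])
    return total / num_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # The complete version visits about n^2 / 2 pairs, the incomplete one only n.
    print("complete:  ", complete_u_statistic(data))
    print("incomplete:", incomplete_u_statistic(data, len(data), rng))
```

With this kernel the complete statistic coincides with the usual unbiased sample variance, which gives a convenient sanity check on the incomplete estimate.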
Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning
The development of cluster computing frameworks has allowed practitioners to
scale out various statistical estimation and machine learning algorithms with
minimal programming effort. This is especially true for machine learning
problems whose objective function is nicely separable across individual data
points, such as classification and regression. In contrast, statistical
learning tasks involving pairs (or more generally tuples) of data points, such
as metric learning, clustering or ranking, do not lend themselves as easily to
data-parallelism and in-memory computing. In this paper, we investigate how to
balance between statistical performance and computational efficiency in such
distributed tuplewise statistical problems. We first propose a simple strategy
based on occasionally repartitioning data across workers between parallel
computation stages, where the number of repartitioning steps rules the
trade-off between accuracy and runtime. We then present some theoretical
results highlighting the benefits brought by the proposed method in terms of
variance reduction, and extend our results to design distributed stochastic
gradient descent algorithms for tuplewise empirical risk minimization. Our
results are supported by numerical experiments in pairwise statistical
estimation and learning on synthetic and real-world datasets.
Comment: 23 pages, 6 figures, ECML 201
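
The repartitioning strategy described in the abstract above can be mimicked on a single machine as follows. This is a simplified sketch, not the authors' implementation: the "workers" are plain Python lists, the kernel is an assumed example (half the squared difference), and every function name is hypothetical. Each round shuffles the data across workers, each worker averages the kernel over its local pairs only, and the per-round estimates are averaged.

```python
import itertools
import random


def kernel(x, y):
    # Assumed example kernel: h(x, y) = (x - y)^2 / 2.
    return 0.5 * (x - y) ** 2


def local_u_statistic(block):
    # Pairwise average computed by one worker on its own data only.
    pairs = list(itertools.combinations(block, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def partition(sample, num_workers, rng):
    # Shuffle, then deal the points out to the workers round-robin.
    shuffled = list(sample)
    rng.shuffle(shuffled)
    return [shuffled[w::num_workers] for w in range(num_workers)]


def repartitioned_estimate(sample, num_workers, num_rounds, rng):
    # Average of within-worker estimates over several random repartitions.
    # Assumes every worker receives at least two points; blocks are
    # weighted equally, which is exact only when they have equal size.
    round_estimates = []
    for _ in range(num_rounds):
        blocks = partition(sample, num_workers, rng)
        local = [local_u_statistic(block) for block in blocks]
        round_estimates.append(sum(local) / num_workers)
    return sum(round_estimates) / num_rounds


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(400)]
    for rounds in (1, 5, 20):
        print(rounds, repartitioned_estimate(data, 8, rounds, rng))
```

More rounds expose more cross-worker pairs to the estimator at the cost of extra shuffling, which is the accuracy versus runtime trade-off the abstract refers to.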
Building confidence regions for the ROC surface
International audienc