Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function has
to be minimized, given only the knowledge of unbiased estimates of its
gradients at certain points, a framework which includes machine learning
methods based on the minimization of the empirical risk. We focus on problems
without strong convexity, for which all previously known algorithms achieve a
convergence rate for function values of O(1/n^{1/2}). We consider and analyze
two algorithms that achieve a rate of O(1/n) for classical supervised learning
problems. For least-squares regression, we show that averaged stochastic
gradient descent with constant step-size achieves the desired rate. For
logistic regression, this is achieved by a simple novel stochastic gradient
algorithm that (a) constructs successive local quadratic approximations of the
loss functions, while (b) preserving the same running time complexity as
stochastic gradient descent. For these algorithms, we provide a non-asymptotic
analysis of the generalization error (in expectation, and also in high
probability for least-squares), and run extensive experiments on standard
machine learning benchmarks showing that they often outperform existing
approaches.
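As a concrete illustration of the least-squares result described above, the following is a minimal sketch of averaged stochastic gradient descent with a constant step-size (Polyak-Ruppert averaging of the iterates). All names and the step-size value are illustrative, not the authors' code.

    import numpy as np

    def averaged_sgd_least_squares(X, y, step_size=0.1, seed=0):
        """Constant step-size SGD on the least-squares loss, returning the
        running average of the iterates (a sketch, not the paper's code)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        theta = np.zeros(d)        # current SGD iterate
        theta_bar = np.zeros(d)    # Polyak-Ruppert average of the iterates
        for t in range(n):
            i = rng.integers(n)                       # draw one observation
            residual = X[i] @ theta - y[i]
            theta -= step_size * residual * X[i]      # unbiased gradient step
            theta_bar += (theta - theta_bar) / (t + 1)  # online averaging
        return theta_bar

The averaged iterate theta_bar, rather than the last iterate theta, is what the abstract's O(1/n) rate refers to; with a constant step-size the last iterate alone does not converge.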
Online Active Linear Regression via Thresholding
We consider the problem of online active learning to collect data for
regression modeling. Specifically, we consider a decision maker with a limited
experimentation budget who must efficiently learn an underlying linear
population model. Our main contribution is a novel threshold-based algorithm
for selecting the most informative observations; we characterize its performance
and fundamental lower bounds. We extend the algorithm and its guarantees to
sparse linear regression in high-dimensional settings. Simulations suggest the
algorithm is remarkably robust: it provides significant benefits over passive
random sampling in real-world datasets that exhibit high nonlinearity and high
dimensionality, significantly reducing both the mean and variance of the
squared error.
Comment: Published in AAAI 2017
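To make the threshold-based selection idea concrete, here is a minimal sketch that labels an incoming covariate vector only when its norm exceeds a threshold (a proxy for informativeness), then fits ordinary least squares on the selected points. The norm criterion, the names, and the query_label callback are assumptions for illustration; the paper's actual rule and guarantees may differ.

    import numpy as np

    def thresholded_active_regression(stream, budget, threshold, query_label):
        """Spend a limited labeling budget only on covariates whose norm
        exceeds a threshold, then fit OLS on the collected sample
        (illustrative sketch, not the authors' implementation)."""
        X_sel, y_sel = [], []
        for x in stream:
            if len(X_sel) >= budget:          # experimentation budget spent
                break
            if np.linalg.norm(x) >= threshold:
                X_sel.append(x)
                y_sel.append(query_label(x))  # querying a label costs budget
            # covariates below the threshold are discarded at no cost
        X_sel, y_sel = np.asarray(X_sel), np.asarray(y_sel)
        beta, *_ = np.linalg.lstsq(X_sel, y_sel, rcond=None)
        return beta

The design intuition is that large-norm covariates constrain the linear model most, so concentrating the budget on them beats passive random sampling.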
Orthogonal Statistical Learning
We provide non-asymptotic excess risk guarantees for statistical learning in
a setting where the population risk with respect to which we evaluate the
target parameter depends on an unknown nuisance parameter that must be
estimated from data. We analyze a two-stage sample splitting meta-algorithm
that takes as input two arbitrary estimation algorithms: one for the target
parameter and one for the nuisance parameter. We show that if the population
risk satisfies a condition called Neyman orthogonality, the impact of the
nuisance estimation error on the excess risk bound achieved by the
meta-algorithm is of second order. Our theorem is agnostic to the particular
algorithms used for the target and nuisance and only makes an assumption on
their individual performance. This enables the use of a plethora of existing
results from statistical learning and machine learning to give new guarantees
for learning with a nuisance component. Moreover, by focusing on excess risk
rather than parameter estimation, we can give guarantees under weaker
assumptions than in previous works and accommodate settings in which the target
parameter belongs to a complex nonparametric class. We provide conditions on
the metric entropy of the nuisance and target classes such that oracle
rates---rates of the same order as if we knew the nuisance parameter---are
achieved. We also derive new rates for specific estimation algorithms such as
variance-penalized empirical risk minimization, neural network estimation and
sparse high-dimensional linear model estimation. We highlight the applicability
of our results in four settings of central importance: 1) heterogeneous
treatment effect estimation, 2) offline policy optimization, 3) domain
adaptation, and 4) learning with missing data.
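The two-stage sample-splitting meta-algorithm the abstract analyzes can be sketched as follows. Both stages take arbitrary user-supplied learners; fit_nuisance, fit_target, and the assumption that data is a NumPy array of rows are illustrative, not the paper's API.

    import numpy as np

    def two_stage_meta_algorithm(data, fit_nuisance, fit_target, seed=0):
        """Sample-splitting sketch: estimate the nuisance on one half of the
        data, then run the target learner on the other half with the nuisance
        estimate plugged in (hypothetical interface for illustration)."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(data))
        half1, half2 = idx[: len(data) // 2], idx[len(data) // 2 :]
        g_hat = fit_nuisance(data[half1])           # stage 1: nuisance estimate
        theta_hat = fit_target(data[half2], g_hat)  # stage 2: plug-in target fit
        return theta_hat

Under Neyman orthogonality, the error in g_hat enters the excess risk of theta_hat only at second order, which is why the two learners can be chosen independently of one another.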