Loss-Proportional Subsampling for Subsequent ERM
We propose a sampling scheme suitable for reducing a data set prior to
selecting a hypothesis with minimum empirical risk. The sampling only considers
a subset of the ultimate (unknown) hypothesis set, but can nonetheless
guarantee that the final excess risk will compare favorably with utilizing the
entire original data set. We demonstrate the practical benefits of our approach
on a large dataset which we subsample and subsequently fit with boosted trees.
Comment: Appears in the proceedings of the 30th International Conference on
Machine Learning
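
In outline, the recipe is: score each example with a pilot loss, sample with probability proportional to that loss, and importance-weight the retained points so the subsequent ERM objective stays unbiased for the full-data risk. Below is a minimal Python sketch of that recipe; the probability floor, the pilot-loss input, and the Horvitz-Thompson-style reweighting are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def loss_proportional_subsample(losses, m, floor=1e-3, rng=None):
    """Draw m of n indices with probability proportional to a pilot loss.

    Returns indices plus importance weights (1 / (n * sampling probability))
    so that a subsequent weighted ERM fit is unbiased for the full-data risk.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    n = len(losses)
    # A small floor keeps every point's sampling probability nonzero.
    p = np.maximum(losses, 0.0) + floor
    p /= p.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (n * p[idx])  # Horvitz-Thompson-style reweighting
    return idx, weights
```

The returned weights would then be passed as per-example weights to the downstream learner, e.g. the sample_weight argument of a gradient-boosted-tree implementation.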
Online Learning to Sample
Stochastic Gradient Descent (SGD) is one of the most widely used techniques
for online optimization in machine learning. In this work, we accelerate SGD by
adaptively learning how to sample the most useful training examples at each
time step. First, we show that SGD can be used to learn the best possible
sampling distribution of an importance sampling estimator. Second, we show that
the sampling distribution of an SGD algorithm can be estimated online by
incrementally minimizing the variance of the gradient. The resulting algorithm
- called Adaptive Weighted SGD (AW-SGD) - maintains a set of parameters to
optimize, as well as a set of parameters to sample learning examples. We show
that AW-SGD yields faster convergence in three different applications: (i) image
classification with deep features, where the sampling of images depends on
their labels, (ii) matrix factorization, where rows and columns are not sampled
uniformly, and (iii) reinforcement learning, where the optimized policy and the
exploration policy are estimated at the same time; here our approach
corresponds to an off-policy gradient algorithm.
Comment: Update: removed convergence theorem and proof as there is an error.
Submitted to UAI 201
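
As a rough illustration of the two interacting parameter sets, the sketch below pairs a least-squares model with a softmax sampler holding one logit per training example: the model takes importance-weighted SGD steps, while the sampler takes score-function steps that descend the variance of the gradient estimator. The objective, the per-example-logit parameterization, and the step sizes are assumptions made for illustration, not the authors' exact construction.

```python
import numpy as np

def aw_sgd_toy(X, y, steps=5000, lr_model=0.01, lr_sampler=0.1, rng=None):
    """Toy AW-SGD-style loop: least squares with a learned sampling
    distribution over examples (illustrative, not the paper's setup)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)        # parameters to optimize
    logits = np.zeros(n)   # parameters used to sample learning examples
    for _ in range(steps):
        q = np.exp(logits - logits.max())
        q /= q.sum()
        i = rng.choice(n, p=q)
        g = 2.0 * (X[i] @ w - y[i]) * X[i]   # per-example loss gradient
        weight = 1.0 / (n * q[i])            # importance weight vs. uniform
        w -= lr_model * weight * g           # unbiased SGD step on the model
        # Score-function step descending E_q[||g_i||^2 / (n q_i)^2],
        # the q-dependent part of the gradient estimator's variance.
        score = -q.copy()
        score[i] += 1.0                      # d log q[i] / d logits
        logits += lr_sampler * (weight ** 2) * (g @ g) * score
    return w
```

On data where per-example gradient norms vary widely (e.g., imbalanced labels), the sampler shifts mass toward high-gradient examples while the importance weights keep the model update unbiased.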
Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression
A major challenge for building statistical models in the big data era is that
the available data volume far exceeds the computational capability. A common
approach for solving this problem is to employ a subsampled dataset that can be
handled by available computational resources. In this paper, we propose a
general subsampling scheme for large-scale multi-class logistic regression and
examine the variance of the resulting estimator. We show that asymptotically,
the proposed method always achieves a smaller variance than that of uniform
random sampling. Moreover, when the classes are conditionally imbalanced,
significant improvement over uniform sampling can be achieved. Empirical
performance of the proposed method is compared to other methods on both
simulated and real-world datasets, and the results confirm our theoretical
analysis.
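
A hedged sketch of the general pattern: a cheap pilot classifier supplies class probabilities, each example is kept with a probability tied to its local uncertainty, and inverse-probability weights correct the subsequent fit. The acceptance rule min(1, c * (1 - max_k p_k(x))) and the constant c below are illustrative stand-ins, not the paper's exact scheme.

```python
import numpy as np

def uncertainty_subsample(X, y, pilot, c=5.0, rng=None):
    """Keep each example with probability tied to the pilot model's local
    uncertainty; return inverse-probability weights for a corrected refit.

    `pilot` is any fitted classifier exposing predict_proba. The acceptance
    rule is an illustrative stand-in for the paper's scheme.
    """
    rng = np.random.default_rng() if rng is None else rng
    proba = pilot.predict_proba(X)
    uncertainty = 1.0 - proba.max(axis=1)  # near 0 for confident points
    accept = np.minimum(1.0, c * uncertainty)
    keep = rng.random(len(X)) < accept     # never true where accept == 0
    weights = 1.0 / accept[keep]           # inverse-probability correction
    return X[keep], y[keep], weights

# Usage sketch with scikit-learn (hypothetical variable names):
#   from sklearn.linear_model import LogisticRegression
#   pilot = LogisticRegression(max_iter=200).fit(X_small, y_small)
#   Xs, ys, w = uncertainty_subsample(X, y, pilot)
#   model = LogisticRegression(max_iter=200).fit(Xs, ys, sample_weight=w)
```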