Fast Cross-Validation via Sequential Testing
With the increasing size of today's data sets, finding the right parameter
configuration in model selection via cross-validation can be an extremely
time-consuming task. In this paper we propose an improved cross-validation
procedure which uses nonparametric testing coupled with sequential analysis to
determine the best parameter set on linearly increasing subsets of the data. By
eliminating underperforming candidates quickly and keeping promising candidates
as long as possible, the method speeds up the computation while preserving the
capability of the full cross-validation. Theoretical considerations underline
the statistical power of our procedure. The experimental evaluation shows that
our method reduces the computation time by a factor of up to 120 compared to a
full cross-validation, with a negligible impact on accuracy.
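The abstract describes evaluating candidates on linearly growing subsets and dropping underperformers early. The Python sketch below shows only that elimination skeleton. The procedure in the abstract uses nonparametric testing coupled with sequential analysis; this sketch swaps in a much simpler rule, which is an assumption: a candidate "flops" when its loss exceeds the step's best loss by more than `tol`, and it is dropped after `max_flops` flops. The names `sequential_cv`, `noisy_eval` and `true_loss` are hypothetical, and the toy losses are simulated rather than taken from real model fits.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def sequential_cv(
    candidates: Sequence[Hashable],
    evaluate: Callable[[Hashable, int], float],
    n_total: int,
    n_steps: int = 10,
    tol: float = 0.2,
    max_flops: int = 3,
) -> Tuple[Hashable, int]:
    """Return (chosen candidate, total number of points evaluated).

    Simplified elimination rule (an assumption, not the paper's test): a
    candidate "flops" at a step if its loss exceeds that step's best loss by
    more than `tol`; after `max_flops` flops it is dropped.
    """
    active: List[Hashable] = list(candidates)
    flops: Dict[Hashable, int] = {c: 0 for c in active}
    scores: Dict[Hashable, float] = {}
    cost = 0
    for step in range(1, n_steps + 1):
        n = max(1, n_total * step // n_steps)  # linearly increasing subset size
        scores = {c: evaluate(c, n) for c in active}
        cost += n * len(active)
        best = min(scores.values())
        for c in active:
            if scores[c] > best + tol:
                flops[c] += 1
        # Drop repeat underperformers, but never eliminate every candidate.
        survivors = [c for c in active if flops[c] < max_flops]
        active = survivors or [min(active, key=scores.__getitem__)]
        if len(active) == 1:  # nothing left to compare: stop early
            break
    return min(active, key=scores.__getitem__), cost


# Hypothetical toy problem: the validation loss of each regularization value
# is minimized at lam = 0.1; observed losses add noise shrinking as 1/sqrt(n).
rng = random.Random(0)
true_loss = {lam: (math.log10(lam) + 1) ** 2 for lam in [0.001, 0.01, 0.1, 1.0, 10.0]}


def noisy_eval(lam: float, n: int) -> float:
    return true_loss[lam] + rng.gauss(0.0, 1.0) / math.sqrt(n)


best, cost = sequential_cv(list(true_loss), noisy_eval, n_total=1000)
print(best, cost)
```

Stopping as soon as one candidate remains is what saves work in this toy setting. The returned `cost` counts evaluated points rather than wall-clock time, so it is only a rough proxy for the speedups reported in the abstract.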
A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning
Learning sparse combinations is a frequent theme in machine learning. In this
paper, we study its associated optimization problem in the distributed setting
where the elements to be combined are not centrally located but spread over a
network. We address the key challenges of balancing communication costs and
optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW)
algorithm. We obtain theoretical guarantees on the optimization error
and communication cost that do not depend on the total number of
combining elements. We further show that the communication cost of dFW is
optimal by deriving a lower bound on the communication cost required to
construct an $\epsilon$-approximate solution. We validate our theoretical
analysis with empirical studies on synthetic and real-world data, which
demonstrate that dFW outperforms both baselines and competing methods. We also
study the performance of dFW when the conditions of our analysis are relaxed,
and show that dFW is fairly robust.
Comment: Extended version of the SIAM Data Mining 2015 paper
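To illustrate why the communication cost need not depend on the total number of combined elements, here is a single-process Python simulation of one plausible distributed Frank-Wolfe round structure. It assumes a least-squares objective over an L1 ball, atoms (columns of A) partitioned across nodes, and the standard step size 2/(k+2). Each node scores only its own atoms and sends one number. Only the winning atom is then broadcast. The function names, the duality-gap stopping rule, and the float-counting convention are illustrative assumptions, not the paper's exact algorithm.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def distributed_fw(
    nodes: List[Dict[int, Vector]],
    y: Vector,
    beta: float,
    iterations: int,
    tol: float = 1e-9,
) -> Tuple[Dict[int, float], int]:
    """Simulate dFW for  min 0.5 * ||A alpha - y||^2  s.t. ||alpha||_1 <= beta.

    nodes[p] maps atom ids to atoms (columns of A) stored only on node p.
    Returns the sparse solution and the number of floats communicated
    (one score per node per round plus d floats per copy of the broadcast
    atom; atom indices are not counted).
    """
    d = len(y)
    z = [0.0] * d  # z = A @ alpha, kept in sync on every node
    alpha: Dict[int, float] = {}
    comm = 0
    for k in range(iterations):
        r = [zi - yi for zi, yi in zip(z, y)]  # gradient of the loss w.r.t. z
        # 1) Each node scores its own atoms (grad_j = a_j . r) and sends its best.
        local_best: List[Tuple[int, float]] = []
        for atoms in nodes:
            scored = ((j, dot(a, r)) for j, a in atoms.items())
            local_best.append(max(scored, key=lambda jg: abs(jg[1])))
            comm += 1
        # 2) The globally best atom wins.
        owner, (j_star, g_star) = max(enumerate(local_best), key=lambda t: abs(t[1][1]))
        # Frank-Wolfe duality gap: <alpha, grad> + beta * max_j |grad_j|,
        # where <alpha, grad> = z . r needs no extra communication.
        if dot(z, r) + beta * abs(g_star) <= tol:
            break
        atom = nodes[owner][j_star]
        comm += d * (len(nodes) - 1)  # owner broadcasts the winning atom
        # 3) Every node applies the same step toward that L1-ball vertex.
        gamma = 2.0 / (k + 2)
        vertex = -beta if g_star > 0 else beta
        alpha = {j: (1.0 - gamma) * v for j, v in alpha.items()}
        alpha[j_star] = alpha.get(j_star, 0.0) + gamma * vertex
        z = [(1.0 - gamma) * zi + gamma * vertex * ai for zi, ai in zip(z, atom)]
    return alpha, comm


nodes = [{0: [1.0, 0.0]}, {1: [0.0, 1.0]}]  # one atom per node
alpha, comm = distributed_fw(nodes, y=[0.0, 3.0], beta=3.0, iterations=10)
print(alpha, comm)  # {1: 3.0} 6
```

Per round, the traffic here is one scalar per node plus one atom per receiving node. Adding more atoms to a node increases its local work but not what it sends.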
Scalable Kernel Methods via Doubly Stochastic Gradients
The general perception is that kernel methods are not scalable, and neural
nets are the methods of choice for nonlinear learning problems. Or have we
simply not tried hard enough for kernel methods? Here we propose an approach
that scales up kernel methods using a novel concept called "doubly stochastic
functional gradients". Our approach relies on the fact that many kernel methods
can be expressed as convex optimization problems, and we solve the problems by
making two unbiased stochastic approximations to the functional gradient, one
using random training points and another using random functions associated with
the kernel, and then descending using this noisy functional gradient. We show
that a function produced by this procedure after $t$ iterations converges to
the optimal function in the reproducing kernel Hilbert space at a rate of $O(1/t)$,
and achieves a generalization performance of $O(1/\sqrt{t})$. This double
stochasticity also allows us to avoid keeping the support vectors and to
implement the algorithm with a small memory footprint, which is linear in the number
of iterations and independent of the data dimension. Our approach can readily scale
kernel methods up to regimes dominated by neural nets. We show
that our method can achieve performance competitive with neural nets on datasets
such as 8 million handwritten digits from MNIST, 2.3 million energy materials
from MolecularSpace, and 1 million photos from ImageNet.
Comment: 32 pages, 22 figures
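The two sources of randomness can be made concrete with a small Python sketch. It assumes squared loss, a Gaussian RBF kernel approximated by random Fourier features $\sqrt{2}\cos(\omega^\top x + b)$, a step size $\theta/t$, and an L2 regularizer with weight `nu`. All of these are illustrative choices rather than details confirmed by the abstract. Each iteration samples one data point and one random feature. The feature is regenerated from its seed (here simply the iteration index, a hypothetical convention), so only one coefficient per iteration is stored. This is what keeps memory linear in the number of iterations and independent of the input dimension. The names `random_feature`, `predict` and `train_doubly_sgd` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def random_feature(seed: int, x: Sequence[float], sigma: float) -> float:
    """phi(x) = sqrt(2) cos(omega . x + b) for the RBF kernel
    exp(-||x - x'||^2 / (2 sigma^2)): omega ~ N(0, I / sigma^2), b ~ U[0, 2 pi).
    omega and b are regenerated from `seed`, so they are never stored."""
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 1.0 / sigma) for _ in x]
    b = rng.uniform(0.0, 2.0 * math.pi)
    return math.sqrt(2.0) * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)


def predict(alphas: List[float], x: Sequence[float], sigma: float) -> float:
    # f(x) = sum_j alpha_j * phi_j(x); feature j is regenerated from seed j.
    return sum(a * random_feature(j, x, sigma) for j, a in enumerate(alphas))


def train_doubly_sgd(
    data: Sequence[Tuple[Sequence[float], float]],
    iterations: int,
    sigma: float = 1.0,
    theta: float = 1.0,
    nu: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Doubly stochastic functional gradient descent for regularized
    least squares (a sketch). Memory: one float per iteration."""
    rng = random.Random(seed)
    alphas: List[float] = []
    for i in range(iterations):
        x, y = data[rng.randrange(len(data))]  # randomness 1: a data point
        gamma = theta / (i + 1)  # step size theta / t
        residual = predict(alphas, x, sigma) - y  # derivative of 0.5 * (f(x) - y)^2
        # Shrink old coefficients (gradient step on the (nu / 2) * ||f||^2 term) ...
        alphas = [(1.0 - gamma * nu) * a for a in alphas]
        # ... then add a coefficient for a fresh random feature (randomness 2).
        alphas.append(-gamma * residual * random_feature(i, x, sigma))
    return alphas


data_rng = random.Random(42)
xs = [[data_rng.uniform(-3.0, 3.0)] for _ in range(200)]
data = [(x, math.sin(x[0])) for x in xs]
alphas = train_doubly_sgd(data, iterations=300)
print(len(alphas), predict(alphas, [1.0], 1.0))
```

In this naive form, evaluating $f$ costs $O(t)$ feature regenerations, so training is quadratic in the number of iterations. A practical version would batch points and features, which this sketch does not attempt.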