205 research outputs found
Distribution matching for transduction
Many transductive inference algorithms assume that distributions over training and test estimates should be related, e.g. by providing a large margin of separation on both sets. We use this idea to design a transduction algorithm which can be used without modification for classification, regression, and structured estimation. At its heart we exploit the fact that for a good learner the distributions over the outputs on training and test sets should match. This is a classical two-sample problem which can be solved efficiently in its most general form by using distance measures in Hilbert Space. It turns out that a number of existing heuristics can be viewed as special cases of our approach.
ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly
Matrix completion and approximation are popular tools to capture a user's
preferences for recommendation and to approximate missing data. Instead of
using low-rank factorization we take a drastically different approach, based on
the simple insight that an additive model of co-clusterings allows one to
approximate matrices efficiently. This allows us to build a concise model that,
per bit of model learned, significantly beats all factorization approaches to
matrix approximation. Even more surprisingly, we find that summing over small
co-clusterings is more effective in modeling matrices than classic
co-clustering, which uses just one large partitioning of the matrix.
Following Occam's razor principle suggests that the simple structure induced
by our model better captures the latent preferences and decision making
processes present in the real world than classic co-clustering or matrix
factorization. We provide an iterative minimization algorithm, a collapsed
Gibbs sampler, theoretical guarantees for matrix approximation, and excellent
empirical evidence for the efficacy of our approach. We achieve
state-of-the-art results on the Netflix problem with a fraction of the model
complexity.Comment: 22 pages, under review for conference publicatio
The Falling Factorial Basis and Its Statistical Applications
We study a novel spline-like basis, which we name the "falling factorial
basis", bearing many similarities to the classic truncated power basis. The
advantage of the falling factorial basis is that it enables rapid, linear-time
computations in basis matrix multiplication and basis matrix inversion. The
falling factorial functions are not actually splines, but are close enough to
splines that they provably retain some of the favorable properties of the
latter functions. We examine their application in two problems: trend filtering
over arbitrary input points, and a higher-order variant of the two-sample
Kolmogorov-Smirnov test.Comment: Full version for the ICML paper with the same titl
Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum
optimization problems. Frank-Wolfe methods (in the convex case) have gained
tremendous recent interest in machine learning and optimization communities due
to their projection-free property and their ability to exploit structured
constraints. However, our understanding of these algorithms in the nonconvex
setting is fairly limited. In this paper, we propose nonconvex stochastic
Frank-Wolfe methods and analyze their convergence properties. For objective
functions that decompose into a finite-sum, we leverage ideas from variance
reduction techniques for convex optimization to obtain new variance reduced
nonconvex Frank-Wolfe methods that have provably faster convergence than the
classical Frank-Wolfe method. Finally, we show that the faster convergence
rates of our variance reduced methods also translate into improved convergence
rates for the stochastic setting
- …