Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
We show that matrix completion with trace-norm regularization can degrade
significantly when entries of the matrix are sampled non-uniformly. We
introduce a weighted version of the trace-norm regularizer that also works well
under non-uniform sampling. Our experimental results demonstrate that
weighted trace-norm regularization indeed yields significant gains on the
(highly non-uniformly sampled) Netflix dataset.
Comment: 9 page
On Symmetric and Asymmetric LSHs for Inner Product Search
We consider the problem of designing locality-sensitive hashes (LSH) for
inner product similarity, and the power of asymmetric hashes in this
context. Shrivastava and Li argue that there is no symmetric LSH for the
problem and propose an asymmetric LSH based on different mappings for query and
database points. However, we show there does exist a simple symmetric LSH that
enjoys stronger guarantees and better empirical performance than the asymmetric
LSH they suggest. We also show a variant of the setting where asymmetry is
in fact needed, but where a different asymmetric LSH is required.
Comment: 11 pages, 3 figures, In Proceedings of The 32nd International
Conference on Machine Learning (ICML
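A minimal sketch of what a symmetric hash for inner products can look like. The construction is an assumption of this sketch, since the abstract does not spell it out: database points are taken to lie in the unit ball, queries to have unit norm, every point is lifted onto the unit sphere by appending sqrt(1 - ||x||^2), and the same random-hyperplane sign hash is applied to queries and database points alike. The function names (`lift_to_sphere`, `make_hyperplanes`, `hash_codes`, `collision_rate`) are hypothetical.

```python
import math
import random


def lift_to_sphere(x):
    """Append sqrt(1 - ||x||^2) so a point in the unit ball lands on the unit sphere."""
    sq_norm = sum(v * v for v in x)
    if sq_norm > 1.0 + 1e-12:
        raise ValueError("point must lie in the unit ball")
    return list(x) + [math.sqrt(max(0.0, 1.0 - sq_norm))]


def make_hyperplanes(dim, num_hashes, seed=0):
    """Draw Gaussian hyperplane normals in the lifted (dim + 1)-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim + 1)]
            for _ in range(num_hashes)]


def hash_codes(x, hyperplanes):
    """Same mapping for queries and database points: sign of <a, lift(x)>."""
    lifted = lift_to_sphere(x)
    return [1 if sum(a * p for a, p in zip(plane, lifted)) >= 0 else 0
            for plane in hyperplanes]


def collision_rate(x, y, hyperplanes):
    """Fraction of hash functions on which x and y receive the same bit."""
    cx = hash_codes(x, hyperplanes)
    cy = hash_codes(y, hyperplanes)
    return sum(a == b for a, b in zip(cx, cy)) / len(hyperplanes)


if __name__ == "__main__":
    planes = make_hyperplanes(dim=2, num_hashes=4000, seed=0)
    query = (1.0, 0.0)            # unit-norm query
    for point in [(0.9, 0.0), (0.1, 0.0)]:
        print(point, collision_rate(query, point, planes))
```

Under these assumptions a unit-norm query lifts to itself with a trailing zero, so the inner product of the lifted vectors equals the original inner product, and the collision probability of a sign hash is 1 - arccos(<q, x>)/pi, which increases with the inner product.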
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems,
with homogeneous linear predictors on linearly separable datasets. We show the
predictor converges to the direction of the max-margin (hard margin SVM)
solution. The result also generalizes to other monotone decreasing loss
functions with an infimum at infinity, to multi-class problems, and to training
a weight layer in a deep network in a certain restricted setting. Furthermore,
we show this convergence is very slow, and only logarithmic in the convergence
of the loss itself. This can help explain the benefit of continuing to optimize
the logistic or cross-entropy loss even after the training error is zero and
the training loss is extremely small, and, as we show, even if the validation
loss increases. Our methodology can also aid in understanding implicit
regularization in more complex models and with other optimization methods.
Comment: Final JMLR version, with improved discussions over v3. Main
improvements in journal version over conference version (v2 appeared in
ICLR): We proved the measure zero case for main theorem (with implications
for the rates), and the multi-class cas
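The behaviour described in this abstract can be observed on a toy problem. The sketch below is illustrative only: the two-point dataset, the step size, and the names (`POINTS`, `LABELS`, `MAX_MARGIN`, `gradient_descent`, `cosine`, `norm`) are assumptions chosen so that the hard-margin solution of the homogeneous problem can be worked out by hand as w = (1, 0.5). It runs plain gradient descent on the unregularized logistic loss and compares the direction of the iterate with that max-margin direction.

```python
import math

# Toy separable data (an assumption of this sketch). With no bias term the
# constraints y_i * <w, x_i> >= 1 read w1 >= 1 and 2 * w2 >= 1, so the
# minimum-norm (hard-margin) solution is w = (1, 0.5).
POINTS = [(1.0, 0.0), (0.0, -2.0)]
LABELS = [1, -1]
MAX_MARGIN = (1.0, 0.5)


def gradient_descent(points, labels, steps, lr=0.5):
    """Gradient descent on sum_i log(1 + exp(-y_i * <w, x_i>)), starting at w = 0."""
    dim = len(points[0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(points, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
            coeff = -y / (1.0 + math.exp(margin))
            for k in range(dim):
                grad[k] += coeff * x[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


if __name__ == "__main__":
    for steps in (100, 1000, 20000):
        w = gradient_descent(POINTS, LABELS, steps)
        print(steps, norm(w), cosine(w, MAX_MARGIN))
```

On this example the norm of w keeps growing while the cosine with the max-margin direction approaches 1 only slowly, which matches the abstract's point that the direction converges far more slowly than the loss.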