95 research outputs found
Learning Programmatically Structured Representations with Perceptor Gradients
We present the perceptor gradients algorithm -- a novel approach to learning
symbolic representations based on the idea of decomposing an agent's policy
into i) a perceptor network extracting symbols from raw observation data and
ii) a task encoding program which maps the input symbols to output actions. We
show that the proposed algorithm is able to learn representations that can be
directly fed into a Linear-Quadratic Regulator (LQR) or a general purpose A*
planner. Our experimental results confirm that the perceptor gradients
algorithm is able to efficiently learn transferable symbolic representations as
well as generate new observations according to a semantically meaningful
specification.
Comment: Published as a conference paper at ICLR 201
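As a rough illustration of this decomposition (not the authors' implementation; the dimensions, controller gain, and training loop are assumptions), the sketch below pairs a learned perceptor with a fixed linear-feedback program standing in for an LQR and updates only the perceptor with a REINFORCE-style gradient:

```python
import torch
import torch.nn as nn

class Perceptor(nn.Module):
    """Maps raw observations (e.g. flattened images) to a small symbolic state."""
    def __init__(self, obs_dim=64, sym_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, sym_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def lqr_program(symbols, K):
    """Task-encoding program: a fixed linear feedback law u = -K s,
    standing in for an LQR controller acting on the extracted symbols."""
    return -symbols @ K.T

obs_dim, sym_dim, act_dim = 64, 4, 2
perceptor = Perceptor(obs_dim, sym_dim)
K = torch.randn(act_dim, sym_dim)            # assumed pre-computed LQR gain
optimizer = torch.optim.Adam(perceptor.parameters(), lr=1e-3)

# One REINFORCE-style update. Only the perceptor has learnable parameters;
# the program is a fixed, differentiable mapping from symbols to actions.
obs = torch.randn(32, obs_dim)               # a batch of raw observations
returns = torch.randn(32)                    # placeholder returns from the environment
mean_action = lqr_program(perceptor(obs), K)
dist = torch.distributions.Normal(mean_action, 0.1)
actions = dist.sample()
log_prob = dist.log_prob(actions).sum(dim=1)
loss = -(log_prob * returns).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The same perceptor output could instead be handed to a symbolic planner such as A*; the key point of the decomposition is that the gradient signal only shapes the perceptor, while the program fixes the semantics of the symbols.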
Does Continual Learning = Catastrophic Forgetting?
Continual learning is known to suffer from catastrophic forgetting, a
phenomenon in which earlier learned concepts are forgotten in favor of more
recently seen samples. In this work, we challenge the assumption that continual
learning is inevitably associated with catastrophic forgetting by presenting a
set of tasks that surprisingly do not suffer from catastrophic forgetting when
learned continually. We provide evidence that these reconstruction-type tasks
exhibit positive forward transfer and that single-view 3D shape reconstruction
improves the performance on learned and novel categories over time. We provide
a novel analysis of knowledge transfer ability by looking at the output
distribution shift across sequential learning tasks. Finally, we show that the
robustness of these tasks points to their potential as a proxy
representation learning task for continual classification. The codebase,
dataset, and pre-trained models released with this article can be found at
https://github.com/rehg-lab/CLRec
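As a loose illustration of this kind of sequential evaluation (a placeholder sketch, not the released CLRec code; the model, data, and loss are assumptions), one can train a single reconstruction model on a sequence of tasks and re-measure error on every earlier task after each stage, which is the measurement that distinguishes forgetting from positive forward transfer:

```python
import torch
import torch.nn as nn

# Placeholder autoencoder standing in for a single-view 3D reconstruction model.
model = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 128))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Each "task" is a category-specific dataset; random tensors stand in here.
tasks = [torch.randn(256, 128) for _ in range(5)]

history = []  # per-task error measured after every training stage
for t, data in enumerate(tasks):
    # Train continually on task t only (no replay of earlier tasks).
    for _ in range(100):
        loss = loss_fn(model(data), data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluate on all tasks seen so far: rising error on earlier tasks
    # indicates forgetting; flat or falling error indicates transfer.
    with torch.no_grad():
        errors = [loss_fn(model(d), d).item() for d in tasks[: t + 1]]
    history.append(errors)
    print(f"after task {t}: errors on tasks 0..{t} = {errors}")
```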
Holographic Generative Memory: Neurally Inspired One-Shot Learning with Memory Augmented Neural Networks
Humans quickly parse and categorize stimuli by combining perceptual information and previously learned knowledge. We are capable of learning new information quickly with only a few observations, and sometimes even a single observation. This one-shot learning (OSL) capability is still very difficult to realize in machine learning models. Novelty is commonly thought to be the primary driver for OSL. However, neuroscience literature shows that biological OSL mechanisms are guided by uncertainty, rather than novelty, motivating us to explore this idea for machine learning.
In this work, we investigate OSL for neural networks using more robust compositional knowledge representations and a biologically inspired uncertainty mechanism to modulate the rate of learning. We introduce several new neural network models that combine Holographic Reduced Representation (HRR) and Variational Autoencoders. Extending these new models culminates in the Holographic Generative Memory (HGMEM) model.
HGMEM is a novel unsupervised memory augmented neural network. It offers solutions to many of the practical drawbacks associated with HRRs while also providing storage, recall, and generation of latent compositional knowledge representations. Uncertainty is measured as a native part of HGMEM operation by applying trained probabilistic dropout to fully-connected layers. During training, the learning rate is modulated using these uncertainty measurements in a manner inspired by our motivating neuroscience mechanism for OSL. Model performance is demonstrated on several image datasets with experiments that reflect our theoretical approach.
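For context, the HRR operations and the dropout-based uncertainty signal referenced above can be sketched compactly. The following is an illustrative approximation, not the HGMEM implementation; the network, dropout rate, and learning-rate rule are assumptions:

```python
import torch
import torch.nn as nn

def hrr_bind(a, b):
    """Circular convolution: the HRR binding operator, computed via FFT."""
    return torch.fft.irfft(torch.fft.rfft(a) * torch.fft.rfft(b), n=a.shape[-1])

def hrr_unbind(a, trace):
    """Circular correlation: approximately recovers b from hrr_bind(a, b)."""
    return torch.fft.irfft(torch.fft.rfft(a).conj() * torch.fft.rfft(trace), n=a.shape[-1])

d = 512
a = torch.randn(d) / d ** 0.5          # role vector
b = torch.randn(d) / d ** 0.5          # filler vector
trace = hrr_bind(a, b)                 # stored compositional trace
recovered = hrr_unbind(a, trace)       # noisy reconstruction of the filler
print("recovery cosine:", torch.cosine_similarity(recovered, b, dim=0).item())

# Uncertainty-modulated learning (illustrative): keep dropout active and use the
# spread of repeated stochastic forward passes to scale the update size.
net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Dropout(0.2), nn.Linear(256, d))
net.train()                            # dropout stays stochastic
x = torch.randn(1, d)
samples = torch.stack([net(x) for _ in range(20)])
uncertainty = samples.std(dim=0).mean().item()
base_lr = 1e-3
lr = base_lr * (1.0 + uncertainty)     # higher uncertainty -> larger, one-shot-like update
print(f"uncertainty={uncertainty:.4f}, modulated lr={lr:.5f}")
```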
A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers
The Vision Transformer (ViT) architecture has recently established its place in the computer vision literature, with multiple architectures for recognition of image data or other visual modalities. However, training ViTs for RGB-D object recognition remains an understudied topic, viewed in recent literature only through the lens of multi-task pretraining in multiple modalities. Such approaches are often computationally intensive and have not yet been applied to challenging object-level classification tasks. In this work, we propose a simple yet strong recipe for transferring pretrained ViTs to RGB-D domains for single-view 3D object recognition, focusing on fusing RGB and depth representations encoded jointly by the ViT. Compared to previous work on multimodal Transformers, the key challenge here is to use the attested flexibility of ViTs to capture cross-modal interactions at the downstream stage rather than during pretraining. We explore which depth representation is better in terms of resulting accuracy and compare two methods for injecting RGB-D fusion into the ViT architecture (i.e., early vs. late fusion). Our results on the Washington RGB-D Objects dataset demonstrate that in such RGB → RGB-D scenarios, late fusion techniques work better than the more commonly employed early fusion. With our transfer baseline, adapted ViTs score up to 95.1% top-1 accuracy on Washington, achieving new state-of-the-art results on this benchmark. We additionally evaluate our approach with an open-ended lifelong learning protocol, where we show that our adapted RGB-D encoder leads to features that outperform unimodal encoders, even without explicit fine-tuning. We further integrate our method with a robot framework and demonstrate how it can serve as a perception utility in an interactive robot learning scenario, both in simulation and with a real robot.
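As a rough sketch of the late-fusion variant described above (the backbone name, fusion head, and training details are assumptions, not the paper's exact recipe), one could encode RGB and a three-channel depth rendering with pretrained ViTs and fuse the pooled embeddings before a classifier:

```python
import torch
import torch.nn as nn
import timm

class LateFusionRGBD(nn.Module):
    """Encodes RGB and depth separately with pretrained ViTs and fuses the
    resulting embeddings with a small classifier head (late fusion)."""
    def __init__(self, num_classes=51, backbone="vit_base_patch16_224"):
        super().__init__()
        # Shared or separate backbones are both plausible; separate is shown here.
        self.rgb_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        self.depth_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        feat_dim = self.rgb_encoder.num_features
        self.head = nn.Sequential(
            nn.LayerNorm(2 * feat_dim),
            nn.Linear(2 * feat_dim, num_classes),
        )

    def forward(self, rgb, depth):
        # depth is assumed pre-rendered to 3 channels (e.g. a colorized depth map)
        fused = torch.cat([self.rgb_encoder(rgb), self.depth_encoder(depth)], dim=-1)
        return self.head(fused)

model = LateFusionRGBD(num_classes=51)   # Washington RGB-D Objects has 51 categories
rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 3, 224, 224)
logits = model(rgb, depth)
print(logits.shape)                      # torch.Size([2, 51])
```

An early-fusion alternative would instead combine RGB and depth at the patch-embedding input of a single ViT; the late-fusion arrangement shown here keeps the pretrained encoders intact and learns the cross-modal interaction only in the downstream head.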