Mistake Bounds for Binary Matrix Completion
We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation where each row of the matrix is a prediction task over a finite set of objects, the columns. Using this we show that the algorithm makes a number of mistakes which is comparable, up to a logarithmic factor, to the number of mistakes made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.
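The core update described above can be illustrated with a minimal sketch. This is not the paper's algorithm verbatim: the class name, the embedding of the m x n sign matrix into an (m+n) x (m+n) symmetric trace-one matrix, and the mistake-driven rank-two gradient are illustrative assumptions; only the Matrix Exponentiated Gradient update form W <- exp(log W - eta*G) / Z is taken from the literature [1].

```python
import numpy as np

def sym_expm(A):
    # matrix exponential of a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def sym_logm(A):
    # matrix logarithm of a symmetric positive definite matrix
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

class MEGCompletion:
    """Hypothetical sketch of a Matrix Exponentiated Gradient learner for
    online binary matrix completion. An m x n sign matrix is embedded in an
    (m+n) x (m+n) symmetric PSD parameter W with unit trace; entry (i, j)
    is predicted from the sign of W[i, m + j]."""

    def __init__(self, m, n, eta=0.5):
        self.m, self.eta = m, eta
        d = m + n
        self.W = np.eye(d) / d  # maximum-entropy start: trace-one identity

    def predict(self, i, j):
        return 1 if self.W[i, self.m + j] >= 0 else -1

    def update(self, i, j, y):
        # on a mistake, push W toward the observed sign via a rank-two
        # symmetric gradient direction, then renormalize the trace
        if self.predict(i, j) == y:
            return
        d = self.W.shape[0]
        G = np.zeros((d, d))
        G[i, self.m + j] = G[self.m + j, i] = -y
        A = sym_logm(self.W) - self.eta * G
        W = sym_expm(A)
        self.W = W / np.trace(W)
```

The trace normalization keeps W on the spectraplex, which is what allows the mistake bound to be phrased in terms of the margin complexity of the comparator matrix.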
A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
One popular trend in meta-learning is to learn from many training tasks a
common initialization for a gradient-based method that can be used to solve a
new task with few samples. The theory of meta-learning is still in its early
stages, with several recent learning-theoretic analyses of methods such as
Reptile [Nichol et al., 2018] being for convex models. This work shows that
convex-case analysis might be insufficient to understand the success of
meta-learning, and that even for non-convex models it is important to look
inside the optimization black-box, specifically at properties of the
optimization trajectory. We construct a simple meta-learning instance that
captures the problem of one-dimensional subspace learning. For the convex
formulation of linear regression on this instance, we show that the new task
sample complexity of any initialization-based meta-learning algorithm is
Omega(d), where d is the input dimension. In contrast, for the non-convex
formulation of a two layer linear network on the same instance, we show that
both Reptile and multi-task representation learning can have new task sample
complexity of O(1), independent of the dimension, demonstrating a separation from convex
meta-learning. Crucially, analyses of the training dynamics of these methods
reveal that they can meta-learn the correct subspace onto which the data should
be projected.
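The initialization-based meta-learning loop referred to above can be sketched with Reptile's meta-update. This is a toy illustration under stated assumptions, not the paper's construction: the task distribution (regressors on a shared one-dimensional subspace, here hard-coded as the first coordinate direction `u`), the inner-loop hyperparameters, and the use of a plain linear model rather than the paper's two-layer linear network are all simplifications; only the update theta <- theta + eps*(phi - theta) is Reptile as in [Nichol et al., 2018].

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
u = np.zeros(d)
u[0] = 1.0  # shared 1-D subspace for all tasks (demo assumption)

def sample_task():
    # each task's true regressor lies on the shared direction,
    # with a task-specific positive scale
    return rng.uniform(1.0, 2.0) * u

def inner_sgd(theta, w_star, steps=10, lr=0.05, n=10):
    # a few gradient steps of linear regression on the task's own samples
    phi = theta.copy()
    for _ in range(steps):
        X = rng.normal(size=(n, d))
        y = X @ w_star
        grad = X.T @ (X @ phi - y) / n
        phi -= lr * grad
    return phi

def reptile(meta_steps=200, eps=0.1):
    # Reptile: move the initialization toward each task's adapted weights
    theta = rng.normal(size=d) * 0.1
    for _ in range(meta_steps):
        w_star = sample_task()
        phi = inner_sgd(theta, w_star)
        theta += eps * (phi - theta)  # the Reptile meta-update
    return theta
```

After meta-training, the initialization concentrates along the shared direction, which is the trajectory-level effect the abstract points to: the method meta-learns the subspace onto which new-task data should be projected.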