Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
We study the problem of learning-to-learn: inferring a learning algorithm
that works well on tasks sampled from an unknown distribution. As the class of
algorithms, we consider Stochastic Gradient Descent on the true risk regularized
by the squared Euclidean distance to a bias vector. We present an average excess
risk bound for such a learning algorithm. This result quantifies the potential
benefit of using a bias vector with respect to the unbiased case. We then
address the problem of estimating the bias from a sequence of tasks. We propose
a meta-algorithm which incrementally updates the bias, as new tasks are
observed. The low space and time complexity of this approach makes it appealing
in practice. We provide guarantees on the learning ability of the
meta-algorithm. A key feature of our results is that, when the number of tasks
grows and their variance is relatively small, our learning-to-learn approach
has a significant advantage over learning each task in isolation by Stochastic
Gradient Descent without a bias term. We report on numerical experiments which
demonstrate the effectiveness of our approach.
Comment: 37 pages, 8 figures
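
To make the setting concrete, here is a minimal, self-contained sketch of within-task SGD regularized toward a bias vector h, together with an illustrative incremental update of h as new tasks arrive. The least-squares loss, the function names (biased_sgd, meta_train), and the hyper-parameters (lam, lr, meta_lr) are assumptions chosen for illustration only; this is not the paper's exact meta-algorithm and carries none of its guarantees.

import numpy as np

def biased_sgd(X, y, h, lam=0.1, lr=0.01, epochs=5):
    # Within-task SGD on a least-squares objective regularized toward the bias h:
    #   (1/n) * sum_i (x_i . w - y_i)^2 + (lam / 2) * ||w - h||^2
    # (loss, step size, and epoch count are illustrative assumptions)
    w = h.copy()  # warm-start at the bias
    n = len(y)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            grad = 2.0 * (X[i] @ w - y[i]) * X[i] + lam * (w - h)
            w -= lr * grad
    return w

def meta_train(tasks, dim, meta_lr=0.1):
    # Incremental bias update across tasks: after solving each task with
    # biased_sgd, move h a small step toward the task solution. This
    # running-average style rule is a sketch, not the paper's exact update.
    h = np.zeros(dim)
    for X, y in tasks:
        w_task = biased_sgd(X, y, h)
        h += meta_lr * (w_task - h)
    return h

When tasks are similar (small variance around a common solution), the learned h places each task's SGD close to its target from the start, which is the intuition behind the advantage over unbiased, isolated learning described above.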
Demystifying Assumptions in Learning to Discover Novel Classes
In learning to discover novel classes (L2DNC), we are given labeled data from
seen classes and unlabeled data from unseen classes, and we train clustering
models for the unseen classes. However, a rigorous definition of L2DNC has not
been explored, so its implicit assumptions remain unclear.
In this paper, we demystify assumptions behind L2DNC and find that high-level
semantic features should be shared among the seen and unseen classes. This
naturally motivates us to link L2DNC to meta-learning, which relies on exactly the
same assumption. Based on this finding, L2DNC is not only theoretically
solvable, but can also be empirically solved by meta-learning algorithms after
slight modifications. This L2DNC methodology significantly reduces the amount
of unlabeled data needed for training and makes it more practical, as
demonstrated in experiments. The use of very limited data is also justified by
the application scenario of L2DNC: since it is unnatural to label only
seen-class data, the cause of L2DNC lies in sampling rather than in labeling.
Therefore, unseen-class data should be collected along the way while collecting
seen-class data, which is why they are novel and first need to be clustered.
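
To illustrate the L2DNC setup itself (labeled seen-class data, unlabeled unseen-class data, clustering of the unseen classes), the following sketch trains an ordinary classifier on the seen classes and clusters the unseen-class data in the resulting feature space. The choice of class probabilities as the shared high-level features, the function name l2dnc_baseline, and the use of LogisticRegression and KMeans are illustrative assumptions; this is a generic transfer-then-cluster baseline, not the meta-learning methodology proposed in the abstract.

from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def l2dnc_baseline(X_seen, y_seen, X_unseen, n_novel_classes):
    # Train on the labeled seen classes, then reuse the classifier's
    # class-probability outputs as shared high-level features for the
    # unlabeled unseen-class data, which are clustered into novel classes.
    clf = LogisticRegression(max_iter=1000).fit(X_seen, y_seen)
    feats = clf.predict_proba(X_unseen)  # seen-class semantics as features
    return KMeans(n_clusters=n_novel_classes, n_init=10).fit_predict(feats)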