Global Convergence and Generalization Bound of Gradient-Based Meta-Learning with Deep Neural Nets
Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a
popular approach for few-shot learning. However, due to the non-convexity of
DNNs and the bi-level optimization in GBML, the theoretical properties of GBML
with DNNs remain largely unknown. In this paper, we first aim to answer the
following question: Does GBML with DNNs have global convergence guarantees? We
provide a positive answer to this question by proving that GBML with
over-parameterized DNNs is guaranteed to converge to global optima at a linear
rate. The second question we aim to address is: How does GBML achieve fast
adaptation to new tasks with prior experience on past tasks? To answer it, we
theoretically show that GBML is equivalent to a functional gradient descent
operation that explicitly propagates experience from the past tasks to new
ones, and then we prove a generalization error bound of GBML with
over-parameterized DNNs.
Comment: Under review. Code available at
https://github.com/AI-secure/Meta-Neural-Kerne
On the Global Optimality of Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel
optimization problem, where the inner level solves each subtask based on a
shared prior, while the outer level searches for the optimal shared prior by
optimizing its aggregated performance over all the subtasks. Despite its
empirical success, MAML remains less understood in theory, especially in terms
of its global optimality, due to the nonconvexity of the meta-objective (the
outer-level objective). To bridge such a gap between theory and practice, we
characterize the optimality gap of the stationary points attained by MAML for
both reinforcement learning and supervised learning, where the inner-level and
outer-level problems are solved via first-order optimization methods. In
particular, our characterization connects the optimality gap of such stationary
points with (i) the functional geometry of inner-level objectives and (ii) the
representation power of function approximators, including linear models and
neural networks. To the best of our knowledge, our analysis establishes the
global optimality of MAML with nonconvex meta-objectives for the first time.
Comment: 41 pages; accepted to ICML; initial draft submitted in Feb, 202
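The bilevel structure described in this abstract can be sketched on a toy problem. The example below is a minimal illustration, not the authors' code: it assumes least-squares regression tasks so that the gradient of the meta-objective can be written in closed form, and all names (`inner_step`, `meta_grad`, `make_tasks`, `run_maml`) and the step sizes `alpha` and `beta` are hypothetical choices made for illustration. The inner level takes one gradient step per task from the shared prior `w`; the outer level updates `w` using the loss of the adapted parameters on held-out query data.

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error with a 1/2 factor, so its gradient is X^T (Xw - y) / n.
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)

def hessian(X):
    # Hessian of the least-squares loss; it does not depend on w.
    return X.T @ X / X.shape[0]

def inner_step(w, X, y, alpha):
    # Inner level: adapt the shared prior w to one task with one gradient step.
    return w - alpha * grad(w, X, y)

def meta_loss(w, tasks, alpha):
    # Outer-level objective: average query loss after task-specific adaptation.
    return np.mean([loss(inner_step(w, Xs, ys, alpha), Xq, yq)
                    for Xs, ys, Xq, yq in tasks])

def meta_grad(w, tasks, alpha):
    # Chain rule through the inner step: (I - alpha * H_support) @ grad_query.
    d = len(w)
    g = np.zeros(d)
    for Xs, ys, Xq, yq in tasks:
        w_adapted = inner_step(w, Xs, ys, alpha)
        jac = np.eye(d) - alpha * hessian(Xs)  # d w_adapted / d w (symmetric)
        g += jac @ grad(w_adapted, Xq, yq)
    return g / len(tasks)

def make_tasks(rng, n_tasks=8, n=10, d=3):
    # Synthetic tasks (assumption): each task has its own true weight vector.
    tasks = []
    for _ in range(n_tasks):
        w_true = rng.normal(size=d)
        Xs, Xq = rng.normal(size=(n, d)), rng.normal(size=(n, d))
        tasks.append((Xs, Xs @ w_true, Xq, Xq @ w_true))
    return tasks

def run_maml(tasks, alpha=0.1, beta=0.1, steps=200):
    # Outer level: plain gradient descent on the meta-objective.
    w = np.zeros(tasks[0][0].shape[1])
    history = [meta_loss(w, tasks, alpha)]
    for _ in range(steps):
        w = w - beta * meta_grad(w, tasks, alpha)
        history.append(meta_loss(w, tasks, alpha))
    return w, history

if __name__ == "__main__":
    tasks = make_tasks(np.random.default_rng(0))
    w, history = run_maml(tasks)
    print("meta-loss before:", history[0], "after:", history[-1])
```

In this toy setting the meta-objective is a convex quadratic in `w`, so it does not exhibit the nonconvexity the paper analyzes; the sketch only shows how the inner and outer levels are coupled.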
Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation
Multi-task learning (MTL) aims to improve the generalization of several
related tasks by learning them jointly. By comparison, modern meta-learning
goes beyond joint training: it also handles unseen tasks with limited labels
during the test phase, with the aim of adapting to them quickly. Despite
the subtle difference between MTL and meta-learning in the problem formulation,
both learning paradigms share the same insight that the shared structure
between existing training tasks could lead to better generalization and
adaptation. In this paper, we take one important step further to understand the
close connection between these two learning paradigms, through both theoretical
analysis and empirical investigation. Theoretically, we first demonstrate that
MTL shares the same optimization formulation with a class of gradient-based
meta-learning (GBML) algorithms. We then prove that for over-parameterized
neural networks with sufficient depth, the learned predictive functions of MTL
and GBML are close. In particular, this result implies that the predictions
given by these two models are similar over the same unseen task. Empirically,
we corroborate our theoretical findings by showing that, with proper
implementation, MTL is competitive against state-of-the-art GBML algorithms on
a set of few-shot image classification benchmarks. Since existing GBML
algorithms often involve costly second-order bi-level optimization, our
first-order MTL method is an order of magnitude faster on large-scale datasets
such as mini-ImageNet. We believe this work could help bridge the gap between
these two learning paradigms, and provide a computationally efficient
alternative to GBML that also supports fast task adaptation.
Comment: ICML 2021 camera-ready version. Code is released at
https://github.com/AI-secure/multi-task-learnin
Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning
As a popular meta-learning approach, the model-agnostic meta-learning (MAML)
algorithm has been widely used due to its simplicity and effectiveness.
However, the convergence of general multi-step MAML remains unexplored.
In this paper, we develop a new theoretical framework to provide
such a convergence guarantee for two types of objective functions that are of
interest in practice: (a) resampling case (e.g., reinforcement learning), where
loss functions take the form of an expectation and new data are sampled as the
algorithm runs; and (b) finite-sum case (e.g., supervised learning), where loss
functions take the finite-sum form with given samples. For both cases, we
characterize the convergence rate and the computational complexity to attain an
ε-accurate solution for multi-step MAML in the general nonconvex
setting. In particular, our results suggest that an inner-stage stepsize needs
to be chosen inversely proportional to the number N of inner-stage steps in
order for N-step MAML to have guaranteed convergence. From the technical
perspective, we develop novel techniques to deal with the nested structure of
the meta gradient for multi-step MAML, which can be of independent interest.
Comment: 40 page
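The nested structure of the multi-step meta gradient, and the stepsize rule stated in this abstract, can be illustrated with a short sketch. This is an assumption-laden toy example rather than the paper's algorithm: it uses a single least-squares task, the function names are hypothetical, and the constant `c` in the inner stepsize `alpha = c / n_steps` is an arbitrary illustrative value. The meta gradient is obtained by multiplying the Jacobians of the successive inner steps, each of the form `I - alpha * Hessian`.

```python
import numpy as np

def loss(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)

def hessian(w, X, y):
    # Constant for least squares; kept as a function of w to mirror the
    # general case, where the Hessian changes at every inner iterate.
    return X.T @ X / len(y)

def adapt(w, Xs, ys, n_steps, c=0.5):
    # n_steps inner gradient steps with stepsize inversely proportional
    # to the number of steps (c is a hypothetical constant).
    alpha = c / n_steps
    w_k = w.copy()
    for _ in range(n_steps):
        w_k = w_k - alpha * grad(w_k, Xs, ys)
    return w_k

def multi_step_meta_grad(w, Xs, ys, Xq, yq, n_steps, c=0.5):
    # Gradient of loss(adapt(w), Xq, yq) with respect to w.
    alpha = c / n_steps
    d = len(w)
    jac = np.eye(d)  # accumulates d w_k / d w_0
    w_k = w.copy()
    for _ in range(n_steps):
        # w_{k+1} = w_k - alpha * grad(w_k), so
        # d w_{k+1} / d w_0 = (I - alpha * H(w_k)) @ d w_k / d w_0.
        jac = (np.eye(d) - alpha * hessian(w_k, Xs, ys)) @ jac
        w_k = w_k - alpha * grad(w_k, Xs, ys)
    return jac.T @ grad(w_k, Xq, yq)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=3)
    Xs, Xq = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
    g = multi_step_meta_grad(np.zeros(3), Xs, Xs @ w_true, Xq, Xq @ w_true, n_steps=5)
    print("meta gradient for 5 inner steps:", g)
```

With `alpha = c / n_steps`, the total inner displacement budget `n_steps * alpha` stays fixed at `c` as the number of steps grows, which keeps the Jacobian product from blowing up in this toy setting.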