Search CORE

4 research outputs found

Global Convergence and Generalization Bound of Gradient-Based Meta-Learning with Deep Neural Nets

Author: Li Bo
Sun Ruoyu
Wang Haoxiang
Publication venue
Publication date: 16/11/2020
Field of study

Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a popular approach for few-shot learning. However, due to the non-convexity of DNNs and the bi-level optimization in GBML, the theoretical properties of GBML with DNNs remain largely unknown. In this paper, we first aim to answer the following question: Does GBML with DNNs have global convergence guarantees? We provide a positive answer to this question by proving that GBML with over-parameterized DNNs is guaranteed to converge to global optima at a linear rate. The second question we aim to address is: How does GBML achieve fast adaption to new tasks with prior experience on past tasks? To answer it, we theoretically show that GBML is equivalent to a functional gradient descent operation that explicitly propagates experience from the past tasks to new ones, and then we prove a generalization error bound of GBML with over-parameterized DNNs.Comment: Under review. Code available at https://github.com/AI-secure/Meta-Neural-Kerne

arXiv.org e-Print Archive

On the Global Optimality of Model-Agnostic Meta-Learning

Author: Cai Qi
Wang Lingxiao
Wang Zhaoran
Yang Zhuoran
Publication venue
Publication date: 23/06/2020
Field of study

Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior, while the outer level searches for the optimal shared prior by optimizing its aggregated performance over all the subtasks. Despite its empirical success, MAML remains less understood in theory, especially in terms of its global optimality, due to the nonconvexity of the meta-objective (the outer-level objective). To bridge such a gap between theory and practice, we characterize the optimality gap of the stationary points attained by MAML for both reinforcement learning and supervised learning, where the inner-level and outer-level problems are solved via first-order optimization methods. In particular, our characterization connects the optimality gap of such stationary points with (i) the functional geometry of inner-level objectives and (ii) the representation power of function approximators, including linear models and neural networks. To the best of our knowledge, our analysis establishes the global optimality of MAML with nonconvex meta-objectives for the first time.Comment: 41 pages; accepted to ICML; initial draft submitted in Feb, 202

arXiv.org e-Print Archive

Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation

Author: Li Bo
Wang Haoxiang
Zhao Han
Publication venue
Publication date: 16/06/2021
Field of study

Multi-task learning (MTL) aims to improve the generalization of several related tasks by learning them jointly. As a comparison, in addition to the joint training scheme, modern meta-learning allows unseen tasks with limited labels during the test phase, in the hope of fast adaptation over them. Despite the subtle difference between MTL and meta-learning in the problem formulation, both learning paradigms share the same insight that the shared structure between existing training tasks could lead to better generalization and adaptation. In this paper, we take one important step further to understand the close connection between these two learning paradigms, through both theoretical analysis and empirical investigation. Theoretically, we first demonstrate that MTL shares the same optimization formulation with a class of gradient-based meta-learning (GBML) algorithms. We then prove that for over-parameterized neural networks with sufficient depth, the learned predictive functions of MTL and GBML are close. In particular, this result implies that the predictions given by these two models are similar over the same unseen task. Empirically, we corroborate our theoretical findings by showing that, with proper implementation, MTL is competitive against state-of-the-art GBML algorithms on a set of few-shot image classification benchmarks. Since existing GBML algorithms often involve costly second-order bi-level optimization, our first-order MTL method is an order of magnitude faster on large-scale datasets such as mini-ImageNet. We believe this work could help bridge the gap between these two learning paradigms, and provide a computationally efficient alternative to GBML that also supports fast task adaptation.Comment: ICML 2021 camera-ready version. Code is released at https://github.com/AI-secure/multi-task-learnin

arXiv.org e-Print Archive

Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning

Author: Ji Kaiyi
Liang Yingbin
Yang Junjie
Publication venue
Publication date: 13/07/2020
Field of study

As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness. However, the convergence of the general multi-step MAML still remains unexplored. In this paper, we develop a new theoretical framework to provide such convergence guarantee for two types of objective functions that are of interest in practice: (a) resampling case (e.g., reinforcement learning), where loss functions take the form in expectation and new data are sampled as the algorithm runs; and (b) finite-sum case (e.g., supervised learning), where loss functions take the finite-sum form with given samples. For both cases, we characterize the convergence rate and the computational complexity to attain an

\epsilon

-accurate solution for multi-step MAML in the general nonconvex setting. In particular, our results suggest that an inner-stage stepsize needs to be chosen inversely proportional to the number

N

of inner-stage steps in order for

N

-step MAML to have guaranteed convergence. From the technical perspective, we develop novel techniques to deal with the nested structure of the meta gradient for multi-step MAML, which can be of independent interest.Comment: 40 page

arXiv.org e-Print Archive