
    ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

    Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted growing attention in the optimization and machine learning communities, both for their use of life-long gradient information and for their profound mathematical theory. Taking the best of both worlds is one of the most exciting and challenging questions in optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective that refreshes the understanding of second moments. This perspective lets us attach the properties of second moments to the first-moment iteration and propose a novel first-moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo achieves the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that ACMo converges comparably to SOTA Adam-type optimizers and achieves better generalization in most cases.
    Comment: 25 pages, 4 figures
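    For context, here is a minimal, generic sketch of the two quantities the abstract contrasts: the first moment (an exponential moving average of gradients, as in momentum SGD) and the Adam-style second moment used to rescale each coordinate. This is illustrative only and is not the ACMo update rule, whose exact form is not given in the abstract.

    import numpy as np

    def adam_style_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # First moment: exponential moving average of gradients (momentum).
        m = beta1 * m + (1 - beta1) * grad
        # Second moment: exponential moving average of squared gradients,
        # used for per-coordinate step-size calibration in Adam-type methods.
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v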

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodologies, and practices. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
    Comment: Final version as published in JMLR
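    The text-to-text framing means every task is posed as mapping an input string to a target string, typically with a task prefix prepended to the input. A minimal sketch under that convention follows (the prefix strings and field names are illustrative, not the paper's exact preprocessing code).

    def to_text_to_text(task, **fields):
        # Cast a few NLP tasks into (input string, target string) pairs.
        if task == "translation":
            return ("translate English to German: " + fields["source"], fields["target"])
        if task == "summarization":
            return ("summarize: " + fields["document"], fields["summary"])
        if task == "classification":
            # Class labels are emitted as literal text rather than integer ids.
            return ("cola sentence: " + fields["sentence"], fields["label"])
        raise ValueError("unknown task: " + task)

    # Usage: an acceptability-classification example becomes a plain text pair.
    inp, tgt = to_text_to_text("classification",
                               sentence="The movie was surprisingly good.",
                               label="acceptable")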