22 research outputs found

On Effectively Learning of Knowledge in Continual Pre-training

Authors: Huang Fei, Li Yanyang, Luo Fuli, Wang Cunxiang, Xu Runxin, Zhang Yue
Publication date: 17/04/2022

Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks. However, by asking models to do cloze-style tests, recent work finds that PLMs fall short in acquiring knowledge from unstructured text. To understand the internal behaviour of PLMs in retrieving knowledge, we first define knowledge-bearing (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and ask professional annotators to label some samples manually. Then, we find that PLMs are more likely to give wrong predictions on K-B tokens and pay less attention to those tokens inside the self-attention module. Based on these observations, we develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge-intensive tasks show the effectiveness of the proposed methods. To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.

arXiv.org e-Print Archive

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

Authors: Huang Xunpeng, Li Lei, Liu Zhengyang, Wang Zhe, Xu Runxin, Zhou Hao
Publication date: 12/06/2020

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted growing attention from the optimization and machine learning communities, both for their use of life-long information and for their profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective allows us to attach the properties of second moments to the first-moment iteration and to propose a novel first-moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo achieves the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that ACMo converges comparably to SOTA Adam-type optimizers and generalizes better in most cases.

Comment: 25 pages, 4 figures
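For orientation, the two optimizer families that the abstract contrasts can be sketched as follows. This is not ACMo itself, whose update rule is not given in this listing; it is only a minimal illustration of a first-moment (momentum) step and an Adam-style step that also tracks a second moment. The function names, hyperparameter defaults, and the toy quadratic objective are assumptions chosen for the example.

```python
import math

def momentum_step(w, g, m, lr=0.1, beta=0.9):
    """First-moment update: step along an exponential moving average of gradients."""
    m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
    w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w, m

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style update: the first moment is rescaled per coordinate by the second moment."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction for the zero-initialised moving averages (t starts at 1).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps) for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v

def loss(w):
    """Toy objective (an assumption for illustration): f(w) = sum of squares."""
    return sum(wi * wi for wi in w)

def grad(w):
    """Gradient of the toy objective."""
    return [2 * wi for wi in w]

def run_momentum(w0, steps=200):
    w, m = list(w0), [0.0] * len(w0)
    for _ in range(steps):
        w, m = momentum_step(w, grad(w), m)
    return w

def run_adam(w0, steps=200):
    w, m, v = list(w0), [0.0] * len(w0), [0.0] * len(w0)
    for t in range(1, steps + 1):
        w, m, v = adam_step(w, grad(w), m, v, t)
    return w
```

The visible difference is the division by the square root of the second moment in `adam_step`, which makes the step size roughly independent of the gradient scale in each coordinate.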

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

Authors: Cao Yunbo, Dai Damai, Lin Binghuai, Liu Tianyu, Sui Zhifang, Tong Shoujie, Xia Heming, Xu Runxin
Publication date: 22/10/2023

Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit and thus diminish performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. The sub-net estimation of Bi-Drop is performed in an in-batch manner, so it overcomes the problem of hysteresis in sub-net updating that affects previous methods, which perform asynchronous sub-net estimation. Bi-Drop also needs only one mini-batch to estimate the sub-net, so it makes better use of the training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Furthermore, empirical results show that Bi-Drop exhibits excellent generalization ability and robustness in domain transfer, data imbalance, and low-resource scenarios.

Comment: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contribution

arXiv.org e-Print Archive

A multiple criteria service composition selection algorithm supporting time-sensitive rules

Authors: Jennings Brendan, Shi Lei, Wang Runxin, Xu Lei
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011
Constructing composite services from services offered by third parties is an attractive and inexpensive way for service brokers and aggregators to differentiate themselves from their competitors. When multiple services provide the same or similar functionality, selecting those that satisfy users' non-functional requirements is crucial. In many cases, the non-functional properties of services depend heavily on the activity of the network delivering those services, while the network activity follows certain time-sensitive rules. We present a service selection algorithm that takes into account time-sensitive variations in the non-functional properties of services to identify the service combination offering the highest quality within a specified time interval.
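The selection problem described in this abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm, which is not specified in this listing. Everything here is a hypothetical assumption: the two quality criteria (latency and availability), the equal weights, the peak-hour rule, and the service names. The sketch only shows how a time-dependent quality profile can change which combination scores best for a given interval.

```python
from itertools import product

def peak_sensitive(base_latency, peak_latency, availability, peak_hours=range(9, 18)):
    """Hypothetical time-sensitive rule: latency degrades during peak hours."""
    def qos(hour):
        latency = peak_latency if hour in peak_hours else base_latency
        return latency, availability
    return qos

def score(qos, weights=(0.5, 0.5), max_latency=500.0):
    """Weighted sum of two normalised criteria (higher is better)."""
    latency, availability = qos
    latency_score = max(0.0, 1.0 - latency / max_latency)
    return weights[0] * latency_score + weights[1] * availability

def select_composition(candidates, hours):
    """Try every combination of one service per task; keep the best mean score over `hours`."""
    tasks = list(candidates)
    hours = list(hours)
    best_combo, best_score = None, float("-inf")
    for combo in product(*(candidates[t].items() for t in tasks)):
        total = sum(score(qos_fn(h)) for _, qos_fn in combo for h in hours)
        mean = total / (len(combo) * len(hours))
        if mean > best_score:
            best_combo = {t: name for t, (name, _) in zip(tasks, combo)}
            best_score = mean
    return best_combo, best_score

# Hypothetical candidate services for two tasks of a composite service.
CANDIDATES = {
    "payment": {
        "pay_a": peak_sensitive(100, 400, 0.99),
        "pay_b": peak_sensitive(200, 220, 0.97),
    },
    "shipping": {
        "ship_a": peak_sensitive(150, 150, 0.95),
        "ship_b": peak_sensitive(80, 450, 0.99),
    },
}

night_choice, _ = select_composition(CANDIDATES, range(0, 6))
peak_choice, _ = select_composition(CANDIDATES, range(10, 16))
```

With these made-up profiles, the services that are fastest off-peak are chosen for the night interval, while the more stable ones are chosen for the peak interval. Exhaustive enumeration grows exponentially with the number of tasks, so it is only suitable for small illustrations.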

Crossref

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line