On Effectively Learning of Knowledge in Continual Pre-training
Pre-trained language models (PLMs) like BERT have made significant progress
in various downstream NLP tasks. However, by asking models to do cloze-style
tests, recent work finds that PLMs fall short in acquiring knowledge from
unstructured text. To understand the internal behaviour of PLMs in retrieving
knowledge, we first define knowledge-bearing (K-B) tokens and knowledge-free
(K-F) tokens for unstructured text and ask professional annotators to label
some samples manually. Then, we find that PLMs are more likely to give wrong
predictions on K-B tokens and pay less attention to those tokens inside the
self-attention module. Based on these observations, we develop two solutions to
help the model learn more knowledge from unstructured text in a fully
self-supervised manner. Experiments on knowledge-intensive tasks show the
effectiveness of the proposed methods. To the best of our knowledge, we are the
first to explore fully self-supervised learning of knowledge in continual
pre-training.
ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization
Due to its simplicity and outstanding ability to generalize, stochastic
gradient descent (SGD) is still the most widely used optimization method
despite its slow convergence. Meanwhile, adaptive methods have attracted growing
attention from the optimization and machine learning communities, both for their
use of life-long information and for their profound and fundamental
mathematical theory. Combining the best of both worlds is the most exciting and
challenging question in the field of optimization for machine learning. Along
this line, we revisit existing adaptive gradient methods from a novel
perspective, refreshing the understanding of second moments. Our new perspective
empowers us to attach the properties of second moments to the first moment
iteration, and to propose a novel first moment optimizer,
the Angle-Calibrated Moment method (ACMo). Our theoretical results show
that ACMo is able to achieve the same convergence rate as mainstream
adaptive methods. Furthermore, extensive experiments on CV and NLP tasks
demonstrate that ACMo achieves convergence comparable to SOTA Adam-type
optimizers and generalizes better in most cases.
Comment: 25 pages, 4 figures
A Meta-Learning Method for Concept Drift
The knowledge hidden in evolving data may change over time; this issue is known as concept drift. It often reduces the prediction accuracy of a learning system. Most existing techniques apply ensemble methods to improve learning performance under concept drift. In this paper, we propose a novel meta-learning approach to this issue and develop a method called Multi-Step Learning (MSL). In our method, an MSL learner is structured recursively and contains all the base learners, maintained in a hierarchy, which ensures that the learned concepts are traceable. We evaluated MSL and two ensemble techniques on three synthetic datasets that contain a number of drastic concept drifts. The experimental results show that the proposed method generally performs better than the ensemble techniques in terms of prediction accuracy.