L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models
Fine-tuning pre-trained foundational language models (FLMs) for specific tasks
is often impractical, especially for resource-constrained devices. This
necessitates the development of a Lifelong Learning (L3) framework that
continuously adapts to a stream of Natural Language Processing (NLP) tasks
efficiently. We propose an approach that focuses on extracting meaningful
representations from unseen data, constructing a structured knowledge base, and
improving task performance incrementally. We conducted experiments on various
NLP tasks to validate its effectiveness, including benchmarks like GLUE and
SuperGLUE. We measured good performance across the accuracy, training
efficiency, and knowledge transfer metrics. Initial experimental results show
that the proposed L3 ensemble method increases model accuracy by 4% to 36%
compared to the fine-tuned FLM. Furthermore, the L3 model outperforms naive
fine-tuning approaches while maintaining competitive or superior performance
(up to a 15.4% increase in accuracy) compared to the state-of-the-art language
model (T5) for the given task, the STS benchmark.
Catastrophic forgetting: still a problem for DNNs
We investigate the performance of DNNs when trained on class-incremental
visual problems consisting of initial training, followed by retraining with
added visual classes. Catastrophic forgetting (CF) behavior is measured using a
new evaluation procedure that aims at an application-oriented view of
incremental learning. In particular, it requires that model selection be
performed on the initial dataset alone and that retraining control be
performed using only the retraining dataset, since the initial dataset is
usually too large to be kept. Experiments are conducted on class-incremental
problems derived from MNIST, using a variety of different DNN models, some of
them recently proposed to avoid catastrophic forgetting. Comparing our new
evaluation procedure with previous approaches for assessing CF, we find that
their findings are completely negated and that none of the tested methods can avoid
CF in all experiments. This stresses the importance of a realistic empirical
measurement procedure for catastrophic forgetting, and the need for further
research in incremental learning for DNNs.
Comment: 10 pages, 11 figures, Artificial Neural Networks and Machine Learning - ICANN 201
- …
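The class-incremental setup described in the abstract above (initial training, then retraining on added classes, then re-measuring accuracy on the initial classes) can be illustrated with a minimal sketch. This is not the authors' evaluation procedure: the synthetic 2-D blobs stand in for MNIST, a linear softmax classifier stands in for a DNN, and all names (`make_data`, `train`, `run_protocol`) and hyperparameters are assumptions chosen for illustration. The one property it mirrors is that the retraining step sees only the added-class data.

```python
import math
import random


def make_data(rng, centers, n_per_class):
    """Synthetic 2-D Gaussian blobs, one per class (a toy stand-in for MNIST)."""
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)), label))
    return data


def scores(weights, x):
    """Linear class scores; each weight row is [w_x, w_y, bias]."""
    return [w[0] * x[0] + w[1] * x[1] + w[2] for w in weights]


def accuracy(weights, data):
    correct = 0
    for x, y in data:
        s = scores(weights, x)
        correct += s.index(max(s)) == y
    return correct / len(data)


def train(weights, data, rng, epochs=20, lr=0.1):
    """Plain SGD on softmax cross-entropy; updates weights in place."""
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            s = scores(weights, x)
            m = max(s)
            exps = [math.exp(v - m) for v in s]
            z = sum(exps)
            for k, w in enumerate(weights):
                grad = exps[k] / z - (1.0 if k == y else 0.0)
                w[0] -= lr * grad * x[0]
                w[1] -= lr * grad * x[1]
                w[2] -= lr * grad


def run_protocol(seed=0):
    rng = random.Random(seed)
    # Hypothetical layout: two initial classes, one class added later.
    initial = {0: (-2.0, 0.0), 1: (2.0, 0.0)}
    added = {2: (4.0, 0.0)}
    d1_train = make_data(rng, initial, 100)
    d1_test = make_data(rng, initial, 100)
    d2_train = make_data(rng, added, 100)
    d2_test = make_data(rng, added, 100)

    weights = [[0.0, 0.0, 0.0] for _ in range(3)]

    # Phase 1: initial training on the initial classes only.
    train(weights, d1_train, rng)
    acc_before = accuracy(weights, d1_test)

    # Phase 2: retraining uses ONLY the added-class data,
    # mimicking the case where the initial dataset is not kept.
    train(weights, d2_train, rng)
    acc_after = accuracy(weights, d1_test)
    acc_new = accuracy(weights, d2_test)
    return acc_before, acc_after, acc_new


if __name__ == "__main__":
    before, after, new = run_protocol()
    print(f"initial classes, before retraining: {before:.2f}")
    print(f"initial classes, after retraining:  {after:.2f}")
    print(f"added class, after retraining:      {new:.2f}")
```

The gap between the first two printed accuracies is a crude forgetting measure for this toy setting; a real study would replace the blobs and the linear model with the actual datasets and networks under test.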