Search CORE

105 research outputs found

L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models

Author: Gaur Manas
Roy Kaushik
Sheth Amit
Shiri Aidin
Publication venue
Publication date: 11/11/2023
Field of study

Fine-tuning pre-trained foundational language models (FLM) for specific tasks is often impractical, especially for resource-constrained devices. This necessitates the development of a Lifelong Learning (L3) framework that continuously adapts to a stream of Natural Language Processing (NLP) tasks efficiently. We propose an approach that focuses on extracting meaningful representations from unseen data, constructing a structured knowledge base, and improving task performance incrementally. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. We measured good performance across the accuracy, training efficiency, and knowledge transfer metrics. Initial experimental results show that the proposed L3 ensemble method increases the model accuracy by 4% ~ 36% compared to the fine-tuned FLM. Furthermore, L3 model outperforms naive fine-tuning approaches while maintaining competitive or superior performance (up to 15.4% increase in accuracy) compared to the state-of-the-art language model (T5) for the given task, STS benchmark

arXiv.org e-Print Archive

Catastrophic forgetting: still a problem for DNNs

Author: Abdullah S.
Gepperth A.
Kilian A.
Pfülb B.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/05/2019
Field of study

We investigate the performance of DNNs when trained on class-incremental visual problems consisting of initial training, followed by retraining with added visual classes. Catastrophic forgetting (CF) behavior is measured using a new evaluation procedure that aims at an application-oriented view of incremental learning. In particular, it imposes that model selection must be performed on the initial dataset alone, as well as demanding that retraining control be performed only using the retraining dataset, as initial dataset is usually too large to be kept. Experiments are conducted on class-incremental problems derived from MNIST, using a variety of different DNN models, some of them recently proposed to avoid catastrophic forgetting. When comparing our new evaluation procedure to previous approaches for assessing CF, we find their findings are completely negated, and that none of the tested methods can avoid CF in all experiments. This stresses the importance of a realistic empirical measurement procedure for catastrophic forgetting, and the need for further research in incremental learning for DNNs.Comment: 10 pages, 11 figures, Artificial Neural Networks and Machine Learning - ICANN 201

arXiv.org e-Print Archive

Crossref