Effective and Efficient Transfer Learning in the Era of Large Language Models
Substantial progress has been made in the field of natural language processing (NLP) due to the advent of large language models (LLMs): deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data. However, these models have common weaknesses, including degraded performance in data-scarce scenarios and substantial computational resource requirements. This thesis aims to develop methods that address these limitations for improved applicability and performance of LLMs in resource-constrained settings with limited data and/or computational resources.
To address the need for labeled data in data-scarce scenarios, I present two methods, in Chapter 2 and Chapter 3, respectively. The first method leverages beneficial relationships between NLP tasks for transfer learning, while the second combines data augmentation and self-training to boost few-shot learning performance, i.e., the ability to perform novel tasks from only a few labeled examples. Additionally, in Chapter 4, I introduce a novel parameter-efficient transfer learning approach that reuses a single frozen model for all tasks while learning only minimal task-specific parameters (soft/continuous prompts) to represent tasks and transfer knowledge. This method can match or outperform fine-tuning task-specific models (training the whole model on each task). In Chapter 5, I demonstrate the benefits of parameter-efficient transfer learning in a cross-lingual transfer setting. Finally, I conclude the thesis in Chapter 6 by outlining potential avenues for future research that aim to advance NLP through large-scale multi-task learning using multilingual and multimodal data.
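The soft-prompt approach described for Chapter 4 boils down to freezing the pre-trained model and learning only a handful of continuous prompt vectors per task. Below is a minimal PyTorch sketch of that setup; the class name, the prompt length, and the assumption that the backbone accepts input embeddings directly are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Minimal sketch of prompt tuning: only the soft prompt is trainable."""

    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():   # keep the pre-trained weights fixed
            p.requires_grad = False
        # task-specific parameters: a small set of continuous prompt vectors
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: [batch, seq_len, embed_dim] token embeddings of the task input
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # prepend the learned prompt, then run the frozen backbone on embeddings
        return self.model(torch.cat([prompt, input_embeds], dim=1))
```

Only soft_prompt receives gradients, so storing an additional task costs prompt_len × embed_dim parameters rather than a full copy of the model.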
Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights
Adapters, plug-in neural network modules with a small number of tunable parameters, have
emerged as a parameter-efficient transfer learning technique for adapting
pre-trained models to downstream tasks, especially for natural language
processing (NLP) and computer vision (CV) fields. Meanwhile, learning
recommendation models directly from raw item modality features -- e.g., texts
in NLP and images in CV -- can enable effective and transferable recommender
systems (called TransRec). In view of this, a natural question arises: can
adapter-based learning techniques achieve parameter-efficient TransRec with
good performance?
To this end, we perform empirical studies to address several key
sub-questions. First, does adapter-based TransRec perform comparably to
TransRec based on standard full-parameter fine-tuning, and does this hold for
recommendation with different item modalities, e.g., textual RS and visual RS?
Second, if so, we benchmark these existing adapters, which have been shown to
be effective in NLP and CV tasks, in the item recommendation setting. Third,
we carefully study several key factors for adapter-based TransRec in terms of
where and how to insert these adapters. Finally, we look at the
effects of adapter-based TransRec by either scaling up its source training data
or scaling down its target training data. Our paper provides key insights and
practical guidance on unified and transferable recommendation -- a less studied
recommendation scenario. We will release all code and datasets for future
research.
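For context on what such an adapter looks like, the standard construction is a small bottleneck network inserted into each transformer layer and trained while the pre-trained backbone stays frozen. The following is a minimal PyTorch sketch of a bottleneck adapter; the class and dimension names are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, and add a residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Only these few weights are trained; the pre-trained backbone stays frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Placed after the attention and/or feed-forward sub-layer of each block, this adds roughly 2 × hidden_dim × bottleneck_dim parameters per layer; where and how to place such modules in recommendation backbones is exactly the design space the paper studies.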
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
We introduce Adapters, an open-source library that unifies
parameter-efficient and modular transfer learning in large language models. By
integrating 10 diverse adapter methods into a unified interface, Adapters
offers ease of use and flexible configuration. Our library allows researchers
and practitioners to leverage adapter modularity through composition blocks,
enabling the design of complex adapter setups. We demonstrate the library's
efficacy by evaluating its performance against full fine-tuning on various NLP
tasks. Adapters provides a powerful tool for addressing the challenges of
conventional fine-tuning paradigms and promoting more efficient and modular
transfer learning. The library is available via https://adapterhub.ml/adapters. (EMNLP 2023, Systems Demonstration.)
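A typical workflow with the library looks roughly like the sketch below, based on its documented interface; the exact call names (adapters.init, add_adapter, train_adapter, the "seq_bn" config string) are assumptions that should be verified against the current documentation.

```python
import adapters
from transformers import AutoModelForSequenceClassification

# Load a pre-trained backbone and make it adapter-aware.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
adapters.init(model)

# Add a sequential bottleneck adapter and train only its parameters.
model.add_adapter("task_adapter", config="seq_bn")
model.train_adapter("task_adapter")   # freezes the backbone, activates the adapter

# ... run a standard training loop or Trainer, then persist only the adapter weights:
model.save_adapter("./task_adapter", "task_adapter")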
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Transfer learning has fundamentally changed the landscape of natural language
processing (NLP) research. Many existing state-of-the-art models are first
pre-trained on a large text corpus and then fine-tuned on downstream tasks.
However, due to limited data resources from downstream tasks and the extremely
large capacity of pre-trained models, aggressive fine-tuning often causes the
adapted model to overfit the data of downstream tasks and forget the knowledge
of the pre-trained model. To address the above issue in a more principled
manner, we propose a new computational framework for robust and efficient
fine-tuning for pre-trained language models. Specifically, our proposed
framework contains two important ingredients: 1. Smoothness-inducing
regularization, which effectively manages the capacity of the model; 2. Bregman
proximal point optimization, which is a class of trust-region methods and can
prevent knowledge forgetting. Our experiments demonstrate that our proposed
method achieves state-of-the-art performance on multiple NLP benchmarks. (The
58th Annual Meeting of the Association for Computational Linguistics, ACL 2020.)
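The two ingredients can be illustrated with a simplified training-step sketch: the smoothness term penalises changes in the output distribution under a small perturbation of the input embeddings, and the Bregman proximal term keeps the current model close to the previous iterate. The code below is an illustrative simplification (a single random perturbation and a generic model(embeds) callable), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetrised KL divergence between two categorical output distributions."""
    p, q = F.log_softmax(p_logits, dim=-1), F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q, log_target=True, reduction="batchmean")
            + F.kl_div(q, p, log_target=True, reduction="batchmean"))

def smart_style_loss(model, prev_model, embeds, labels, eps=1e-3, lam=1.0, mu=1.0):
    """One illustrative SMART-style objective evaluated on input embeddings."""
    logits = model(embeds)
    task_loss = F.cross_entropy(logits, labels)

    # 1) Smoothness-inducing regularization: perturb the embeddings slightly
    #    and penalise the resulting change in the output distribution.
    noise = eps * torch.randn_like(embeds)
    smooth_loss = symmetric_kl(logits, model(embeds + noise))

    # 2) Bregman proximal point term: stay close to the previous iterate
    #    (prev_model is a frozen copy of the weights before this update).
    with torch.no_grad():
        prev_logits = prev_model(embeds)
    prox_loss = symmetric_kl(logits, prev_logits)

    return task_loss + lam * smooth_loss + mu * prox_loss
```

In the paper, the perturbation is found by several projected-ascent steps rather than plain noise, and prev_model is refreshed (e.g., via copy.deepcopy or a momentum average) at each proximal iteration.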
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix into a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
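To make the compatibility function concrete, the following is a deliberately simplified, single-head sketch: pairwise (token2token) scores come from dot-product attention, global (source2token) scores come from an additive-style projection computed per feature dimension, and the two are combined into feature-wise alignment scores. Unlike the paper, this sketch materialises the full score tensor for readability; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedTensorizedAttention(nn.Module):
    """Single-head sketch combining token2token and source2token scores."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.global_proj = nn.Linear(dim, dim)   # feature-wise source2token scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, dim]
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # token2token: one scalar score per (query, key) pair   -> [b, n, n]
        pairwise = torch.matmul(q, k.transpose(-1, -2)) / q.size(-1) ** 0.5
        # source2token: one score per key per feature dimension -> [b, n, dim]
        glob = self.global_proj(torch.tanh(k))
        # combine into feature-wise alignment scores             -> [b, n, n, dim]
        scores = pairwise.unsqueeze(-1) + glob.unsqueeze(1)
        weights = F.softmax(scores, dim=2)   # normalise over keys, per feature
        # feature-wise weighted sum of the values                -> [b, n, dim]
        return (weights * v.unsqueeze(1)).sum(dim=2)
```

The actual MTSA distributes this computation across multiple heads with distinct positional masks and rearranges it into plain matrix multiplications so the [n, n, dim] tensor is never stored, which is where its CNN-level memory footprint comes from.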
Finding Answers from the Word of God: Domain Adaptation for Neural Networks in Biblical Question Answering
Question answering (QA) has significantly benefitted from deep learning
techniques in recent years. However, domain-specific QA remains a challenge due
to the significant amount of data required to train a neural network. This
paper studies the answer sentence selection task in the Bible domain and answers
questions by selecting relevant verses from the Bible. For this purpose, we
create a new dataset, BibleQA, based on Bible trivia questions and propose three
neural network models for our task. We pre-train our models on a large-scale QA
dataset, SQuAD, and investigate the effect of transferring weights on model
accuracy. Furthermore, we also measure the model accuracies with different
answer context lengths and different Bible translations. We confirm that
transfer learning yields a noticeable improvement in model accuracy. We
achieve relatively good results with shorter context lengths, whereas longer
context lengths decrease model accuracy. We also find that using a more modern
Bible translation in the dataset has a positive effect on the task. (Accepted at IJCNN 2018.)
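The transfer setup described here, pre-training on SQuAD-derived pairs and then fine-tuning the same weights on BibleQA, can be illustrated with a small pairwise question/verse scoring model. The sketch below is hypothetical: the architecture and the data-loader setup are illustrative stand-ins, not the paper's released models.

```python
import torch
import torch.nn as nn

class SentenceSelector(nn.Module):
    """Scores a (question, candidate verse) pair for answer sentence selection."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.q_enc = nn.GRU(dim, dim, batch_first=True)
        self.v_enc = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, q_ids, v_ids):
        _, q = self.q_enc(self.embed(q_ids))   # final hidden state of the question
        _, v = self.v_enc(self.embed(v_ids))   # final hidden state of the verse
        return self.score(q.squeeze(0), v.squeeze(0)).squeeze(-1)   # relevance logit

def transfer(model, squad_loader, bible_loader, epochs=3):
    """Pre-train on SQuAD-derived pairs, then fine-tune the same weights on BibleQA."""
    loss_fn = nn.BCEWithLogitsLoss()
    for loader in (squad_loader, bible_loader):   # source domain first, then target
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):
            for q_ids, v_ids, label in loader:
                opt.zero_grad()
                loss = loss_fn(model(q_ids, v_ids), label.float())
                loss.backward()
                opt.step()
```

The key point of the paper's experiments is the second loop: initialising from weights trained on the large source QA dataset rather than from scratch is what yields the reported accuracy gain on the small Bible-domain dataset.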