Effective and Efficient Transfer Learning in the Era of Large Language Models
Substantial progress has been made in the field of natural language processing (NLP) due to the advent of large language models (LLMs): deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data. However, these models have common weaknesses, including degraded performance in data-scarce scenarios and substantial computational resource requirements. This thesis aims to develop methods that address these limitations for improved applicability and performance of LLMs in resource-constrained settings with limited data and/or computational resources.
To address the need for labeled data in data-scarce scenarios, I present two methods, in Chapter 2 and Chapter 3, respectively. The first method leverages beneficial relationships between NLP tasks for transfer learning, while the second combines data augmentation and self-training to boost few-shot learning performance, i.e., the ability to perform novel tasks from only a few labeled examples. Additionally, in Chapter 4, I introduce a novel parameter-efficient transfer learning approach that reuses a single frozen model for all tasks while learning only minimal task-specific parameters (soft/continuous prompts) to represent tasks and transfer knowledge. This method can match or outperform fine-tuning task-specific models (training the whole model on each task). In Chapter 5, I demonstrate the benefits of parameter-efficient transfer learning in a cross-lingual transfer setting. Finally, I conclude the thesis in Chapter 6 by outlining potential avenues for future research that aim to advance NLP through large-scale multi-task learning using multilingual and multimodal data.
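The soft-prompt approach described for Chapter 4 boils down to freezing the pre-trained model and learning only a handful of continuous prompt vectors per task. Below is a minimal PyTorch sketch of that setup; the class name, the prompt length, and the assumption that the backbone accepts input embeddings directly are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Minimal sketch of prompt tuning: only the soft prompt is trainable."""

    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():   # keep the pre-trained weights fixed
            p.requires_grad = False
        # task-specific parameters: a small set of continuous prompt vectors
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: [batch, seq_len, embed_dim] token embeddings of the task input
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # prepend the learned prompt, then run the frozen backbone on embeddings
        return self.model(torch.cat([prompt, input_embeds], dim=1))
```

Only soft_prompt receives gradients, so storing an additional task costs prompt_len × embed_dim parameters rather than a full copy of the model.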
Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights
Adapters, plug-in neural network modules with a small number of tunable parameters, have
emerged as a parameter-efficient transfer learning technique for adapting
pre-trained models to downstream tasks, especially for natural language
processing (NLP) and computer vision (CV) fields. Meanwhile, learning
recommendation models directly from raw item modality features -- e.g., texts
in NLP and images in CV -- can enable effective and transferable recommender
systems (called TransRec). In view of this, a natural question arises: can
adapter-based learning techniques achieve parameter-efficient TransRec with
good performance?
To this end, we perform empirical studies to address several key
sub-questions. First, does adapter-based TransRec perform comparably to
TransRec based on standard full-parameter fine-tuning, and does this hold for
recommendation with different item modalities, e.g., textual RS and visual RS?
Second, if so, we benchmark these existing adapters, which have been shown to
be effective in NLP and CV tasks, in the item recommendation setting. Third,
we carefully study several key factors for adapter-based TransRec in terms of
where and how to insert these adapters. Finally, we look at the
effects of adapter-based TransRec by either scaling up its source training data
or scaling down its target training data. Our paper provides key insights and
practical guidance on unified and transferable recommendation -- a less studied
recommendation scenario. We will release all code and datasets for future
research.
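For context on what such an adapter looks like, the standard construction is a small bottleneck network inserted into each transformer layer and trained while the pre-trained backbone stays frozen. The following is a minimal PyTorch sketch of a bottleneck adapter; the class and dimension names are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, and add a residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Only these few weights are trained; the pre-trained backbone stays frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Placed after the attention and/or feed-forward sub-layer of each block, this adds roughly 2 × hidden_dim × bottleneck_dim parameters per layer; where and how to place such modules in recommendation backbones is exactly the design space the paper studies.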
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
We introduce Adapters, an open-source library that unifies
parameter-efficient and modular transfer learning in large language models. By
integrating 10 diverse adapter methods into a unified interface, Adapters
offers ease of use and flexible configuration. Our library allows researchers
and practitioners to leverage adapter modularity through composition blocks,
enabling the design of complex adapter setups. We demonstrate the library's
efficacy by evaluating its performance against full fine-tuning on various NLP
tasks. Adapters provides a powerful tool for addressing the challenges of
conventional fine-tuning paradigms and promoting more efficient and modular
transfer learning. The library is available via https://adapterhub.ml/adapters. (EMNLP 2023, Systems Demonstration.)
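A typical workflow with the library looks roughly like the sketch below, based on its documented interface; the exact call names (adapters.init, add_adapter, train_adapter, the "seq_bn" config string) are assumptions that should be verified against the current documentation.

```python
import adapters
from transformers import AutoModelForSequenceClassification

# Load a pre-trained backbone and make it adapter-aware.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
adapters.init(model)

# Add a sequential bottleneck adapter and train only its parameters.
model.add_adapter("task_adapter", config="seq_bn")
model.train_adapter("task_adapter")   # freezes the backbone, activates the adapter

# ... run a standard training loop or Trainer, then persist only the adapter weights:
model.save_adapter("./task_adapter", "task_adapter")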
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Transfer learning has fundamentally changed the landscape of natural language
processing (NLP) research. Many existing state-of-the-art models are first
pre-trained on a large text corpus and then fine-tuned on downstream tasks.
However, due to limited data resources from downstream tasks and the extremely
large capacity of pre-trained models, aggressive fine-tuning often causes the
adapted model to overfit the data of downstream tasks and forget the knowledge
of the pre-trained model. To address the above issue in a more principled
manner, we propose a new computational framework for robust and efficient
fine-tuning for pre-trained language models. Specifically, our proposed
framework contains two important ingredients: 1. Smoothness-inducing
regularization, which effectively manages the capacity of the model; 2. Bregman
proximal point optimization, which is a class of trust-region methods and can
prevent knowledge forgetting. Our experiments demonstrate that our proposed
method achieves state-of-the-art performance on multiple NLP benchmarks. (The
58th Annual Meeting of the Association for Computational Linguistics, ACL 2020.)
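The two ingredients can be illustrated with a simplified training-step sketch: the smoothness term penalises changes in the output distribution under a small perturbation of the input embeddings, and the Bregman proximal term keeps the current model close to the previous iterate. The code below is an illustrative simplification (a single random perturbation and a generic model(embeds) callable), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetrised KL divergence between two categorical output distributions."""
    p, q = F.log_softmax(p_logits, dim=-1), F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q, log_target=True, reduction="batchmean")
            + F.kl_div(q, p, log_target=True, reduction="batchmean"))

def smart_style_loss(model, prev_model, embeds, labels, eps=1e-3, lam=1.0, mu=1.0):
    """One illustrative SMART-style objective evaluated on input embeddings."""
    logits = model(embeds)
    task_loss = F.cross_entropy(logits, labels)

    # 1) Smoothness-inducing regularization: perturb the embeddings slightly
    #    and penalise the resulting change in the output distribution.
    noise = eps * torch.randn_like(embeds)
    smooth_loss = symmetric_kl(logits, model(embeds + noise))

    # 2) Bregman proximal point term: stay close to the previous iterate
    #    (prev_model is a frozen copy of the weights before this update).
    with torch.no_grad():
        prev_logits = prev_model(embeds)
    prox_loss = symmetric_kl(logits, prev_logits)

    return task_loss + lam * smooth_loss + mu * prox_loss
```

In the paper, the perturbation is found by several projected-ascent steps rather than plain noise, and prev_model is refreshed (e.g., via copy.deepcopy or a momentum average) at each proximal iteration.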
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix into a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
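To make the compatibility function concrete, the following is a deliberately simplified, single-head sketch: pairwise (token2token) scores come from dot-product attention, global (source2token) scores come from an additive-style projection computed per feature dimension, and the two are combined into feature-wise alignment scores. Unlike the paper, this sketch materialises the full score tensor for readability; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedTensorizedAttention(nn.Module):
    """Single-head sketch combining token2token and source2token scores."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.global_proj = nn.Linear(dim, dim)   # feature-wise source2token scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, dim]
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # token2token: one scalar score per (query, key) pair   -> [b, n, n]
        pairwise = torch.matmul(q, k.transpose(-1, -2)) / q.size(-1) ** 0.5
        # source2token: one score per key per feature dimension -> [b, n, dim]
        glob = self.global_proj(torch.tanh(k))
        # combine into feature-wise alignment scores             -> [b, n, n, dim]
        scores = pairwise.unsqueeze(-1) + glob.unsqueeze(1)
        weights = F.softmax(scores, dim=2)   # normalise over keys, per feature
        # feature-wise weighted sum of the values                -> [b, n, dim]
        return (weights * v.unsqueeze(1)).sum(dim=2)
```

The actual MTSA distributes this computation across multiple heads with distinct positional masks and rearranges it into plain matrix multiplications so the [n, n, dim] tensor is never stored, which is where its CNN-level memory footprint comes from.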
Finding Answers from the Word of God: Domain Adaptation for Neural Networks in Biblical Question Answering
Question answering (QA) has significantly benefitted from deep learning
techniques in recent years. However, domain-specific QA remains a challenge due
to the significant amount of data required to train a neural network. This
paper studies the answer sentence selection task in the Bible domain and answers
questions by selecting relevant verses from the Bible. For this purpose, we
create a new dataset, BibleQA, based on Bible trivia questions and propose three
neural network models for our task. We pre-train our models on a large-scale QA
dataset, SQuAD, and investigate the effect of transferring weights on model
accuracy. Furthermore, we also measure the model accuracies with different
answer context lengths and different Bible translations. We confirm that
transfer learning yields a noticeable improvement in model accuracy. We
achieve relatively good results with shorter context lengths, whereas longer
context lengths decrease model accuracy. We also find that using a more modern
Bible translation in the dataset has a positive effect on the task. (Accepted at IJCNN 2018.)
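The transfer setup described here, pre-training on SQuAD-derived pairs and then fine-tuning the same weights on BibleQA, can be illustrated with a small pairwise question/verse scoring model. The sketch below is hypothetical: the architecture and the data-loader setup are illustrative stand-ins, not the paper's released models.

```python
import torch
import torch.nn as nn

class SentenceSelector(nn.Module):
    """Scores a (question, candidate verse) pair for answer sentence selection."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.q_enc = nn.GRU(dim, dim, batch_first=True)
        self.v_enc = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, q_ids, v_ids):
        _, q = self.q_enc(self.embed(q_ids))   # final hidden state of the question
        _, v = self.v_enc(self.embed(v_ids))   # final hidden state of the verse
        return self.score(q.squeeze(0), v.squeeze(0)).squeeze(-1)   # relevance logit

def transfer(model, squad_loader, bible_loader, epochs=3):
    """Pre-train on SQuAD-derived pairs, then fine-tune the same weights on BibleQA."""
    loss_fn = nn.BCEWithLogitsLoss()
    for loader in (squad_loader, bible_loader):   # source domain first, then target
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):
            for q_ids, v_ids, label in loader:
                opt.zero_grad()
                loss = loss_fn(model(q_ids, v_ids), label.float())
                loss.backward()
                opt.step()
```

The key point of the paper's experiments is the second loop: initialising from weights trained on the large source QA dataset rather than from scratch is what yields the reported accuracy gain on the small Bible-domain dataset.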