Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Multi-Task Learning (MTL) networks have emerged as a promising method for
transferring learned knowledge across different tasks. However, MTL must deal
with challenges such as overfitting to low-resource tasks, catastrophic
forgetting, and negative task transfer (or learning interference). Often, in
Natural Language Processing (NLP), a separate model per task is needed to
obtain the best performance. However, many fine-tuning approaches are both
parameter-inefficient, i.e., potentially requiring one new model per task, and
highly susceptible to losing knowledge acquired during pretraining. We propose
a novel Transformer architecture consisting of a new conditional attention
mechanism as well as a set of task-conditioned modules that facilitate weight
sharing. Through this construction, we achieve more efficient parameter sharing
and mitigate forgetting by keeping half of the weights of a pretrained model
fixed. We also use a new multi-task data sampling strategy to mitigate the
negative effects of data imbalance across tasks. Using this approach, we are
able to surpass single-task fine-tuning methods while being parameter- and
data-efficient (using around 66% of the data for weight updates). Compared to other
BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by
2.8%, and our 24-task model outperforms models that use MTL and single-task
fine-tuning by 0.7-1.0%. We show that a larger variant of our single multi-task
model approach performs competitively across 26 NLP tasks and yields
state-of-the-art results on a number of test and development sets. Our code is
publicly available at https://github.com/CAMTL/CA-MTL.
Comment: ICLR 2021
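The abstract describes the conditional attention mechanism and task-conditioned modules only at a high level. As a rough illustration of the general idea, below is a minimal PyTorch sketch of one way to condition self-attention on a task id; the class name TaskConditionedAttention, the additive task bias applied to the queries, and the single-head setup are assumptions made for this sketch, not the CA-MTL implementation (see the linked repository for the authors' code).

```python
# Hypothetical sketch of task-conditioned self-attention; not the CA-MTL design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskConditionedAttention(nn.Module):
    """Single-head self-attention whose logits are modulated by a task embedding.

    A learned task embedding is projected to a bias vector added to the queries,
    so the shared projection weights can stay fixed across tasks while the
    attention pattern is conditioned on the task id.
    """
    def __init__(self, d_model: int, num_tasks: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.task_emb = nn.Embedding(num_tasks, d_model)
        # Maps the task embedding to a query bias (illustrative choice).
        self.task_bias = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); task_id: (batch,) long tensor of task indices
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        t = self.task_bias(self.task_emb(task_id))            # (batch, d_model)
        # Bias each query with the task vector before computing attention scores.
        scores = torch.einsum("bqd,bkd->bqk", q + t.unsqueeze(1), k) * self.scale
        attn = F.softmax(scores, dim=-1)
        return torch.einsum("bqk,bkd->bqd", attn, v)

if __name__ == "__main__":
    layer = TaskConditionedAttention(d_model=64, num_tasks=8)
    x = torch.randn(2, 10, 64)
    task_id = torch.tensor([0, 3])
    print(layer(x, task_id).shape)  # torch.Size([2, 10, 64])
```

In this toy formulation only the task embedding and bias projection need to be task-aware; everything else can be shared, which is the general spirit of keeping part of a pretrained model fixed while conditioning its behaviour on the task.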
New Protocols and Negative Results for Textual Entailment Data Collection
Natural language inference (NLI) data has proven useful in benchmarking and,
especially, as pretraining data for tasks requiring language understanding.
However, the crowdsourcing protocol that was used to collect this data has
known issues and was not explicitly optimized for either of these purposes, so
it is likely far from ideal. We propose four alternative protocols, each aimed
at improving either the ease with which annotators can produce sound training
examples or the quality and diversity of those examples. Using these
alternatives and a fifth baseline protocol, we collect and compare five new
8.5k-example training sets. In evaluations focused on transfer learning
applications, our results are solidly negative, with models trained on our
baseline dataset yielding good transfer performance to downstream tasks, but
none of our four new methods (nor the recent ANLI) showing any improvements
over that baseline. In a small silver lining, we observe that all four new
protocols, especially those where annotators edit pre-filled text boxes, reduce
previously observed issues with annotation artifacts.
Comment: To appear at EMNLP 2020