TADA: Task-Agnostic Dialect Adapters for English
Large Language Models, the dominant starting point for Natural Language
Processing (NLP) applications, fail at a higher rate for speakers of English
dialects other than Standard American English (SAE). Prior work addresses this
using task-specific data or synthetic data augmentation, both of which require
intervention for each dialect and task pair. This poses a scalability issue
that prevents the broad adoption of robust dialectal English NLP. We introduce
a simple yet effective method for task-agnostic dialect adaptation by aligning
non-SAE dialects using adapters and composing them with task-specific adapters
from SAE. Task-Agnostic Dialect Adapters (TADA) improve dialectal robustness on
4 dialectal variants of the GLUE benchmark without task-specific supervision.
Comment: 5 Pages; ACL Findings Paper 202
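As a rough illustration of the composition idea in the abstract above, the sketch below stacks a dialect adapter in front of a task adapter on the same hidden states. It assumes standard bottleneck adapters with a residual connection; the names `make_adapter`, `apply_adapter`, and `compose`, the dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def make_adapter(hidden, bottleneck, rng, scale=0.02):
    """Create toy bottleneck-adapter weights (random here; trained in practice)."""
    return {
        "down": rng.normal(0.0, scale, size=(hidden, bottleneck)),
        "up": rng.normal(0.0, scale, size=(bottleneck, hidden)),
    }

def apply_adapter(h, adapter):
    """Down-project, apply ReLU, up-project, then add the residual."""
    z = np.maximum(h @ adapter["down"], 0.0)
    return h + z @ adapter["up"]

def compose(h, dialect_adapter, task_adapter):
    """Assumed stacking order: dialect adapter first, then the task adapter."""
    return apply_adapter(apply_adapter(h, dialect_adapter), task_adapter)

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(4, 16))   # 4 tokens, hidden size 16
dialect = make_adapter(16, 4, rng)         # stands in for a dialect adapter
task = make_adapter(16, 4, rng)            # stands in for an SAE task adapter
out = compose(hidden_states, dialect, task)
```

Because each adapter is a residual update, the two can be trained separately and combined at inference time, which is the property the abstract relies on to avoid supervision for every dialect and task pair.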
Data Scarcity in Event Analysis and Abusive Language Detection
Lack of data is almost always the cause of the suboptimal performance of neural networks. Even though data-scarce scenarios can be simulated for any task by assuming limited access to training data, we study two problem areas where data scarcity is a practical challenge: event analysis and abusive content detection.
Journalists, social scientists, and political scientists need to retrieve and analyze event mentions in unstructured text to compute useful statistical information to understand society. We claim that it is hard to specify an information need about events using a keyword-based representation and propose a Query by Example (QBE) setting for event retrieval. In the QBE setting, we assume that there are a few example sentences mentioning the event class a user is interested in, and we aim to retrieve relevant events using only the examples as a query. Traditional event detection approaches are not applicable in this setting, as event detection datasets are constructed based on pre-defined schemas, which limits them to a small set of event and event-argument types. Moreover, the amount of annotated data in event detection datasets is so limited that it only allows us to build a retrieval corpus for evaluation. Thus we assume that there are no relevance judgments to train an event retrieval model -- except for the few examples of a specific event type.
We create three QBE evaluation settings from three event detection datasets: PoliceKilling, ACE, and IndiaPoliceEvents. For the PoliceKilling dataset, where a relevant sentence describes a police killing event, we show that a query model constructed from the NLP features extracted from the few given examples is effective compared to event detection baselines. For the ACE dataset, where there are thirty-three types of events, we construct a QBE setting for each type and show that a sentence embedding approach transfers effectively to event matching. Finally, we conduct a unified evaluation of all three datasets using the sentence-embedding-based model and show that it outperforms strong baselines.
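As a minimal sketch of the Query by Example setting described above, the code below ranks corpus sentences by cosine similarity to the centroid of a few example embeddings. The toy three-dimensional vectors stand in for sentence embeddings, and the centroid query and the helper names are illustrative assumptions, not the model evaluated in the work.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rank_by_examples(example_vecs, corpus_vecs):
    """Return corpus indices sorted from most to least similar to the examples."""
    query = centroid(example_vecs)
    scores = [cosine(query, v) for v in corpus_vecs]
    return sorted(range(len(corpus_vecs)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins for sentence embeddings (hypothetical values).
examples = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]]
corpus = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
ranking = rank_by_examples(examples, corpus)
```

No relevance judgments are used here: the few examples are the only supervision, which matches the assumption stated above.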
We further examine the effect of data scarcity in abusive language detection. We first study a specific type of abusive language -- hate speech. Neural hate speech detection models trained on one dataset generalize poorly to another dataset from a different domain. This is because characteristics of hate speech vary based on racial and cultural aspects. Our data scarcity scenario assumes that we have a hate speech dataset from one domain and that a model must generalize to a test set from another domain using only the unlabeled data from the test domain. Thus we assume zero labeled target-domain data in this scenario. To tackle the data scarcity, we propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection. We evaluate the approach with three different models (character CNNs, BiLSTMs, and BERT) on three different collections. We show our approach improves Area under the Precision/Recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.
Finally, we examine the cross-lingual abusive language detection problem. Abusive language is a superclass of hate speech that includes profanity, aggression, offensiveness, cyberbullying, toxicity, and hate speech itself. There is a large collection of abusive language detection datasets in English, such as Jigsaw. For other languages, datasets for abusive language detection exist but contain very limited data. We propose a cross-lingual transfer learning approach to learn an effective neural abusive language classifier for such low-resource languages with help from a dataset from a resource-rich language. The framework is based on a nearest-neighbor architecture and is thus interpretable by design. It is a modern instantiation of the classic k-nearest neighbor model, as we use transformer representations in all its components. Unlike prior work on neighborhood-based approaches, we encode the neighborhood information based on query-neighbor interactions. We propose two encoding schemes and show their effectiveness using both qualitative and quantitative analyses. Our evaluation results on eight languages from two different datasets for abusive language detection show sizable improvements in F1 over strong baselines.
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Research in multilingual speech-to-text translation is topical. Having a
single model that supports multiple translation tasks is desirable. The goal of
this work is to improve cross-lingual transfer learning in multilingual
speech-to-text translation via semantic knowledge distillation. We show that by
initializing the encoder of the encoder-decoder sequence-to-sequence
translation model with SAMU-XLS-R, a multilingual speech transformer encoder
trained using multi-modal (speech-text) semantic knowledge distillation, we
achieve significantly better cross-lingual task knowledge transfer than the
baseline XLS-R, a multilingual speech transformer encoder trained via
self-supervised learning. We demonstrate the effectiveness of our approach on
two popular datasets, namely, CoVoST-2 and Europarl. On the 21 translation
tasks of the CoVoST-2 benchmark, we achieve an average improvement of 12.8 BLEU
points over the baselines. In the zero-shot translation scenario, we achieve
average gains of 18.8 and 11.9 BLEU points on unseen medium- and
low-resource languages, respectively. We make similar observations on the
Europarl speech translation benchmark.
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations
Large pretrained multilingual language models (ML-LMs) have shown remarkable
capabilities of zero-shot cross-lingual transfer, without direct cross-lingual
supervision. While these results are promising, follow-up works found that,
within the multilingual embedding spaces, there exists strong language identity
information which hinders the expression of linguistic factors shared across
languages. For semantic tasks like cross-lingual sentence retrieval, it is
desired to remove such language identity signals to fully leverage semantic
information. In this work, we provide a novel view of projecting away
language-specific factors from a multilingual embedding space. Specifically, we
discover that there exists a low-rank subspace that primarily encodes
information irrelevant to semantics (e.g., syntactic information). To identify
this subspace, we present a simple but effective unsupervised method based on
singular value decomposition with multiple monolingual corpora as input. Once
the subspace is found, we can directly project the original embeddings into the
null space to boost language agnosticism without finetuning. We systematically
evaluate our method on various tasks including the challenging
language-agnostic QA retrieval task. Empirical results show that applying our
method consistently leads to improvements over commonly used ML-LMs.
Comment: 17 pages, 7 figures, EMNLP 2022 (main conference)
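The projection step described in the abstract above can be sketched as follows. This is a simplified stand-in, not the paper's exact procedure: it assumes the language-specific subspace can be approximated by the top singular directions of the centered per-language mean embeddings, and the function names, the rank, and the synthetic data are hypothetical.

```python
import numpy as np

def language_subspace(embeddings_by_lang, rank):
    """Estimate an orthonormal basis (d x rank) for language-specific directions."""
    # One mean embedding per monolingual corpus: shape (num_langs, d).
    means = np.stack([e.mean(axis=0) for e in embeddings_by_lang])
    # Remove the component shared by all languages.
    centered = means - means.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions in embedding space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T

def project_out(x, basis):
    """Project embeddings onto the orthogonal complement of the basis."""
    return x - (x @ basis) @ basis.T

# Synthetic data: three "languages", each with its own offset in an 8-dim space.
rng = np.random.default_rng(0)
offsets = 3.0 * rng.normal(size=(3, 8))
embeddings_by_lang = [rng.normal(size=(50, 8)) + off for off in offsets]

basis = language_subspace(embeddings_by_lang, rank=2)
cleaned = [project_out(e, basis) for e in embeddings_by_lang]
cleaned_means = [c.mean(axis=0) for c in cleaned]
```

With three languages the centered means span at most two directions, so removing a rank-2 subspace makes the per-language means coincide in this toy setup. No finetuning is involved: the projection is a fixed linear map applied to existing embeddings.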