DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue
Modern virtual assistants use internal semantic parsing engines to convert
user utterances to actionable commands. However, prior work has demonstrated
that semantic parsing is a difficult multilingual transfer task with low
transfer efficiency compared to other tasks. In global markets such as India
and Latin America, this is a critical issue as switching between languages is
prevalent for bilingual users. In this work, we dramatically improve the
zero-shot performance of a multilingual and code-switched semantic parsing
system using two stages of multilingual alignment. First, we show that
contrastive alignment pretraining improves both English performance and
transfer efficiency. We then introduce a constrained optimization approach for
hyperparameter-free adversarial alignment during finetuning. Our Doubly Aligned
Multilingual Parser (DAMP) improves mBERT transfer performance by 3x, 6x, and
81x on the Spanglish, Hinglish, and Multilingual Task-Oriented Parsing
benchmarks, respectively, and outperforms XLM-R and mT5-Large using 3.2x fewer
parameters.
Comment: 9 Pages; ACL Main Conference 202
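The abstract does not spell out the contrastive pretraining objective. As a rough illustration only, the sketch below assumes a standard InfoNCE-style loss over a batch of translation pairs: each source sentence embedding should be closer (by cosine similarity) to its own translation than to the other translations in the batch, which act as negatives. The function names, the temperature value, and the toy 2-D vectors are hypothetical; in practice the embeddings would come from the encoder being aligned (e.g. mBERT).

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_alignment_loss(
    src: List[Vector], tgt: List[Vector], temperature: float = 0.1
) -> float:
    """Hypothetical InfoNCE loss (source -> target direction only).

    src[i] and tgt[i] embed a sentence and its translation; every other
    tgt[j] in the batch is treated as a negative for src[i].
    """
    n = len(src)
    total = 0.0
    for i in range(n):
        logits = [cosine(src[i], tgt[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching translation as the correct class.
        total += log_sum_exp - logits[i]
    return total / n
```

With `src = [[1, 0], [0, 1]]`, passing the matching targets `[[1, 0], [0, 1]]` gives a loss close to zero (about 4.5e-5), while swapping them to `[[0, 1], [1, 0]]` gives roughly 10. Minimizing this loss pulls translations together in embedding space, which is the intuition behind using alignment to improve cross-lingual transfer.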
Lexical Normalization for Code-switched Data and its Effect on POS Tagging
Lexical normalization, the translation of non-canonical data to standard
language, has been shown to improve the performance of many natural language
processing tasks on social media. Yet, using multiple languages in one
utterance, also called code-switching (CS), is frequently overlooked by these
normalization systems, despite its common use in social media. In this paper,
we propose three normalization models specifically designed to handle
code-switched data which we evaluate for two language pairs: Indonesian-English
(Id-En) and Turkish-German (Tr-De). For the latter, we introduce novel
normalization layers and their corresponding language ID and POS tags for the
dataset, and evaluate the downstream effect of normalization on POS tagging.
Results show that our CS-tailored normalization models outperform the Id-En
state of the art and the Tr-De monolingual models, and lead to a 5.4% relative
performance increase for POS tagging compared to unnormalized input.
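The abstract does not describe how the three normalization models work. A minimal baseline that makes the role of language ID concrete, assuming each token already carries a language tag, routes every token to a lexicon for its own language and leaves unknown tokens unchanged. This is a swapped-in illustration, not the paper's models, and the lexicons below are hypothetical toy entries rather than data from the Id-En or Tr-De datasets.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons mapping non-canonical spellings to standard forms.
# Real systems learn such mappings (or generate candidates) from annotated data.
LEXICONS: Dict[str, Dict[str, str]] = {
    "id": {"yg": "yang", "gak": "tidak", "dgn": "dengan"},
    "en": {"u": "you", "pls": "please"},
}


def normalize(tagged: List[Tuple[str, str]]) -> List[str]:
    """Normalize (token, language_id) pairs using the lexicon for each token's language.

    Tokens whose language has no lexicon, or that are not in the lexicon,
    are returned unchanged.
    """
    normalized = []
    for token, lang in tagged:
        lexicon = LEXICONS.get(lang, {})
        normalized.append(lexicon.get(token.lower(), token))
    return normalized
```

For a mixed Indonesian-English utterance tagged as `[("pls", "en"), ("gak", "id"), ("ok", "en")]`, this returns `["please", "tidak", "ok"]`. Routing by language ID keeps each lookup within one language, so a lexicon for one language does not rewrite tokens that belong to the other.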
Dialect-robust Evaluation of Generated Text
Evaluation metrics that are not robust to dialect variation make it
impossible to tell how well systems perform for many groups of users, and can
even penalize systems for producing text in lower-resource dialects. However,
there is currently no way to quantify how metrics respond to changes in the
dialect of a generated utterance. We thus formalize dialect robustness and
dialect awareness as goals for NLG evaluation metrics. We introduce a suite of
methods and corresponding statistical tests one can use to assess metrics in
light of the two goals. Applying the suite to current state-of-the-art metrics,
we demonstrate that they are not dialect-robust and that semantic perturbations
frequently lead to smaller decreases in a metric than the introduction of
dialect features. As a first step to overcome this limitation, we propose a
training schema, NANO, which introduces regional and language information to
the pretraining process of a metric. We demonstrate that NANO provides a
size-efficient way for models to improve their dialect robustness while
simultaneously improving their performance on the standard metric benchmark.
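The abstract does not name the statistical tests in the suite. One simple way to operationalize the reported comparison, offered here as an assumption rather than the paper's procedure, is a paired sign test: for each reference, score a meaning-preserving dialect rewrite and a meaning-changing perturbation with the same metric, and count how often the dialect rewrite scores lower. If such failures occur significantly more often than chance, the metric is not dialect-robust in this sense. The function names and example scores are hypothetical.

```python
import math
from typing import List, Tuple


def sign_test_p_value(failures: int, n: int) -> float:
    """One-sided exact sign test: P(X >= failures) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(failures, n + 1)) / 2 ** n


def dialect_robustness_check(
    dialect_scores: List[float], semantic_scores: List[float]
) -> Tuple[float, float]:
    """Return (failure rate, one-sided p-value) for paired metric scores.

    dialect_scores[i]: metric score of a meaning-preserving dialect rewrite of example i.
    semantic_scores[i]: metric score of a meaning-changing perturbation of example i.
    A failure is a pair where the dialect rewrite scores LOWER than the
    semantic perturbation. Ties carry no sign information and are dropped.
    """
    if len(dialect_scores) != len(semantic_scores):
        raise ValueError("score lists must be paired")
    pairs = [(d, s) for d, s in zip(dialect_scores, semantic_scores) if d != s]
    if not pairs:
        raise ValueError("no untied pairs to test")
    failures = sum(1 for d, s in pairs if d < s)
    n = len(pairs)
    return failures / n, sign_test_p_value(failures, n)
```

If a metric scores the dialect rewrite below the semantic perturbation on all five of five untied pairs, the check returns a failure rate of 1.0 and a p-value of 1/32 = 0.03125. An exact sign test is used here because it makes no assumption about the distribution of score differences, only about their signs.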