DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

Author: Goel Rahul
Held William
Hidey Christopher
Liu Fei
Shah Rushin
Yang Diyi
Zhu Eric
Publication date: 26/05/2023

Modern virtual assistants use internal semantic parsing engines to convert user utterances to actionable commands. However, prior work has demonstrated that semantic parsing is a difficult multilingual transfer task with low transfer efficiency compared to other tasks. In global markets such as India and Latin America, this is a critical issue, as switching between languages is prevalent for bilingual users. In this work, we dramatically improve the zero-shot performance of a multilingual and code-switched semantic parsing system using two stages of multilingual alignment. First, we show that contrastive alignment pretraining improves both English performance and transfer efficiency. We then introduce a constrained optimization approach for hyperparameter-free adversarial alignment during finetuning. Our Doubly Aligned Multilingual Parser (DAMP) improves mBERT transfer performance by 3x, 6x, and 81x on the Spanglish, Hinglish, and Multilingual Task Oriented Parsing benchmarks respectively, and outperforms XLM-R and mT5-Large using 3.2x fewer parameters. Comment: 9 pages; ACL Main Conference 202

arXiv.org e-Print Archive

Lexical Normalization for Code-switched Data and its Effect on POS Tagging

Author: van der Goot Rob
Çetinoğlu Özlem
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 31/01/2021

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media. Yet using multiple languages in one utterance, also called code-switching (CS), is frequently overlooked by these normalization systems, despite its common use on social media. In this paper, we propose three normalization models specifically designed to handle code-switched data, which we evaluate on two language pairs: Indonesian-English (Id-En) and Turkish-German (Tr-De). For the latter, we introduce novel normalization layers and their corresponding language ID and POS tags for the dataset, and evaluate the downstream effect of normalization on POS tagging. Results show that our CS-tailored normalization models outperform the Id-En state of the art and Tr-De monolingual models, and lead to a 5.4% relative performance increase for POS tagging compared to unnormalized input.

arXiv.org e-Print Archive

The IT University of Copenhagen's Repository

Dialect-robust Evaluation of Generated Text

Author: Clark Elizabeth
Dozat Timothy
Eisenstein Jacob
Garrette Dan
Gehrmann Sebastian
Sellam Thibault
Siddhant Aditya
Sun Jiao
Vu Tu
Publication date: 02/11/2022

Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, there is currently no way to quantify how metrics respond to a change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as goals for NLG evaluation metrics. We introduce a suite of methods and corresponding statistical tests one can use to assess metrics in light of the two goals. Applying the suite to current state-of-the-art metrics, we demonstrate that they are not dialect-robust and that semantic perturbations frequently lead to smaller decreases in a metric than the introduction of dialect features. As a first step to overcome this limitation, we propose a training schema, NANO, which introduces regional and language information to the pretraining process of a metric. We demonstrate that NANO provides a size-efficient way for models to improve dialect robustness while simultaneously improving their performance on the standard metric benchmark.

arXiv.org e-Print Archive

Socio-technical HCI for Ethical Value Exchange: A Case of Service Design and Innovation ‘at the Margins’ in Resource Constrained Environments

Author: Abdelnour-Nocera José
Christensen Lars Rune
Clemmensen Torkil
Nielsen Lene
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2017

The IT University of Copenhagen's Repository

Proceedings of the 13th Linguistic Annotation Workshop, August 1, 2019, Florence, Italy

Author: Friedrich Annemarie
Hoek Jet
Zeyrek Deniz
Publication date: 07/07/2023

OPUS Augsburg