A Resource for Computational Experiments on Mapudungun
We present a resource for computational experiments on Mapudungun, a
polysynthetic indigenous language spoken in Chile with upwards of 200,000
speakers. We provide 142 hours of culturally significant conversations in the
domain of medical treatment. The conversations are fully transcribed and
translated into Spanish. The transcriptions also include annotations for
code-switching and non-standard pronunciations. We also provide baseline
results on three core NLP tasks: speech recognition, speech synthesis, and
machine translation between Spanish and Mapudungun. We further explore other
applications for which the corpus will be suitable, including the study of
code-switching, historical orthography change, linguistic structure, and
sociological and anthropological studies.
Comment: accepted at LREC 202
Predicting Performance for Natural Language Processing Tasks
Given the complexity of combinations of tasks, languages, and domains in
natural language processing (NLP) research, it is computationally prohibitive
to exhaustively test newly proposed models on each possible experimental
setting. In this work, we attempt to explore the possibility of gaining
plausible judgments of how well an NLP model can perform under an experimental
setting, without actually training or testing the model. To do so, we build
regression models to predict the evaluation score of an NLP experiment given
the experimental settings as input. Experimenting on 9 different NLP tasks, we
find that our predictors can produce meaningful predictions over unseen
languages and different modeling architectures, outperforming reasonable
baselines as well as human experts. Going further, we outline how our predictor
can be used to find a small subset of representative experiments that should be
run in order to obtain plausible predictions for all other experimental
settings.
Comment: Accepted at ACL'2
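
The core idea of the second abstract — fitting a regression model that maps experimental settings to an evaluation score, then querying it for unseen settings — can be sketched in a few lines. The sketch below is illustrative only: the features (log dataset size, a related-language indicator), the scores, and the simple linear model are all invented here for clarity; the paper's actual predictors and feature set are richer.

```python
import math

# Toy "experimental settings": (log dataset size, related-language flag)
# paired with a BLEU-like evaluation score. All numbers are invented.
experiments = [
    ((math.log(10_000), 1.0), 21.0),
    ((math.log(50_000), 1.0), 27.5),
    ((math.log(10_000), 0.0), 14.0),
    ((math.log(100_000), 0.0), 24.0),
    ((math.log(200_000), 1.0), 33.0),
]

def fit_linear(data):
    """Least-squares fit of score ~ w0 + w1*x1 + w2*x2 via normal equations."""
    n = 3  # bias + 2 features
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for (x1, x2), y in data:
        row = [1.0, x1, x2]
        for i in range(n):
            xty[i] += row[i] * y
            for j in range(n):
                xtx[i][j] += row[i] * row[j]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(n):
            if r != col:
                f = xtx[r][col] / xtx[col][col]
                for c in range(n):
                    xtx[r][c] -= f * xtx[col][c]
                xty[r] -= f * xty[col]
    return [xty[i] / xtx[i][i] for i in range(n)]

w = fit_linear(experiments)

def predict(log_size, related):
    """Predicted score for a setting that was never actually run."""
    return w[0] + w[1] * log_size + w[2] * related

# Query the predictor for an unseen setting: 75k examples, related language.
estimate = predict(math.log(75_000), 1.0)
```

Once such a predictor is fit, the "representative subset" idea from the abstract amounts to choosing the few settings whose observed scores most improve predictions for all the rest.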