Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

Authors: Chiruzzo Luis
Coto-Solano Rolando
Ebrahimi Abteen
Giménez-Lugo Gustavo A.
Kann Katharina
McCarthy Arya D.
Oncevay Arturo
Ortega John E.
Publication date: 15/02/2023

Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri–Spanish, Guarani–Spanish, Quechua–Spanish, and Shipibo-Konibo–Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.

Comment: EACL 202

arXiv.org e-Print Archive