Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
Large multilingual models have inspired a new class of word alignment
methods, which work well for the model's pretraining languages. However, the
languages most in need of automatic alignment are low-resource and, thus, not
typically included in the pretraining data. In this work, we ask: How do modern
aligners perform on unseen languages, and are they better than traditional
methods? We contribute gold-standard alignments for Bribri--Spanish,
Guarani--Spanish, Quechua--Spanish, and Shipibo-Konibo--Spanish. With these, we
evaluate state-of-the-art aligners with and without model adaptation to the
target language. Finally, we also evaluate the resulting alignments
extrinsically through two downstream tasks: named entity recognition and
part-of-speech tagging. We find that although transformer-based methods
generally outperform traditional models, the two classes of approach remain
competitive with each other.
Comment: EACL 202