Pretrained multilingual contextual representations have shown great success,
but due to the limits of their pretraining data, their benefits do not apply
equally to all language varieties. This presents a challenge for language
varieties unfamiliar to these models, whose labeled \emph{and unlabeled} data
is too limited to train a monolingual model effectively. We propose the use of
additional language-specific pretraining and vocabulary augmentation to adapt
multilingual models to low-resource settings. Using dependency parsing of four
diverse low-resource language varieties as a case study, we show that these
methods significantly improve performance over baselines, especially in the
lowest-resource cases, and demonstrate the importance of the relationship
between such models' pretraining data and target language varieties.
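
To make the two adaptation steps concrete, the sketch below shows one way they could be realized with the Hugging Face \texttt{transformers} library: vocabulary augmentation by adding target-variety word pieces and resizing the embedding matrix, followed by continued masked-language-model pretraining on unlabeled target-variety text. This is a minimal illustration under assumed choices, not the paper's implementation; the model name, placeholder word pieces, toy corpus, and training settings are all illustrative.

\begin{verbatim}
# Minimal sketch (not the authors' code) of adapting a multilingual encoder:
# (1) vocabulary augmentation, (2) continued language-specific MLM pretraining.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Load a pretrained multilingual model (mBERT used here only as an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# (1) Vocabulary augmentation: add word pieces that the multilingual vocabulary
# segments poorly for the target variety, then resize the embedding matrix.
new_pieces = ["examplepiece1", "examplepiece2"]  # placeholders; derive from target text
tokenizer.add_tokens(new_pieces)
model.resize_token_embeddings(len(tokenizer))

# (2) Continued pretraining with the MLM objective on a small unlabeled corpus
# in the target variety (a toy in-memory corpus stands in here).
corpus = Dataset.from_dict({"text": ["a few sentences in the target variety ..."]})
tokenized = corpus.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-mbert", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # the adapted encoder is then fine-tuned on the parsing task
\end{verbatim}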