Identifying Necessary Elements for BERT’s Multilinguality
It has been shown that multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any kind of crosslingual signal during training. While recent literature has studied this effect, the exact reason for mBERT’s multilinguality is still unknown. We aim to identify architectural properties of BERT as well as linguistic properties of languages that are necessary for BERT to become multilingual. To allow for fast experimentation we propose an efficient setup with small BERT models and synthetic as well as natural data. Overall, we identify six elements that are potentially necessary for BERT to be multilingual. Architectural factors that contribute to multilinguality are underparameterization, shared special tokens (e.g., “[CLS]”), shared position embeddings and replacing masked tokens with random tokens. Factors related to training data that are beneficial for multilinguality are similar word order and comparability of corpora.
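One of the architectural factors named above, replacing masked tokens with random tokens, refers to the standard BERT masking rule: of the positions selected for prediction, most are replaced by the mask token, some by a random vocabulary token, and the rest are left unchanged. The following is a minimal sketch of that rule in plain Python, not the authors' code. The function name `mask_tokens`, its parameters, and the use of -100 as the "ignore" label are assumptions made for illustration; the 15% selection rate and the 80/10/10 split follow the original BERT recipe.

```python
import random

def mask_tokens(token_ids, vocab_size, mask_id, special_ids=(),
                select_prob=0.15, rng=None):
    """Return (corrupted_ids, labels) using a BERT-style 80/10/10 rule.

    Hypothetical helper for illustration; names are not from the paper.
    """
    rng = rng or random.Random()
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 marks positions that are not predicted
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; select other positions with select_prob.
        if tok in special_ids or rng.random() >= select_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = mask_id                    # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.randrange(vocab_size)  # 10%: replace with a random token id
        # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

In this sketch the random replacement is drawn uniformly from the whole vocabulary, so it can occasionally coincide with the original token or with a special token; a real implementation may handle those cases differently.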
Re-Evaluating GermEval17 Using German Pre-Trained Language Models
The lack of a commonly used benchmark data set (collection) such as
(Super-)GLUE (Wang et al., 2018, 2019) for the evaluation of non-English
pre-trained language models is a severe shortcoming of current English-centric
NLP research. It concentrates a large part of the research on English,
neglecting the uncertainty when transferring conclusions found for the English
language to other languages. We evaluate the performance of the German and
multilingual BERT-based models currently available via the huggingface
transformers library on the four tasks of the GermEval17 workshop. We compare
them to pre-BERT architectures (Wojatzki et al., 2017; Schmitt et al., 2018;
Attia et al., 2018) as well as to an ELMo-based architecture (Biesialska et
al., 2020) and a BERT-based approach (Guhr et al., 2020). The observed
improvements are put in relation to those for similar tasks and similar models
(pre-BERT vs. BERT-based) for the English language in order to draw tentative
conclusions about whether the observed improvements are transferable to German
or potentially other related languages.
Comment: Accepted as a conference paper at the 6th Swiss Text Analytics
Conference (SwissText), Brugg, Switzerland (Online), June 14-16, 202
Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
Pretrained multilingual contextual representations have shown great success,
but due to the limits of their pretraining data, their benefits do not apply
equally to all language varieties. This presents a challenge for language
varieties unfamiliar to these models, whose labeled and unlabeled data
is too limited to train a monolingual model effectively. We propose the use of
additional language-specific pretraining and vocabulary augmentation to adapt
multilingual models to low-resource settings. Using dependency parsing of four
diverse low-resource language varieties as a case study, we show that these
methods significantly improve performance over baselines, especially in the
lowest-resource cases, and demonstrate the importance of the relationship
between such models' pretraining data and target language varieties.
Comment: In Findings of EMNLP 202
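The idea of vocabulary augmentation mentioned in this abstract can be illustrated with a small sketch. It is a generic illustration under stated assumptions, not the procedure used in the paper: the helper `augment_vocab`, the whitespace tokenization, and the "most frequent unseen words" selection criterion are all hypothetical, and the vocabulary is assumed to map tokens to contiguous ids starting at 0.

```python
from collections import Counter

def augment_vocab(vocab, corpus_lines, max_new=100):
    """Return (new_vocab, added): a copy of vocab extended with frequent unseen words.

    Hypothetical helper; the selection criterion is an assumption for illustration.
    """
    # Count whitespace-delimited words in the target-language corpus.
    counts = Counter(word for line in corpus_lines for word in line.split())
    new_vocab = dict(vocab)
    added = []
    for word, _ in counts.most_common():
        if len(added) >= max_new:
            break
        if word not in new_vocab:
            new_vocab[word] = len(new_vocab)  # next free id, assuming contiguous ids
            added.append(word)
    return new_vocab, added

vocab = {"[UNK]": 0, "the": 1}
new_vocab, added = augment_vocab(vocab, ["the dog saw the dog", "a dog"], max_new=2)
```

In a real model, each added entry would also need a new row in the embedding matrix, which would then be trained during the additional language-specific pretraining the abstract describes.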