Identifying Necessary Elements for BERT's Multilinguality
It has been shown that multilingual BERT (mBERT) yields high-quality
multilingual representations and enables effective zero-shot transfer. This is
surprising given that mBERT does not use any crosslingual signal during
training. While recent literature has studied this phenomenon, the reasons for
its multilinguality remain somewhat obscure. We aim to identify
architectural properties of BERT and linguistic properties of languages that
are necessary for BERT to become multilingual. To allow for fast
experimentation, we propose an efficient setup with small BERT models trained on
a mix of synthetic and natural data. Overall, we identify four architectural
and two linguistic elements that influence multilinguality. Based on our
insights, we experiment with a multilingual pretraining setup that modifies the
masking strategy using VecMap, i.e., unsupervised embedding alignment.
Experiments on XNLI with three languages indicate that our findings transfer
from our small setup to larger-scale settings.

Comment: EMNLP 2020, camera-ready version.
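The abstract only names the masking modification; the following is a minimal toy sketch, not the authors' implementation, of the general idea: using nearest neighbours in a cross-lingually aligned embedding space (as produced by a VecMap-style alignment) to substitute translations into BERT's masking step. The toy vocabularies, the `translate` and `mask_tokens` helpers, and the probabilities are all hypothetical; in practice the aligned spaces would come from running VecMap on two monolingual embedding sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "aligned" embedding spaces. In a real setup these would be two
# monolingual embedding matrices mapped into a shared space by VecMap
# (unsupervised embedding alignment); here the target space is simply
# a noisy copy of the source space so nearest neighbours line up.
src_vocab = ["dog", "house", "tree"]
tgt_vocab = ["Hund", "Haus", "Baum"]
dim = 8
src_emb = rng.normal(size=(len(src_vocab), dim))
tgt_emb = src_emb + 0.01 * rng.normal(size=(len(tgt_vocab), dim))

def translate(word: str) -> str:
    """Return the cross-lingual nearest neighbour by cosine similarity."""
    v = src_emb[src_vocab.index(word)]
    sims = (tgt_emb @ v) / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(v))
    return tgt_vocab[int(np.argmax(sims))]

def mask_tokens(tokens, mask_prob=0.15, translate_prob=0.5):
    """BERT-style masking where some selected positions receive the
    aligned-space translation instead of [MASK] (hypothetical variant)."""
    out = []
    for tok in tokens:
        if tok in src_vocab and rng.random() < mask_prob:
            if rng.random() < translate_prob:
                out.append(translate(tok))  # cross-lingual substitution
            else:
                out.append("[MASK]")        # standard masking
        else:
            out.append(tok)
    return out

print(mask_tokens(["the", "dog", "near", "the", "house"]))
```

The point of the sketch is the shape of the mechanism, injecting a weak cross-lingual signal into masked language modeling via unsupervised alignment, rather than any specific probability values or vocabulary handling.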