In cross-lingual language understanding, machine translation is often
used to improve the transferability of models across languages, either by
translating the training data from the source language into the target
language, or by translating target-language inputs into the source language to
aid inference. However, in cross-lingual machine reading comprehension (MRC),
it is difficult to provide such deep assistance for cross-lingual transfer,
because answer spans occupy different positions in different languages. In
this paper, we propose X-STA, a new
approach for cross-lingual MRC. Specifically, we leverage an attentive teacher
to subtly transfer the answer spans of the source language to the answer output
space of the target language. A Gradient-Disentangled Knowledge Sharing
technique is proposed as an improved cross-attention block (a minimal sketch
follows below). In addition, we force the model to learn semantic alignments
at multiple granularities and calibrate the model outputs with teacher
guidance (see the calibration sketch at the end) to enhance cross-lingual
transferability.
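
To make the knowledge-sharing idea concrete, the following PyTorch sketch shows one way a gradient-disentangled cross-attention block could be realized, assuming that disentanglement means letting the target-language stream attend to the teacher's (source-language) hidden states while blocking gradients from flowing back into the teacher branch via detach. The class name, head count, and residual layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GradientDisentangledCrossAttention(nn.Module):
    """Illustrative sketch (not the paper's code): the target-language
    stream attends to teacher (source-language) hidden states, with
    gradients blocked from flowing into the teacher branch."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, target_hidden: torch.Tensor, teacher_hidden: torch.Tensor):
        # Stop-gradient on the teacher branch: knowledge flows forward,
        # but target-side gradients do not disturb the teacher stream.
        teacher_kv = teacher_hidden.detach()
        attended, _ = self.attn(query=target_hidden, key=teacher_kv, value=teacher_kv)
        # Residual connection plus layer norm, as in a standard Transformer block.
        return self.norm(target_hidden + attended)
```

Both streams would come from a shared multilingual encoder, e.g. fused = GradientDisentangledCrossAttention()(target_hidden, source_hidden).
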
Experiments on three multi-lingual MRC datasets show the effectiveness of our
method, outperforming state-of-the-art approaches.
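
To picture the teacher-guided calibration, here is a minimal distillation-style sketch: a temperature-scaled KL term that pulls the target-language span distribution toward the teacher's. It assumes the teacher's start/end logits have already been projected onto the target token positions (the cross-lingual alignment step is omitted); the function name and temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def span_calibration_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
    """Hypothetical calibration term: KL divergence from the teacher's
    softened span distribution to the student's (target-language) one."""
    t = temperature
    # Teacher probabilities are detached so only the student is updated.
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # batchmean KL, scaled by t^2 as in standard knowledge distillation.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * (t * t)
```

In practice such a term would be computed for both start- and end-position logits and added to the usual span-extraction cross-entropy loss.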