MIA 2022 Shared Task Submission: Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering
We describe our two-stage system for the Multilingual Information Access
(MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering. The
first stage consists of multilingual passage retrieval with a hybrid dense and
sparse retrieval strategy. The second stage consists of a reader which outputs
the answer from the top passages returned by the first stage. We show the
efficacy of pretraining a multilingual language model with entity
representations, of sparse retrieval signals that complement dense retrieval,
and of Fusion-in-Decoder. On the development set, we obtain 43.46 F1 on XOR-TyDi QA
and 21.99 F1 on MKQA, for an average F1 score of 32.73. On the test set, we
obtain 40.93 F1 on XOR-TyDi QA and 22.29 F1 on MKQA, for an average F1 score of
31.61. We improve over the official baseline by over 4 F1 points on both the
development and test sets.

Comment: System description for the Multilingual Information Access 2022
Shared Task
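The abstract above describes a hybrid dense-sparse first stage but does not give the fusion formula, so the following is only a minimal sketch of one common approach: min-max normalizing each retriever's scores and blending them with a weight. The function names, the weight `alpha`, and the toy scores are all illustrative assumptions, not the submission's actual method.

```python
# Hypothetical sketch of hybrid dense-sparse retrieval scoring: a weighted
# sum of min-max normalized scores from each retriever. The real system's
# fusion strategy is not specified in the abstract.

def normalize(scores):
    """Min-max normalize a dict of passage_id -> score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against all-equal scores
    return {pid: (s - lo) / span for pid, s in scores.items()}

def hybrid_rank(dense, sparse, alpha=0.7, top_k=3):
    """Blend dense and sparse scores; alpha weights the dense side.
    Passages missing from one retriever get a 0 score from it."""
    d, s = normalize(dense), normalize(sparse)
    blended = {pid: alpha * d.get(pid, 0.0) + (1 - alpha) * s.get(pid, 0.0)
               for pid in set(d) | set(s)}
    return sorted(blended, key=blended.get, reverse=True)[:top_k]

# Toy scores: dense similarities and sparse (e.g. BM25-like) scores.
dense_scores = {"p1": 0.91, "p2": 0.40, "p3": 0.35}
sparse_scores = {"p2": 12.1, "p3": 9.8, "p4": 6.0}
print(hybrid_rank(dense_scores, sparse_scores))  # ['p1', 'p2', 'p3']
```

The top passages returned this way would then be concatenated by a Fusion-in-Decoder reader, which encodes each passage with the question and attends over all of them jointly when decoding the answer.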
An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering
To produce a domain-agnostic question answering model for the Machine Reading
Question Answering (MRQA) 2019 Shared Task, we investigate the relative
benefits of large pre-trained language models, various data sampling
strategies, as well as query and context paraphrases generated by
back-translation. We find a simple negative sampling technique to be
particularly effective, even though it is typically used for datasets that
include unanswerable questions, such as SQuAD 2.0. When applied in conjunction
with per-domain sampling, our XLNet-based (Yang et al., 2019) submission
achieved the second-best Exact Match and F1 scores on the MRQA leaderboard.

Comment: Accepted at the 2nd Workshop on Machine Reading for Question
Answering
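The abstract mentions per-domain sampling without spelling out the scheme, so here is only a minimal sketch of one plausible variant: capping the number of examples drawn from each domain per epoch so large datasets do not dominate training. The function name, the cap value, and the toy corpora are illustrative assumptions.

```python
# Hypothetical per-domain sampling sketch: draw up to a fixed number of
# examples from each domain's pool, then shuffle the combined batch.
# The paper's exact sampling strategy is not given in the abstract.
import random

def per_domain_sample(datasets, per_domain, seed=0):
    """Draw up to `per_domain` examples from each domain, then shuffle."""
    rng = random.Random(seed)
    batch = []
    for domain, examples in datasets.items():
        k = min(per_domain, len(examples))  # small domains contribute all they have
        batch.extend((domain, ex) for ex in rng.sample(examples, k))
    rng.shuffle(batch)
    return batch

# Toy corpora with very different sizes, as in multi-domain MRQA training.
corpora = {
    "SQuAD": [f"squad-{i}" for i in range(100)],
    "NewsQA": [f"news-{i}" for i in range(10)],
    "HotpotQA": [f"hotpot-{i}" for i in range(50)],
}
sampled = per_domain_sample(corpora, per_domain=10)
print(len(sampled))  # 30: ten examples from each of the three domains
```

Capping per-domain counts is one simple way to balance heterogeneous training sets; it pairs naturally with the negative-sampling augmentation the abstract describes, which adds non-answer-bearing contexts as training signal.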