The recent explosion of question answering (QA) datasets and models has
increased interest in generalizing models across multiple domains and
formats, either by training on multiple datasets or by combining multiple
models. Despite the promising results of multi-dataset models, some domains or
QA formats may require specific architectures, and thus the adaptability of
these models might be limited. In addition, current approaches for combining
models disregard cues such as question-answer compatibility. In this work, we
propose to combine expert agents with a novel, flexible, and training-efficient
architecture that considers questions, answer predictions, and
answer-prediction confidence scores to select the best answer among a list of
answer candidates. Through quantitative and qualitative experiments, we show
that our model i) creates a collaboration between agents that outperforms
previous multi-agent and multi-dataset approaches in both in-domain and
out-of-domain scenarios, ii) is highly data-efficient to train, and iii) can be
adapted to any QA format. To foster future development of multi-agent
systems, we release our code and a dataset of answer predictions from expert
agents for 16 QA datasets at https://github.com/UKPLab/MetaQA.

Comment: Accepted at EACL 202