Multi-hop QA involves finding multiple relevant passages and step-by-step
reasoning to answer complex questions. While previous approaches have developed
retrieval modules for selecting relevant passages, they struggle in scenarios
beyond two hops: one-step methods offer limited performance, and two-step
methods fail once an irrelevant passage is selected in an earlier stage. In
this work, we introduce Beam Retrieval, a general end-to-end
retrieval framework for multi-hop QA. This approach maintains multiple partial
hypotheses of relevant passages at each step, expanding the search space and
reducing the risk of missing relevant passages. Moreover, Beam Retrieval
jointly optimizes an encoder and two classification heads by minimizing the
combined loss across all hops. To establish a complete QA system, we
incorporate a supervised reader or a zero-shot GPT-3.5. Experimental results
demonstrate that Beam Retrieval achieves a nearly 50% improvement over
baselines on the challenging MuSiQue-Ans, and it also surpasses all previous
retrievers on HotpotQA and 2WikiMultiHopQA. Providing high-quality context,
Beam Retrieval helps our supervised reader achieve new state-of-the-art
performance and substantially improves (up to 28.8 points) the QA performance
of zero-shot GPT-3.5.

Comment: Code is available at https://github.com/canghongjian/beam_retrieve
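
To make the idea of keeping multiple partial hypotheses per hop concrete, the following is a minimal sketch of beam search over passage chains. It is not the released implementation: the names `beam_retrieve` and `toy_overlap_score` are hypothetical, and the word-overlap scorer is only a stand-in for the trained encoder and classification heads, which are assumed here to assign one score to each candidate chain.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is the tuple of passage indices chosen so far, in hop order.
Hypothesis = Tuple[int, ...]
Scorer = Callable[[str, Sequence[str]], float]


def beam_retrieve(
    question: str,
    passages: Sequence[str],
    score: Scorer,
    beam_size: int = 2,
    hops: int = 2,
) -> List[Tuple[float, Hypothesis]]:
    """Keep the `beam_size` best passage chains after each hop."""
    beams: List[Tuple[float, Hypothesis]] = [(0.0, ())]
    for _ in range(hops):
        candidates: List[Tuple[float, Hypothesis]] = []
        for _, chain in beams:
            for idx in range(len(passages)):
                if idx in chain:
                    continue  # a passage is used at most once per chain
                extended = chain + (idx,)
                chain_text = [passages[i] for i in extended]
                candidates.append((score(question, chain_text), extended))
        if not candidates:
            break  # no unused passages remain
        # Stable sort: ties keep their generation order.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    return beams


def toy_overlap_score(question: str, chain: Sequence[str]) -> float:
    """Hypothetical scorer: number of question words covered by the chain.

    A stand-in for a learned model, used only to make the sketch runnable.
    """
    question_words = set(question.lower().split())
    chain_words = set(" ".join(chain).lower().split())
    return float(len(question_words & chain_words))


if __name__ == "__main__":
    demo_passages = [
        "alice starred in the film moonrise",
        "moonrise was directed by bob",
        "carol likes tea",
    ]
    demo_question = "who directed the film starring alice"
    for chain_score, chain in beam_retrieve(
        demo_question, demo_passages, toy_overlap_score
    ):
        print(chain_score, chain)
```

With `beam_size=1` the loop reduces to greedy step-by-step selection, where an early wrong choice cannot be recovered; a larger beam keeps alternative chains alive until later hops can tell them apart.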