Query expansion is a commonly used technique in many search systems to better
represent users' information needs with additional query terms. Existing
studies for this task usually propose to expand a query with retrieved or
generated contextual documents. However, both types of methods have clear
limitations. For retrieval-based methods, the documents retrieved with the
original query might not be accurate enough to reveal the search intent,
especially when the query is brief or ambiguous. For generation-based methods,
existing models are difficult to train or align on a particular corpus because
corpus-specific labeled data are lacking. In this paper, we propose a novel
mutual verification framework for query expansion based on Large Language
Models (LLMs), which alleviates the aforementioned limitations. Specifically, we
first design a query-query-document generation pipeline, which can effectively
leverage the contextual knowledge encoded in LLMs to generate sub-queries and
corresponding documents from multiple perspectives. Next, we employ a mutual
verification method for both generated and retrieved contextual documents,
where 1) retrieved documents are filtered with the external contextual
knowledge in generated documents, and 2) generated documents are filtered with
the corpus-specific knowledge in retrieved documents. Overall, the proposed
method allows retrieved and generated documents to complement each other and
yield a better query expansion. We conduct extensive experiments on three
information retrieval datasets, i.e., TREC-DL-2020, TREC-COVID, and MSMARCO.
The results demonstrate that our method significantly outperforms the
baselines.
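
The mutual verification step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the bag-of-words cosine similarity, the average-similarity scoring rule, the `top_k` cutoff, the concatenation of kept documents onto the query, and all function names (`cosine`, `filter_by_reference`, `mutual_verification`) are assumptions made only for illustration. In a real system the generated documents would come from an LLM and the retrieved documents from a retriever; here both are plain strings.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for any relevance scorer)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def filter_by_reference(candidates, references, top_k):
    """Keep the top_k candidates with the highest mean similarity to the references."""
    if not references:
        return list(candidates[:top_k])
    scored = [
        (sum(cosine(c, r) for r in references) / len(references), i, c)
        for i, c in enumerate(candidates)
    ]
    # Sort by descending score; ties keep the original candidate order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [c for _, _, c in scored[:top_k]]


def mutual_verification(query, retrieved, generated, top_k=1):
    """Assumed expansion rule: append mutually verified documents to the query."""
    # 1) Retrieved documents are filtered using the generated documents.
    kept_retrieved = filter_by_reference(retrieved, generated, top_k)
    # 2) Generated documents are filtered using the retrieved documents.
    kept_generated = filter_by_reference(generated, retrieved, top_k)
    return " ".join([query] + kept_retrieved + kept_generated)


# Toy inputs (hypothetical; not taken from any dataset in the paper).
retrieved = [
    "python snake habitat and diet",
    "python programming language tutorial",
    "stock market news today",
]
generated = [
    "learn the python programming language",
    "python language syntax tutorial",
    "recipe for chocolate cake",
]
expanded = mutual_verification("python", retrieved, generated, top_k=1)
print(expanded)
```

In this toy example the off-topic retrieved document and the off-topic generated document are both dropped, because neither finds support in the other set. Any scorer could replace `cosine`; the sketch only shows the two-way filtering structure.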