Generative Relevance Feedback with Large Language Models
Current query expansion models use pseudo-relevance feedback (PRF) to improve first-pass retrieval effectiveness; however, this fails when the initial results are not relevant. Instead of building a language model from retrieved results, we propose Generative Relevance Feedback (GRF), which builds probabilistic feedback models from long-form text generated by Large Language Models. We study effective methods for generating text by varying the zero-shot generation subtasks: queries, entities, facts, news articles, documents, and essays. We evaluate GRF on document retrieval benchmarks covering a diverse set of queries and document collections, and the results show that GRF methods significantly outperform previous PRF methods. Specifically, we improve MAP by 5-19% and NDCG@10 by 17-24% compared to RM3 expansion, and achieve the best R@1k effectiveness on all datasets compared to state-of-the-art sparse, dense, and expansion models.
Comment: SIGIR 2023 preprint, 6 pages
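The following is a minimal Python sketch of the GRF idea under stated assumptions: generate_text stands in for an arbitrary LLM call (the paper's prompts and models are not specified here), and the feedback model is an RM3-style interpolation rather than the authors' exact estimator.

import re
from collections import Counter

def generate_text(query: str, subtask: str) -> str:
    # Hypothetical LLM call, e.g. "write a news article about <query>".
    raise NotImplementedError("plug in an LLM client here")

def grf_expansion(query, subtasks=("news article", "essay"),
                  n_terms=20, orig_weight=0.5):
    # Pool tokens from all generated texts and build a term distribution.
    tokens = []
    for subtask in subtasks:
        tokens += re.findall(r"[a-z]+", generate_text(query, subtask).lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    feedback = {t: c / total for t, c in counts.most_common(n_terms)}
    # Interpolate with the original query model, as RM3 does with PRF docs.
    q_tokens = re.findall(r"[a-z]+", query.lower())
    q_model = {t: q_tokens.count(t) / len(q_tokens) for t in q_tokens}
    return {t: orig_weight * q_model.get(t, 0.0)
               + (1 - orig_weight) * feedback.get(t, 0.0)
            for t in set(q_model) | set(feedback)}

The resulting term weights can then be fed to any retrieval system that accepts weighted queries.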
Toward Word Embedding for Personalized Information Retrieval
This paper presents preliminary work on using word embeddings (word2vec) for query expansion in the context of Personalized Information Retrieval. Traditionally, word embeddings are learned on a general corpus, such as Wikipedia. In this work we personalize the learning of word embeddings by training them on the user's profile, so that the embeddings reflect the same context as the user's interests. Our proposal is evaluated on the CLEF Social Book Search 2016 collection. The results show that further work is needed on how to apply word embeddings in the context of Personalized Information Retrieval.
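A minimal sketch of the approach, assuming the user's profile is available as tokenized sentences (profile_sentences below is a toy placeholder): train word2vec on the profile rather than on a general corpus, then expand the query with each term's nearest neighbours in that personalized space.

from gensim.models import Word2Vec

profile_sentences = [
    ["historical", "fiction", "set", "in", "rome"],
    ["biography", "of", "a", "roman", "emperor"],
]  # placeholder: replace with sentences from the user's actual profile

model = Word2Vec(profile_sentences, vector_size=100, window=5,
                 min_count=1, epochs=50)

def expand(query_terms, k=3):
    expanded = list(query_terms)
    for term in query_terms:
        if term in model.wv:  # skip terms absent from the profile
            expanded += [w for w, _ in model.wv.most_similar(term, topn=k)]
    return expanded

print(expand(["roman", "fiction"]))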
Queensland University of Technology at TREC 2005
The Information Retrieval and Web Intelligence (IR-WI) research group is a research team at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track we experimented with an open-source IR system, Zettair, and performed two types of experiments. First, we compared Zettair's performance on a high-powered supercomputer and on a distributed system across seven midrange personal computers. Second, we compared Zettair's performance when a standard TREC title is used, when a natural language query is used, and when the query is expanded with synonyms. We compare the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer while slightly decreasing retrieval performance, that natural language queries also slightly decrease retrieval performance, and that our query expansion technique significantly decreases performance.
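The abstract does not say where the synonyms came from; WordNet is a common choice, so the sketch below uses NLTK's WordNet interface purely as an illustration of the synonym-expansion step (requires running nltk.download("wordnet") once).

from nltk.corpus import wordnet

def expand_with_synonyms(query_terms):
    expanded = list(query_terms)
    for term in query_terms:
        for synset in wordnet.synsets(term):
            for lemma in synset.lemma_names():
                name = lemma.replace("_", " ").lower()
                if name not in expanded:
                    expanded.append(name)  # keep each unseen synonym
    return expanded

print(expand_with_synonyms(["distributed", "retrieval"]))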
Sequence to Sequence Learning for Query Expansion
Using sequence-to-sequence algorithms for query expansion has not yet been explored in the Information Retrieval or Question Answering literature. We tried to fill this gap with a custom Query Expansion engine trained and tested on open datasets. Starting from open datasets, we built a Query Expansion training set using sentence-embedding-based keyword extraction. We then assessed the ability of sequence-to-sequence neural networks to capture expansion relations in the word embedding space.
Comment: 8 pages, 2 figures, AAAI-19 Student Abstract and Poster Program
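A minimal sketch of building (pseudo-query, expansion) training pairs via sentence-embedding-based keyword extraction; the encoder name and the cosine scoring below are assumptions, not the authors' exact setup.

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def make_training_pair(document: str, n_keywords: int = 3):
    words = list(dict.fromkeys(document.lower().split()))  # dedup, keep order
    doc_vec = encoder.encode([document])[0]
    word_vecs = encoder.encode(words)
    # Score each word by cosine similarity to the whole document.
    sims = word_vecs @ doc_vec / (
        np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(doc_vec))
    keywords = [words[i] for i in np.argsort(-sims)[:n_keywords]]
    # Source = extracted keywords (a pseudo-query); target = the remaining
    # terms, which the seq2seq model learns to generate as expansions.
    target = [w for w in words if w not in keywords]
    return " ".join(keywords), " ".join(target)

Any standard encoder-decoder (e.g. an LSTM or Transformer seq2seq) can then be trained on such pairs.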
Query Expansion with Locally-Trained Word Embeddings
Continuous-space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus- and query-specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings.
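A minimal sketch of local embeddings under stated assumptions: retrieve is a hypothetical stand-in for any first-pass retriever, and word2vec is trained only on the top-k documents returned for the query rather than on a global corpus.

from gensim.models import Word2Vec

def retrieve(query: str, k: int = 1000) -> list[str]:
    # Hypothetical first-pass retriever returning k document texts.
    raise NotImplementedError("plug in a retrieval system here")

def local_expansion(query: str, k_docs: int = 1000, n_terms: int = 10):
    # Train query-specific embeddings on the retrieved documents only.
    sentences = [doc.lower().split() for doc in retrieve(query, k_docs)]
    local = Word2Vec(sentences, vector_size=100, window=5, min_count=2)
    terms = []
    for t in query.lower().split():
        if t in local.wv:
            terms += [w for w, _ in local.wv.most_similar(t, topn=n_terms)]
    return terms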