A Vertical PRF Architecture for Microblog Search
In microblog retrieval, query expansion can be essential to obtain good
search results because queries and posts are short. Since information in
microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance
feedback (PRF) against an external corpus has a higher chance of retrieving
more relevant documents and improving the ranking. In this paper, we focus on
the research question: how can we reduce the computational cost of query
expansion while maintaining the same retrieval precision as standard PRF? To
this end, we propose to accelerate the query expansion step of
pseudo-relevance feedback. The hypothesis is that expanding the query against
an expansion corpus organized into verticals leads to a more efficient
expansion process and improved retrieval effectiveness. The proposed query
expansion method therefore uses a distributed search architecture and resource
selection algorithms to provide an efficient expansion process. Experiments on
the TREC Microblog datasets show that the proposed approach can match or
outperform standard PRF in MAP and NDCG@30, at a computational cost that is
three orders of magnitude lower.
Comment: To appear in ICTIR 201
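The idea of selecting a few verticals before running feedback can be sketched
as follows. This is a minimal illustration, not the paper's implementation:
the `retrieve` callback, the overlap-based vertical scoring (a stand-in for
resource selection algorithms such as ReDDE or CORI), and all parameter names
are assumptions.

```python
from collections import Counter

def prf_expand(query_terms, retrieve, verticals, select_k=1, fb_docs=10, fb_terms=5):
    """Vertical-based pseudo-relevance feedback (hypothetical API).

    `retrieve(vertical, query_terms, n)` is assumed to return the top-n
    documents of one vertical as lists of terms; `verticals` maps a
    vertical name to a sampled term list used for resource selection.
    """
    # Resource selection: score each vertical by query-term overlap with
    # its sample, and keep only the top select_k verticals.
    scores = {
        v: sum(sample.count(t) for t in query_terms)
        for v, sample in verticals.items()
    }
    chosen = sorted(scores, key=scores.get, reverse=True)[:select_k]

    # PRF restricted to the chosen verticals: take the most frequent terms
    # from the top feedback documents and append them to the query.
    counts = Counter()
    for v in chosen:
        for doc in retrieve(v, query_terms, fb_docs):
            counts.update(doc)
    expansion = [t for t, _ in counts.most_common(fb_terms) if t not in query_terms]
    return query_terms + expansion
```

Because only the selected verticals are searched during expansion, the
feedback step touches a small fraction of the corpus, which is where the
claimed cost reduction would come from.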
Static Pruning of Low-Impact Postings in Indexes for Information Retrieval Systems
The web offers a vast collection of documents that search engines must access
in order to answer user requests. To respond efficiently to each request,
search engines use data structures called indexes, which summarize the
information contained in the document collection. The size of an index grows
with the amount of information to be stored, and with it grows the time
needed to retrieve that information. The goal of this thesis is therefore to
propose a strategy for removing information deemed not useful from the index,
considerably reducing its size, and hence the time needed to access it, while
preserving the result quality of the original index.
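A toy sketch of static posting pruning, assuming a simple in-memory inverted
index of `(doc_id, term_frequency)` pairs; using raw term frequency as the
per-posting "impact" score is an illustrative simplification, not the thesis'
actual criterion:

```python
def prune_index(inverted_index, keep_fraction=0.5):
    """Keep only the highest-impact fraction of each term's posting list.

    inverted_index maps a term to a list of (doc_id, term_frequency)
    pairs; the pruned lists stay sorted by doc_id so that standard
    query processing over them is unchanged.
    """
    pruned = {}
    for term, postings in inverted_index.items():
        # Rank postings by impact (here: stored term frequency) and
        # drop the low-impact tail.
        ranked = sorted(postings, key=lambda p: p[1], reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        pruned[term] = sorted(ranked[:keep])  # restore doc-id order
    return pruned
```

With `keep_fraction=0.5` the index roughly halves in size, trading a little
result quality for faster access, which is the trade-off the thesis studies.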
Efficiency trade-offs in two-tier web search systems
Search engines rely on searching multiple partitioned corpora to return results to users in a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search, with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but this decrease in processing time comes with an increase in infrastructure cost. We analyze this trade-off in the context of two different scenarios on real-world data, and demonstrate that in general the decrease in answer time is justified by a small increase in infrastructure cost.
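The time-versus-cost trade-off can be captured by a back-of-the-envelope
model. This is an assumed simplification, not the paper's analysis: with
probability `p_correct` the predictor routes a query to the small first tier
only, and otherwise the query also falls back to the full second tier.

```python
def two_tier_tradeoff(p_correct, t_small, t_full, cost_small, cost_full):
    """Expected per-query latency and infrastructure cost (toy model).

    t_small / t_full are per-tier processing times; cost_small is always
    paid, while the full tier's cost is attributed only to the fraction
    of queries that fall back to it.
    """
    # A wrong prediction pays for both tiers sequentially.
    expected_time = p_correct * t_small + (1 - p_correct) * (t_small + t_full)
    expected_cost = cost_small + (1 - p_correct) * cost_full
    return expected_time, expected_cost
```

In this model any `p_correct` above zero lowers expected time relative to
always searching both tiers, while the extra first tier adds `cost_small` of
infrastructure, mirroring the trade-off the abstract describes.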