5 research outputs found

    A Vertical PRF Architecture for Microblog Search

    In microblog retrieval, query expansion can be essential to obtain good search results because both queries and posts are short. Since information in microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance feedback (PRF) on an external corpus has a higher chance of retrieving more relevant documents and improving ranking. In this paper, we focus on the research question: how can we reduce the computational cost of query expansion while maintaining the same retrieval precision as standard PRF? To this end, we propose to accelerate the query expansion step of pseudo-relevance feedback. The hypothesis is that expanding the query with an expansion corpus organized into verticals will lead to a more efficient query expansion process and improved retrieval effectiveness. The proposed query expansion method therefore uses a distributed search architecture and resource selection algorithms to provide an efficient query expansion process. Experiments on the TREC Microblog datasets show that the proposed approach can match or outperform standard PRF in MAP and NDCG@30, with a computational cost that is three orders of magnitude lower. Comment: To appear in ICTIR 201

    Low-Impact Static Posting Pruning on Indexes for Information Retrieval Systems

    The web offers a vast collection of documents that search engines must access in order to return results for user requests. To answer each request efficiently, search engines use data structures called indexes, which summarize the information contained in the document collection. The size of an index grows with the amount of information to be stored, and the time needed to retrieve information grows with it. The goal of this thesis is therefore to propose a strategy for eliminating from the index information deemed not useful, considerably reducing its size and hence the time needed to access it, while preserving the result quality of the original index.

    Efficiency trade-offs in two-tier web search systems

    Search engines rely on searching multiple partitioned corpora to return results to users in a reasonable amount of time. In this paper we analyze the standard two-tier architecture for Web search, with the difference that the corpus to be searched for a given query is predicted in advance. We show that any predictor better than random yields time savings, but that this decrease in processing time comes with an increase in infrastructure cost. We provide an analysis and investigate this trade-off in the context of two different scenarios on real-world data. We demonstrate that, in general, the decrease in answer time is justified by a small increase in infrastructure cost.