4,055 research outputs found

    Stochastic Query Covering for Fast Approximate Document Retrieval

    Get PDF
    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios

    Learning Dynamic Classes of Events using Stacked Multilayer Perceptron Networks

    Full text link
    People often use a web search engine to find information about events of interest, for example, sport competitions, political elections, festivals and entertainment news. In this paper, we study a problem of detecting event-related queries, which is the first step before selecting a suitable time-aware retrieval model. In general, event-related information needs can be observed in query streams through various temporal patterns of user search behavior, e.g., spiky peaks for popular events, and periodicities for repetitive events. However, it is also common that users search for non-popular events, which may not exhibit temporal variations in query streams, e.g., past events recently occurred, historical events triggered by anniversaries or similar events, and future events anticipated to happen. To address the challenge of detecting dynamic classes of events, we propose a novel deep learning model to classify a given query into a predetermined set of multiple event types. Our proposed model, a Stacked Multilayer Perceptron (S-MLP) network, consists of multilayer perceptron used as a basic learning unit. We assemble stacked units to further learn complex relationships between neutrons in successive layers. To evaluate our proposed model, we conduct experiments using real-world queries and a set of manually created ground truth. Preliminary results have shown that our proposed deep learning model outperforms the state-of-the-art classification models significantly.Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, 6 pages, 4 figure

    The State-of-the-arts in Focused Search

    Get PDF
    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems
    corecore