Stochastic Query Covering for Fast Approximate Document Retrieval
We design algorithms that, given a collection of documents and a distribution over user queries, return a
small subset of the document collection in such a way that we can efficiently provide high-quality answers
to user queries using only the selected subset. This approach has applications when space is a constraint
or when the query-processing time increases significantly with the size of the collection. We study our
algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction
of the entire collection, they can provide answers to most user queries, achieving a performance close to the
optimal. To complement our theoretical findings, we experimentally show the versatility of our approach
by considering two important cases in the context of Web search. In the first case, we favor the retrieval of
documents that are relevant to the query, whereas in the second case we aim for document diversification.
Both the theoretical and the experimental analysis provide strong evidence of the potential value of query
covering in diverse application scenarios.
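As a rough illustration of the query-covering idea, the sketch below selects a small document subset with a greedy weighted-coverage heuristic. The abstract does not state which algorithm the authors use, so the greedy rule, the function names `greedy_query_cover` and `covered_mass`, and the simplification that a document either answers a query well or not at all are all assumptions made for this example.

```python
from typing import Dict, List, Set


def greedy_query_cover(
    doc_queries: Dict[str, Set[str]],
    query_prob: Dict[str, float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` documents, each time adding the one that covers
    the largest probability mass of still-uncovered queries.

    doc_queries maps a document id to the queries it answers well
    (a hypothetical, binary notion of relevance).
    query_prob maps a query to its probability under the query distribution.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        for doc, queries in doc_queries.items():
            if doc in chosen:
                continue
            # Marginal gain: probability mass of queries not yet covered.
            gain = sum(query_prob.get(q, 0.0) for q in queries - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # no remaining document adds coverage
            break
        chosen.append(best_doc)
        covered |= doc_queries[best_doc]
    return chosen


def covered_mass(
    chosen: List[str],
    doc_queries: Dict[str, Set[str]],
    query_prob: Dict[str, float],
) -> float:
    """Probability that a random query is answered by the chosen subset."""
    covered: Set[str] = set()
    for doc in chosen:
        covered |= doc_queries[doc]
    return sum(query_prob.get(q, 0.0) for q in covered)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = {"d1": {"q1", "q2"}, "d2": {"q2", "q3"}, "d3": {"q4"}}
    probs = {"q1": 0.4, "q2": 0.3, "q3": 0.2, "q4": 0.1}
    subset = greedy_query_cover(docs, probs, budget=2)
    print(subset, covered_mass(subset, docs, probs))
```

With the toy data, two of the three documents already cover most of the query probability mass, which is the kind of trade-off between subset size and answered queries that the abstract describes.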
Learning Dynamic Classes of Events using Stacked Multilayer Perceptron Networks
People often use a web search engine to find information about events of
interest, for example, sport competitions, political elections, festivals and
entertainment news. In this paper, we study the problem of detecting
event-related queries, which is the first step before selecting a suitable
time-aware retrieval model. In general, event-related information needs can be
observed in query streams through various temporal patterns of user search
behavior, e.g., spiky peaks for popular events, and periodicities for
repetitive events. However, users also commonly search for less popular
events that may not exhibit temporal variations in query streams, e.g., past
events that occurred recently, historical events triggered by anniversaries or
similar events, and future events that are anticipated to happen. To address the
challenge of detecting dynamic classes of events, we propose a novel deep
learning model to classify a given query into a predetermined set of multiple
event types. Our proposed model, a Stacked Multilayer Perceptron (S-MLP)
network, consists of multilayer perceptrons used as basic learning units. We
assemble stacked units to further learn complex relationships between neurons
in successive layers. To evaluate our proposed model, we conduct experiments
using real-world queries and a set of manually created ground truth.
Preliminary results have shown that our proposed deep learning model
outperforms the state-of-the-art classification models significantly.
Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, 6 pages, 4
figures
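To make the idea of stacking multilayer perceptron units concrete, here is a minimal forward-pass sketch. The abstract gives no architectural details, so everything specific here is an assumption: the two-layer ReLU unit, the softmax output over event types, the random untrained weights, and the names `MLPUnit` and `StackedMLP`. It is not the authors' S-MLP implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Affine layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MLPUnit:
    """Hypothetical basic unit: a two-layer perceptron with ReLU activations."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, rng: random.Random):
        self.w1 = random_matrix(n_hidden, n_in, rng)
        self.b1 = [0.0] * n_hidden
        self.w2 = random_matrix(n_out, n_hidden, rng)
        self.b2 = [0.0] * n_out

    def forward(self, x: Vector) -> Vector:
        hidden = relu(dense(x, self.w1, self.b1))
        return relu(dense(hidden, self.w2, self.b2))


class StackedMLP:
    """Chains several units; the output of one unit feeds the next."""

    def __init__(self, n_features: int, unit_sizes: List[Tuple[int, int]],
                 n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.units: List[MLPUnit] = []
        n_in = n_features
        for n_hidden, n_out in unit_sizes:
            self.units.append(MLPUnit(n_in, n_hidden, n_out, rng))
            n_in = n_out
        # Final linear layer mapping to one score per event type.
        self.w_out = random_matrix(n_classes, n_in, rng)
        self.b_out = [0.0] * n_classes

    def predict_proba(self, x: Vector) -> Vector:
        for unit in self.units:
            x = unit.forward(x)
        return softmax(dense(x, self.w_out, self.b_out))


if __name__ == "__main__":
    # Four made-up query features, two stacked units, three event types.
    model = StackedMLP(n_features=4, unit_sizes=[(8, 6), (6, 5)], n_classes=3)
    print(model.predict_proba([0.1, 0.5, 0.0, 1.0]))
```

The weights are random, so the printed probabilities carry no meaning; the point is only the data flow from query features through stacked units to a distribution over event types. Training would additionally require a loss function and backpropagation.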
The State-of-the-arts in Focused Search
The continuous influx of varied text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g., document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.