Stochastic Query Covering for Fast Approximate Document Retrieval
We design algorithms that, given a collection of documents and a distribution over user queries, return a
small subset of the document collection in such a way that we can efficiently provide high-quality answers
to user queries using only the selected subset. This approach has applications when space is a constraint
or when the query-processing time increases significantly with the size of the collection. We study our
algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction
of the entire collection, they can provide answers to most user queries, achieving a performance close to the
optimal. To complement our theoretical findings, we experimentally show the versatility of our approach
by considering two important cases in the context of Web search. In the first case, we favor the retrieval of
documents that are relevant to the query, whereas in the second case we aim for document diversification.
Both the theoretical and the experimental analysis provide strong evidence of the potential value of query
covering in diverse application scenarios.
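As a rough illustration of the query-covering idea, the sketch below uses greedy weighted maximum coverage, a standard technique that is not necessarily the algorithm analyzed in the paper. The names `doc_queries`, `query_prob`, and `budget` are hypothetical, and a query is assumed to count as "covered" once any selected document answers it well.

```python
def greedy_query_cover(doc_queries, query_prob, budget):
    """Greedily pick up to `budget` documents, each time taking the one
    that adds the most not-yet-covered query probability mass.

    doc_queries: dict mapping document id -> set of queries it answers well
    query_prob:  dict mapping query -> probability under the query distribution
    (Both structures are illustrative assumptions, not taken from the paper.)
    """
    selected = []
    covered = set()
    remaining = set(doc_queries)
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for doc in sorted(remaining):
            gain = sum(query_prob.get(q, 0.0) for q in doc_queries[doc] - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:
            break  # no remaining document adds coverage
        selected.append(best_doc)
        covered |= doc_queries[best_doc]
        remaining.remove(best_doc)
    covered_mass = sum(query_prob.get(q, 0.0) for q in covered)
    return selected, covered_mass


# Toy data: three documents, four queries.
doc_queries = {
    "d1": {"q1", "q2"},
    "d2": {"q2", "q3"},
    "d3": {"q4"},
}
query_prob = {"q1": 0.4, "q2": 0.3, "q3": 0.2, "q4": 0.1}

selected, covered_mass = greedy_query_cover(doc_queries, query_prob, budget=2)
print(selected, round(covered_mass, 3))
```

With this toy input, two of the three documents already cover most of the query probability mass, which is the kind of trade-off between subset size and answer quality the abstract describes.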
DYNIQX: A novel meta-search engine for the web
The effect of metadata in collection fusion has not been sufficiently studied. In response, we present Dyniqx, a novel meta-search engine for metadata-based search. Dyniqx integrates results from document, image, and video search services to generate a unified ranked list of search results. Dyniqx exploits the metadata available in search services such as PubMed, Google Scholar, Google Image Search, and Google Video Search to fuse search results from heterogeneous search engines. In addition, metadata from these search engines is used to generate dynamic query controls, such as sliders and tick boxes, which users can use to filter search results. Our preliminary user evaluation shows that Dyniqx can help users complete information search tasks more efficiently and successfully than each of three well-known search engines. We also carried out one controlled user evaluation of the integration of six document, image, and video search engines (Google Scholar, PubMed, Intute, Google Image, Yahoo Image, and Google Video) in Dyniqx. We designed a questionnaire for evaluating different aspects of Dyniqx in assisting users to complete search tasks. Each user used Dyniqx to perform a number of search tasks before completing the questionnaire. Our evaluation results confirm the effectiveness of Dyniqx's meta-search in assisting user search tasks, and provide insights into better designs of the Dyniqx interface.
Term-Specific Eigenvector-Centrality in Multi-Relation Networks
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
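To make the idea of propagating term weights over a multi-relation graph concrete, here is a minimal sketch based on personalized-PageRank-style power iteration. It is an assumption-laden illustration, not the article's actual matrix extension: the per-relation weights in `rel_weight`, the damping factor, and the treatment of nodes without outgoing edges are all hypothetical choices.

```python
def propagate_term_weights(edges, rel_weight, initial, damping=0.85, iterations=50):
    """Propagate one term's weight across typed edges by power iteration.

    edges:      list of (source, target, relation_type) tuples
    rel_weight: dict mapping relation_type -> relative edge weight (assumed)
    initial:    dict mapping node -> initial weight of the term at that node
    Returns a dict mapping node -> propagated weight (values sum to 1).
    """
    nodes = sorted({n for src, dst, _rel in edges for n in (src, dst)} | set(initial))
    out = {n: [] for n in nodes}
    for src, dst, rel in edges:
        out[src].append((dst, rel_weight[rel]))

    # Normalize the initial term weights into a restart distribution.
    total = sum(initial.values())
    prior = {n: initial.get(n, 0.0) / total for n in nodes}

    score = dict(prior)
    for _step in range(iterations):
        nxt = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            out_total = sum(w for _dst, w in out[n])
            if out_total > 0:
                # Split this node's mass over its edges by relation weight.
                for dst, w in out[n]:
                    nxt[dst] += damping * score[n] * w / out_total
            else:
                # Node without outgoing edges: return its mass to the prior.
                for m in nodes:
                    nxt[m] += damping * score[n] * prior[m]
        score = nxt
    return score


# Toy graph: a term occurs in "paper" and spreads to related items.
edges = [
    ("paper", "author", "written_by"),
    ("paper", "venue", "published_in"),
    ("author", "paper", "wrote"),
]
rel_weight = {"written_by": 2.0, "published_in": 1.0, "wrote": 1.0}
scores = propagate_term_weights(edges, rel_weight, {"paper": 1.0})
print({node: round(value, 3) for node, value in scores.items()})
```

Running such a propagation once per term at indexing time would yield per-term weights for items that do not contain the term themselves, which is consistent with the abstract's description of the scheme as an index transformation.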
Porqpine: a peer-to-peer search engine
In this paper, we present a fully distributed and collaborative search
engine for web pages: Porqpine. This system uses a novel query-based model
and collaborative filtering techniques in order to obtain user-customized
results. All knowledge about users and profiles is stored in each user
node's application. Overall the system is a multi-agent system that runs on
the computers of the user community. The nodes interact in a peer-to-peer
fashion in order to create a real distributed search engine where
information is completely distributed among all the nodes in the network.
Moreover, the system preserves the privacy of user queries and results by
maintaining the anonymity of the queries' consumers and results' producers.
The knowledge required by the system to work is implicitly captured through
the monitoring of users' actions, not only within the system's interface but
also within one of the most popular web browsers. Thus, users are not
required to explicitly feed knowledge about their interests into the system
since this process is done automatically. In this manner, users obtain the
benefits of a personalized search engine just by installing the application
on their computer. Porqpine does not intend to completely shun conventional
centralized search engines but to complement them by issuing more accurate
and personalized results.
Postprint (published version)
Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes
In this paper we present an evaluation of techniques that are designed to encourage web searchers to interact more with the results of a web search. Two specific techniques are examined: the presentation of sentences that highly match the searcher's query and the use of implicit evidence. Implicit evidence is evidence captured from the searcher's interaction with the retrieval results and is used to automatically update the display. Our evaluation concentrates on the effectiveness and subject perception of these techniques. The results show, with statistical significance, that the techniques are effective and efficient for information seeking.
Content-Based Weak Supervision for Ad-Hoc Re-Ranking
One challenge with neural ranking is the need for a large amount of
manually-labeled relevance judgments for training. In contrast with prior work,
we examine the use of weak supervision sources for training that yield pseudo
query-document pairs that already exhibit relevance (e.g., newswire
headline-content pairs and encyclopedic heading-paragraph pairs). We also
propose filtering techniques to eliminate training samples that are too far out
of domain using two techniques: a heuristic-based approach and a novel supervised
filter that re-purposes a neural ranker. Using several leading neural ranking
architectures and multiple weak supervision datasets, we show that these
sources of training pairs are effective on their own (outperforming prior weak
supervision techniques), and that filtering can further improve performance.
Comment: SIGIR 2019 (short paper)
Topic modeling for entity linking using keyphrase
This paper proposes an Entity Linking system that applies a topic modeling ranking. We apply a novel approach in order to provide new relevant elements to the model. These elements are keyphrases related to the queries and gathered from a huge Wikipedia-based knowledge resource.
Peer Reviewed. Postprint (author’s final draft)
Personalized Ranking in eCommerce Search
We address the problem of personalization in the context of eCommerce search.
Specifically, we develop personalization ranking features that use in-session
context to augment a generic ranker optimized for conversion and relevance. We
use a combination of latent features learned from item co-clicks in historic
sessions and content-based features that use item title and price.
Personalization in search has been discussed extensively in the existing
literature. The novelty of our work is combining and comparing content-based
and content-agnostic features and showing that they complement each other to
result in a significant improvement of the ranker. Moreover, our technique does
not require an explicit re-ranking step, does not rely on learning user
profiles from long term search behavior, and does not involve complex modeling
of query-item-user features. Our approach captures item co-click propensity
using lightweight item embeddings. We experimentally show that our technique
significantly outperforms a generic ranker in terms of Mean Reciprocal Rank
(MRR). We also provide anecdotal evidence for the semantic similarity captured
by the item embeddings on the eBay search engine.
Comment: Under Review
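For readers unfamiliar with the metric, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the first relevant item appears (counting 0 when none appears). The sketch below computes it on made-up data; the function name, the input layout, and the toy rankings are illustrative assumptions and are unrelated to the eBay experiments.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Compute MRR over a set of queries.

    ranked_lists: dict mapping query -> list of item ids in ranked order
    relevant:     dict mapping query -> set of relevant item ids
    A query with no relevant item in its ranking contributes 0.
    """
    total = 0.0
    for query, ranking in ranked_lists.items():
        for rank, item in enumerate(ranking, start=1):
            if item in relevant[query]:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_lists)


# Toy data: first relevant item at rank 1, at rank 2, and missing.
ranked_lists = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
}
relevant = {"q1": {"a"}, "q2": {"e"}, "q3": {"z"}}

mrr = mean_reciprocal_rank(ranked_lists, relevant)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```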