23,688 research outputs found

Stochastic Query Covering for Fast Approximate Document Retrieval

Author: Anagnostopoulos Aristidis
Becchetti Luca
Ida Mele
Ilaria Bordino
Leonardi Stefano
Piotr Sankowski
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection such that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
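The covering idea in this abstract can be illustrated with a minimal sketch. It is a generic greedy weighted-coverage heuristic, not the authors' algorithm: it assumes each document is tagged with the set of queries it answers well and that query probabilities are known. The names `greedy_query_cover`, `doc_queries`, and `query_prob` are hypothetical.

```python
def greedy_query_cover(doc_queries, query_prob, k):
    """Greedily pick up to k documents that maximize covered query probability.

    doc_queries: dict mapping document id -> set of queries it answers (assumed input)
    query_prob:  dict mapping query -> probability of that query being issued
    Returns (selected document ids, total probability mass of covered queries).
    """
    selected = []
    covered = set()
    for _ in range(k):
        best_doc, best_gain = None, 0.0
        for doc, queries in doc_queries.items():
            if doc in selected:
                continue
            # Marginal gain: probability mass of queries not yet covered.
            gain = sum(query_prob.get(q, 0.0) for q in queries if q not in covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:
            break  # no remaining document adds coverage
        selected.append(best_doc)
        covered |= doc_queries[best_doc]
    coverage = sum(query_prob.get(q, 0.0) for q in covered)
    return selected, coverage


if __name__ == "__main__":
    docs = {"d1": {"q1", "q2"}, "d2": {"q2", "q3"}, "d3": {"q3"}}
    probs = {"q1": 0.5, "q2": 0.3, "q3": 0.2}
    print(greedy_query_cover(docs, probs, k=2))
```

The returned coverage value is the fraction of query traffic, under the assumed distribution, that the selected subset alone could answer.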

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe

DYNIQX: A novel meta-search engine for the web

Author: Barladeanu Cristi
Eisenstadt Marc
Rüger Stefan
Song Dawei
Zhu Jianhan
Publication date: 01/01/2009

The effect of metadata in collection fusion has not been sufficiently studied. In response to this, we present a novel meta-search engine called Dyniqx for metadata-based search. Dyniqx integrates search results from search services for documents, images, and videos to generate a unified list of ranked search results. Dyniqx exploits the availability of metadata in search services such as PubMed, Google Scholar, Google Image Search, and Google Video Search to fuse search results from heterogeneous search engines. In addition, metadata from these search engines is used to generate dynamic query controls, such as sliders and tick boxes, which users can use to filter search results. Our preliminary user evaluation shows that Dyniqx can help users complete information search tasks more efficiently and successfully than three well-known search engines. We also carried out a controlled user evaluation of the integration of six document-, image-, and video-based search engines (Google Scholar, PubMed, Intute, Google Image, Yahoo Image, and Google Video) in Dyniqx. We designed a questionnaire for evaluating different aspects of Dyniqx in assisting users to complete search tasks. Each user used Dyniqx to perform a number of search tasks before completing the questionnaire. Our evaluation results confirm the effectiveness of the meta-search of Dyniqx in assisting user search tasks, and provide insights into better designs of the Dyniqx interface.

Open Access Institutional Repository at Robert Gordon University

Open Research Online (The Open University)

Term-Specific Eigenvector-Centrality in Multi-Relation Networks

Author: Bry François
Furche Tim
Kneißl Fabian
Weiand Klara
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2011

Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
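For readers unfamiliar with the eigenvector computation mentioned here, the following is a minimal sketch of standard PageRank by power iteration on a single-relation graph. It does not implement the article's term-specific, multi-relation extension; the function name `pagerank`, the damping value 0.85, and the even redistribution of rank from nodes without outgoing links are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the PageRank vector by power iteration.

    links: dict mapping node -> list of nodes it links to (assumed input format)
    Returns a dict mapping node -> rank, with ranks summing to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the uniform teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Node without outgoing links: spread its rank over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    print(pagerank(graph))
```

In a term-specific variant, one would presumably run such a propagation per term with term weights as the starting distribution, but that step is left out here because the abstract does not specify it.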

CiteSeerX

Crossref

Open Access LMU

Porqpine: a peer-to-peer search engine

Author: Bermúdez Juanjo
Pujol Josep Maria
Sangüesa i Sole Ramon
Publication date: 01/01/2003

In this paper, we present a fully distributed and collaborative search engine for web pages: Porqpine. This system uses a novel query-based model and collaborative filtering techniques in order to obtain user-customized results. All knowledge about users and profiles is stored in each user node's application. Overall, the system is a multi-agent system that runs on the computers of the user community. The nodes interact in a peer-to-peer fashion in order to create a truly distributed search engine where information is completely distributed among all the nodes in the network. Moreover, the system preserves the privacy of user queries and results by maintaining the anonymity of the queries' consumers and results' producers. The knowledge required by the system to work is implicitly captured through the monitoring of users' actions, not only within the system's interface but also within one of the most popular web browsers. Thus, users are not required to explicitly feed knowledge about their interests into the system, since this process is done automatically. In this manner, users obtain the benefits of a personalized search engine just by installing the application on their computer. Porqpine does not intend to shun conventional centralized search engines completely but to complement them by issuing more accurate and personalized results. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes

Author: Jose J.M.
Ruthven I.G.
White R.W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2002

In this paper we present an evaluation of techniques that are designed to encourage web searchers to interact more with the results of a web search. Two specific techniques are examined: the presentation of sentences that highly match the searcher's query and the use of implicit evidence. Implicit evidence is evidence captured from the searcher's interaction with the retrieval results and is used to automatically update the display. Our evaluation concentrates on the effectiveness and subject perception of these techniques. The results show, with statistical significance, that the techniques are effective and efficient for information seeking.

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Enlighten

Content-Based Weak Supervision for Ad-Hoc Re-Ranking

Author: Dietz Laura
Hui Kai
Li Bo
Sandhaus Evan
Strohman Trevor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

One challenge with neural ranking is the need for a large amount of manually labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance. Comment: SIGIR 2019 (short paper).

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Topic modeling for entity linking using keyphrase

Author: Mehdizadeh Naderi Ali
Rodríguez Hontoria Horacio
Turmo Borras Jorge
Publication date: 01/01/2014

This paper proposes an Entity Linking system that applies a topic modeling ranking. We apply a novel approach in order to provide new relevant elements to the model. These elements are keyphrases related to the queries and gathered from a huge Wikipedia-based knowledge resource. Peer reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

Personalized Ranking in eCommerce Search

Author: Aslanyan Grigor
Jaiswal Amit
Kannadasan Manojkumar Rangasamy
Kumar Prathyusha Senthil
Mandal Aritra
Publication date: 30/04/2019

We address the problem of personalization in the context of eCommerce search. Specifically, we develop personalization ranking features that use in-session context to augment a generic ranker optimized for conversion and relevance. We use a combination of latent features learned from item co-clicks in historic sessions and content-based features that use item title and price. Personalization in search has been discussed extensively in the existing literature. The novelty of our work is combining and comparing content-based and content-agnostic features and showing that they complement each other to result in a significant improvement of the ranker. Moreover, our technique does not require an explicit re-ranking step, does not rely on learning user profiles from long-term search behavior, and does not involve complex modeling of query-item-user features. Our approach captures item co-click propensity using lightweight item embeddings. We experimentally show that our technique significantly outperforms a generic ranker in terms of Mean Reciprocal Rank (MRR). We also provide anecdotal evidence for the semantic similarity captured by the item embeddings on the eBay search engine. Comment: Under review.
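Mean Reciprocal Rank, the metric named in this abstract, averages the reciprocal of the position of the first relevant item across queries. The sketch below shows the standard definition only; the function name `mean_reciprocal_rank` and the toy data are hypothetical and are not taken from the paper's evaluation.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Compute MRR over a batch of queries.

    ranked_lists:  list of rankings, each a list of item ids in ranked order
    relevant_sets: list of sets of relevant item ids, one per ranking
    A query whose ranking contains no relevant item contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / position  # only the first relevant hit counts
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    relevant = [{"b"}, {"x"}, {"r"}]
    print(mean_reciprocal_rank(rankings, relevant))
```

In an eCommerce setting, the relevant set for a query might be the items the user clicked or purchased, though that choice is an assumption here rather than something the abstract states.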

arXiv.org e-Print Archive

Crossref