17 research outputs found

Ensemble clustering for result diversification

Author: Hiemstra Djoerd
Nguyen Dong-Phuong
Publication venue: NIST
Publication date: 01/01/2012

This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification and also better than a non-diversification run.
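The abstract above does not say how the clusterings were combined. As a rough illustration only, the sketch below uses a co-association matrix, a generic ensemble-clustering technique that may differ from the authors' two-layer method. The function names, the 0.5 threshold, and the toy label lists are all hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of documents shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Assumed consensus rule: connected components of the graph that links
    two documents when their co-association exceeds `threshold`."""
    co = co_association(labelings)
    n = len(co)
    cluster = [-1] * n
    next_id = 0
    for start in range(n):
        if cluster[start] != -1:
            continue
        cluster[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if cluster[j] == -1 and co[i][j] > threshold:
                    cluster[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster


# Hypothetical cluster labels for five documents from three base clusterings
# (different methods and different data types, as in the abstract).
lda_on_text = [0, 0, 1, 1, 2]
kmeans_on_text = [1, 1, 0, 0, 0]
kmeans_on_anchor = [0, 0, 0, 1, 1]

consensus = consensus_clusters([lda_on_text, kmeans_on_text, kmeans_on_anchor])
```

Cluster identifiers are compared only within one labeling, so each base method may number its clusters differently.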

Edinburgh Research Explorer

Radboud Repository

University of Twente Research Information

Closing the loop: assisting archival appraisal and information retrieval in one sweep

Author: Kim Y.
Ross S.
Publication venue
Publication date: 01/01/2013

In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and work within the digital curation communities, and compare them to relevance criteria as discussed within information retrieval's literature-based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and initiating multi-objective learning with respect to information retrieval.

Crossref

Enlighten

Combining implicit and explicit topic representations for result diversification

Author: He J. (Jiyin)
Hollink V. (Vera)
Vries A.P. (Arjen) de
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2012

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random walk based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models

CiteSeerX

CWI's Institutional Repository

Provable randomized rounding for minimum-similarity diversification

Author: Gionis Aristides
Mahadevan Ananth
Matakos Antonis
Ordozgoiti Bruno
Publication venue
Publication date: 01/01/2022

When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.
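The objective described in this abstract (sum of pairwise similarities plus a relevance penalty) and the idea of independent rounding can be sketched as follows. This is a toy illustration, not the authors' algorithm. The fractional solution `x` is assumed to be given, the retry-until-size-k rule is an assumption about how the cardinality constraint might be met, and the data and function names are hypothetical.

```python
import random
from itertools import combinations


def objective(selected, sim, penalty):
    """Sum of pairwise similarities of the selected items plus their relevance penalties."""
    pair_cost = sum(sim[i][j] for i, j in combinations(sorted(selected), 2))
    return pair_cost + sum(penalty[i] for i in selected)


def independent_rounding(x, k, rng, max_tries=10_000):
    """Include item i independently with probability x[i]; retry until exactly
    k items are chosen (assumed handling of the cardinality constraint)."""
    for _ in range(max_tries):
        chosen = [i for i, p in enumerate(x) if rng.random() < p]
        if len(chosen) == k:
            return chosen
    raise RuntimeError("no sample of size k was drawn")


# Hypothetical toy instance: items 0/1 and items 2/3 are near-duplicates.
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
penalty = [0.0, 0.1, 0.0, 0.3]   # lower penalty = more relevant
x = [0.5, 0.5, 0.5, 0.5]         # assumed fractional solution

rng = random.Random(0)
samples = [independent_rounding(x, 2, rng) for _ in range(20)]
best = min(samples, key=lambda s: objective(s, sim, penalty))
```

Each draw is independent of the others, so the sampling loop can be run in parallel, which matches the parallelizability the abstract mentions.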

PubMed Central

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Leveraging Semantic Resources in Diversified Query Expansion

Author: Krishnan Adit
Mehta Sameep
Padmanabhan Deepak
Ranu Sayan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/06/2017

Queen's University Belfast Research Portal

Diversity in similarity joins

Author: Carvalho Luiz Olmes
Oliveira Willian Dener de
Santos Lúcio Fernandes Dutra
Traina Junior Caetano
Traina Agma Juci Machado
Publication venue: Cham
Publication date

With the increasing ability of current applications to produce and consume more complex data, such as images and geographic information, the similarity join has attracted considerable attention. However, this operator does not consider the relationship among the elements in the answer, generating results with many pairs similar among themselves, which does not add value to the final answer. Result diversification methods are intended to retrieve elements similar enough to satisfy the similarity conditions while also considering the diversity among the elements in the answer, producing a more heterogeneous result with smaller cardinality, which improves the meaning of the answer. Still, diversity has been studied only when applied to unary operations. In this paper, we introduce the concept of diverse similarity joins: a similarity join operator that ensures smaller, more diversified and more useful answers. The experiments performed on real and synthetic datasets show that our proposal allows exploiting diversity in similarity joins without diminishing their performance, while providing elements that cover the same data space distribution as the non-diverse answers. Funding: FAPESP, CNPQ, CAPES, Rescuer (EU Commission Grant 614154 and CNPQ/MCTI Grant 490084/2013-3). Presented at the International Conference on Similarity Search and Applications (SISAP), 8th, 2015, Glasgow.

Ranking Optimization with Constraints

Author: Fangzhao Wu
Hang Li
Jun Xu
Xin Jiang
Publication venue
Publication date: 23/04/2020

This paper addresses the problem of post-processing of ranking in search, referred to as post ranking. Although important, no research seems to have been conducted on the problem, particularly with a principled approach, and in practice ad-hoc ways of performing the task are being adopted. This paper formalizes the problem as constrained optimization in which the constraints represent the post-processing rules and the objective function represents the trade-off between adherence to the original ranking and satisfaction of the rules. The optimization amounts to refining the original ranking result based on the rules. We further propose a specific probabilistic implementation of the general formalization on the basis of the Bradley-Terry model, which is theoretically sound, effective, and efficient. Our experimental results, using benchmark datasets and an enterprise search dataset, show that the proposed method works much better than several baseline methods of utilizing rules.
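For readers unfamiliar with the Bradley-Terry model named in this abstract, a minimal sketch follows. It shows only the standard pairwise preference probability and the log-likelihood of a ranking's pairwise orderings. How the paper couples the model with post-processing rules is not stated here, so that part is left out. The function names and example scores are hypothetical.

```python
import math


def bt_prob(score_i, score_j):
    """Bradley-Terry probability that item i is preferred over item j (scores > 0)."""
    return score_i / (score_i + score_j)


def ranking_log_likelihood(ranking, scores):
    """Log-likelihood of all pairwise orderings implied by `ranking` (best first)."""
    total = 0.0
    for pos, i in enumerate(ranking):
        for j in ranking[pos + 1:]:
            total += math.log(bt_prob(scores[i], scores[j]))
    return total


# Hypothetical positive scores for three documents.
scores = {"d1": 3.0, "d2": 1.0, "d3": 0.5}
by_score = ranking_log_likelihood(["d1", "d2", "d3"], scores)
reversed_order = ranking_log_likelihood(["d3", "d2", "d1"], scores)
```

Under this model, a ranking that follows the scores has a higher likelihood than its reverse, which is one way to quantify adherence to an original ranking.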

CiteSeerX

Closing the loop: Assisting archival appraisal and information retrieval in one sweep

Author: Barry
Barry
Bianchini
Blei
Borlund
Burges
Cacheda
Castillo
Caverlee
Cerviño Beresi
Cook
Deerwester
Eastwood
Goldberg
Hagar
Harvey
Jenkinson
Joachims
Jones
Kim
Manning
Mathews
Oliver
Petrenz
Ponte
Savoy
Scaringella
Schellenberg
Schneider
Sebastiani
Selamat
Spärck-Jones
Teufel
Tibbo
Tzanetakis
Vapnik
Whyte
Publication venue: 'Wiley'
Publication date

Crossref