2,406 research outputs found
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this purpose, we rely on the concept of "query refinement" to estimate the
probability of a query to be ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results shown that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness.Comment: VLDB201
Fairness of Exposure in Rankings
Rankings are ubiquitous in the online world today. As we have transitioned
from finding books in libraries to ranking products, jobs, job applicants,
opinions and potential romantic partners, there is a substantial precedent that
ranking systems have a responsibility not only to their users but also to the
items being ranked. To address these often conflicting responsibilities, we
propose a conceptual and computational framework that allows the formulation of
fairness constraints on rankings in terms of exposure allocation. As part of
this framework, we develop efficient algorithms for finding rankings that
maximize the utility for the user while provably satisfying a specifiable
notion of fairness. Since fairness goals can be application specific, we show
how a broad range of fairness constraints can be implemented using our
framework, including forms of demographic parity, disparate treatment, and
disparate impact constraints. We illustrate the effect of these constraints by
providing empirical results on two ranking problems.Comment: In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, London, UK, 201
A Theoretical Analysis of Two-Stage Recommendation for Cold-Start Collaborative Filtering
In this paper, we present a theoretical framework for tackling the cold-start
collaborative filtering problem, where unknown targets (items or users) keep
coming to the system, and there is a limited number of resources (users or
items) that can be allocated and related to them. The solution requires a
trade-off between exploitation and exploration as with the limited
recommendation opportunities, we need to, on one hand, allocate the most
relevant resources right away, but, on the other hand, it is also necessary to
allocate resources that are useful for learning the target's properties in
order to recommend more relevant ones in the future. In this paper, we study a
simple two-stage recommendation combining a sequential and a batch solution
together. We first model the problem with the partially observable Markov
decision process (POMDP) and provide an exact solution. Then, through an
in-depth analysis over the POMDP value iteration solution, we identify that an
exact solution can be abstracted as selecting resources that are not only
highly relevant to the target according to the initial-stage information, but
also highly correlated, either positively or negatively, with other potential
resources for the next stage. With this finding, we propose an approximate
solution to ease the intractability of the exact solution. Our initial results
on synthetic data and the Movie Lens 100K dataset confirm the performance gains
of our theoretical development and analysis
Intent Models for Contextualising and Diversifying Query Suggestions
The query suggestion or auto-completion mechanisms help users to type less
while interacting with a search engine. A basic approach that ranks suggestions
according to their frequency in the query logs is suboptimal. Firstly, many
candidate queries with the same prefix can be removed as redundant. Secondly,
the suggestions can also be personalised based on the user's context. These two
directions to improve the aforementioned mechanisms' quality can be in
opposition: while the latter aims to promote suggestions that address search
intents that a user is likely to have, the former aims to diversify the
suggestions to cover as many intents as possible. We introduce a
contextualisation framework that utilises a short-term context using the user's
behaviour within the current search session, such as the previous query, the
documents examined, and the candidate query suggestions that the user has
discarded. This short-term context is used to contextualise and diversify the
ranking of query suggestions, by modelling the user's information need as a
mixture of intent-specific user models. The evaluation is performed offline on
a set of approximately 1.0M test user sessions. Our results suggest that the
proposed approach significantly improves query suggestions compared to the
baseline approach.Comment: A short version of this paper was presented at CIKM 201
Explicit relevance models in intent-oriented information retrieval diversification
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, http://dx.doi.org/10.1145/2348283.2348297.The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components in particular redundancy assessment are expressed in terms of the probability to observe documents, rather than the probability that the documents be relevant. This has been sometimes described as a view considering the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state of the art algorithms into a single version. The relevance-based approach opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task, or the evaluation metrics, as we illustrate empirically.This work was supported by the national Spanish projects
TIN2011-28538-C02-01 and S2009TIC-1542
Profile Diversity for Phenotyping Data Search and Recommendation
Session: Applications innovantesNational audienceDans ce travail, nous étudions la diversité de profils. Il s'agit d'une approche nouvelle dans la recherche de documents scientifiques. De nombreux travaux ont combinés la pertinence des mots clés avec la popularité des documents au sein d'une fonction de score " sociale ". Diversifier le contenu des documents retournés a également été traité de mani'ere approfondie et la recherche, la publicité, les requêtes en base de données et la recommandation. Nous pensons que notre travail est le premier à traiter de la diversité de profils afin de traiter le problème des listes de résultats hautement populaires mais trop ciblées. Nous montrerons comment nous adaptons l'algorithme de Fagin sur les algorithmes à seuil pour retourner les documents les plus pertinents, les plus populaires mais aussi les plus divers que ce soit en terme de contenus ou de profils. Nous avons également un ensemble de simulations sur deux benchmarks afin de valider notre fonction de score
Profile Diversity for Phenotyping Data Search and Recommendation
Session: Applications innovantesSession: Applications innovantesNational audienceDans ce travail, nous étudions la diversité de profils. Il s'agit d'une approche nouvelle dans la recherche de documents scientifiques. De nombreux travaux ont combinés la pertinence des mots clés avec la popularité des documents au sein d'une fonction de score " sociale ". Diversifier le contenu des documents retournés a également été traité de mani'ere approfondie et la recherche, la publicité, les requêtes en base de données et la recommandation. Nous pensons que notre travail est le premier à traiter de la diversité de profils afin de traiter le problème des listes de résultats hautement populaires mais trop ciblées. Nous montrerons comment nous adaptons l'algorithme de Fagin sur les algorithmes à seuil pour retourner les documents les plus pertinents, les plus populaires mais aussi les plus divers que ce soit en terme de contenus ou de profils. Nous avons également un ensemble de simulations sur deux benchmarks afin de valider notre fonction de score
Application and evaluation of multi-dimensional diversity
Traditional information retrieval (IR) systems mostly focus on finding documents relevant to queries without considering other documents in the search results. This approach works quite well in general cases; however, this also means that the set of returned documents in a result list can be very similar to each other. This can be an undesired system property from a user's perspective. The creation of IR systems that support the search result diversification present many challenges, indeed current evaluation measures and methodologies are still unclear with regards to specific search domains and dimensions of diversity. In this paper, we highlight various issues in relation to image search diversification for the ImageClef 2009 collection and tasks. Furthermore, we discuss the problem of defining clusters/subtopics by mixing diversity dimensions regardless of which dimension is important in relation to information need or circumstances. We also introduce possible applications and evaluation metrics for diversity based retrieval
- …