    Intent-aware search result diversification

    Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches

    Sparse spatial selection for novelty-based search result diversification

    Novelty-based diversification approaches aim to produce a diverse ranking by directly comparing the retrieved documents. However, since such approaches are typically greedy, they require O(n^2) document-document comparisons in order to diversify a ranking of n documents. In this work, we propose to model novelty-based diversification as a similarity search in a sparse metric space. In particular, we exploit the triangle inequality property of metric spaces in order to drastically reduce the number of required document-document comparisons. Thorough experiments using three TREC test collections show that our approach is at least as effective as existing novelty-based diversification approaches, while improving their efficiency by an order of magnitude.
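
To make the quadratic cost mentioned in this abstract concrete, here is a minimal sketch of a generic greedy novelty-based re-ranker. It is not the paper's method: the function name `greedy_novelty_rerank`, the `trade_off` parameter, the Jaccard similarity, and the toy documents are all assumptions chosen for illustration. The counter shows that ranking n documents this way costs n(n-1)/2 document-document comparisons, which is the overhead a metric-space approach would aim to reduce.

```python
from typing import Callable, Dict, List, Tuple


def greedy_novelty_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    trade_off: float = 0.5,
) -> Tuple[List[str], int]:
    """Greedy re-ranking that penalises similarity to already selected documents.

    Returns the new ranking and the number of document-document comparisons.
    """
    remaining = dict(relevance)
    # Highest similarity seen so far between each candidate and any selected document.
    max_sim = {doc: 0.0 for doc in remaining}
    ranking: List[str] = []
    comparisons = 0
    while remaining:
        best = max(
            remaining,
            key=lambda d: (1 - trade_off) * remaining[d] - trade_off * max_sim[d],
        )
        ranking.append(best)
        del remaining[best]
        # Each remaining candidate is compared once with the newly selected document.
        for doc in remaining:
            max_sim[doc] = max(max_sim[doc], similarity(doc, best))
            comparisons += 1
    return ranking, comparisons


# Hypothetical toy collection: each document is a set of terms.
docs = {
    "a": {"jaguar", "car"},
    "b": {"jaguar", "car", "price"},
    "c": {"jaguar", "cat"},
}
scores = {"a": 0.9, "b": 0.8, "c": 0.5}


def jaccard(x: str, y: str) -> float:
    """Jaccard similarity between the term sets of two documents."""
    return len(docs[x] & docs[y]) / len(docs[x] | docs[y])


ranking, comparisons = greedy_novelty_rerank(scores, jaccard)
print(ranking, comparisons)
```

In this toy run, document "c" overtakes the more relevant "b" because "b" overlaps heavily with the already selected "a".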

    Using score differences for search result diversification

    We investigate the application of a light-weight approach to result list clustering for the purposes of diversifying search results. We introduce a novel post-retrieval approach, which is independent of external information or even the full-text content of retrieved documents; only the retrieval score of a document is used. Our experiments show that this novel approach is beneficial to effectiveness, albeit only on certain baseline systems. The fact that the method works indicates that the retrieval score is potentially exploitable in diversity

    On the Additivity and Weak Baselines for Search Result Diversification Research

    A recent study on the topic of additivity addresses the task of search result diversification and concludes that while weaker baselines are almost always significantly improved by the evaluated diversification methods, for stronger baselines just the opposite happens, i.e., no significant improvement can be observed. Due to the importance of the issue in shaping future research directions and evaluation strategies in search result diversification, in this work, we first aim to reproduce the findings reported in the previous study, and then investigate its possible limitations. Our extensive experiments first reveal that, under the same experimental setting as the previous study, we can reach similar results. Next, we hypothesize that for stronger baselines, tuning the parameters of some methods (i.e., the trade-off parameter between the relevance and diversity of the results in this particular scenario) should be done in a more fine-grained manner. With trade-off parameters that are specifically determined for each baseline run, we show that the percentage of significant improvements even over the strong baselines can be doubled. As a further issue, we discuss the possible impact of using the same strong baseline retrieval function for the diversity computations of the methods. Our takeaway message is that in the case of a strong baseline, it is more crucial to tune the parameters of the diversification methods to be evaluated; but once this is done, additivity is achievable

    Search Result Diversification in Short Text Streams

    We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method -- second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models

    Explicit web search result diversification

    Queries submitted to a web search engine are typically short and often ambiguous. With the enormous size of the Web, a misunderstanding of the information need underlying an ambiguous query can misguide the search engine, ultimately leading the user to abandon the originally submitted query. In order to overcome this problem, a sensible approach is to diversify the documents retrieved for the user's query. As a result, the likelihood that at least one of these documents will satisfy the user's actual information need is increased. In this thesis, we argue that an ambiguous query should be seen as representing not one, but multiple information needs. Based upon this premise, we propose xQuAD---Explicit Query Aspect Diversification, a novel probabilistic framework for search result diversification. In particular, the xQuAD framework naturally models several dimensions of the search result diversification problem in a principled yet practical manner. To this end, the framework represents the possible information needs underlying a query as a set of keyword-based sub-queries. Moreover, xQuAD accounts for the overall coverage of each retrieved document with respect to the identified sub-queries, so as to rank highly diverse documents first. In addition, it accounts for how well each sub-query is covered by the other retrieved documents, so as to promote novelty---and hence penalise redundancy---in the ranking. The framework also models the importance of each of the identified sub-queries, so as to appropriately cater for the interests of the user population when diversifying the retrieved documents. Finally, since not all queries are equally ambiguous, the xQuAD framework caters for the ambiguity level of different queries, so as to appropriately trade-off relevance for diversity on a per-query basis. The xQuAD framework is general and can be used to instantiate several diversification models, including the most prominent models described in the literature. 
In particular, within xQuAD, each of the aforementioned dimensions of the search result diversification problem can be tackled in a variety of ways. In this thesis, as additional contributions besides the xQuAD framework, we introduce novel machine learning approaches for addressing each of these dimensions. These include a learning to rank approach for identifying effective sub-queries as query suggestions mined from a query log, an intent-aware approach for choosing the ranking models most likely to be effective for estimating the coverage and novelty of multiple documents with respect to a sub-query, and a selective approach for automatically predicting how much to diversify the documents retrieved for each individual query. In addition, we perform the first empirical analysis of the role of novelty as a diversification strategy for web search. As demonstrated throughout this thesis, the principles underlying the xQuAD framework are general, sound, and effective. In particular, to validate the contributions of this thesis, we thoroughly assess the effectiveness of xQuAD under the standard experimentation paradigm provided by the diversity task of the TREC 2009, 2010, and 2011 Web tracks. The results of this investigation demonstrate the effectiveness of our proposed framework. Indeed, xQuAD attains consistent and significant improvements in comparison to the most effective diversification approaches in the literature, and across a range of experimental conditions, comprising multiple input rankings, multiple sub-query generation and coverage estimation mechanisms, as well as queries with multiple levels of ambiguity. Altogether, these results corroborate the state-of-the-art diversification performance of xQuAD
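
The dimensions described in this abstract (relevance, sub-query importance, coverage, novelty, and a per-query trade-off) can be sketched as a greedy re-ranker. This is a minimal illustration assuming the commonly cited xQuAD scoring form, (1 - λ)·P(d|q) + λ·Σ_i P(q_i|q)·P(d|q_i)·Π_{d_j ∈ S}(1 - P(d_j|q_i)); the function name `xquad_rerank`, the argument names, and all probabilities below are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Optional


def xquad_rerank(
    relevance: Dict[str, float],            # assumed P(d|q) per document
    importance: Dict[str, float],           # assumed P(q_i|q) per sub-query
    coverage: Dict[str, Dict[str, float]],  # assumed P(d|q_i), keyed by sub-query then document
    lam: float = 0.5,                       # trade-off between relevance (0) and diversity (1)
    k: Optional[int] = None,
) -> List[str]:
    """Greedy selection that rewards documents covering still-unsatisfied sub-queries."""
    remaining = set(relevance)
    # Probability that each sub-query is not yet satisfied by the selected documents.
    not_covered = {sq: 1.0 for sq in importance}
    ranking: List[str] = []
    limit = len(relevance) if k is None else k

    def score(doc: str) -> float:
        diversity = sum(
            importance[sq] * coverage[sq].get(doc, 0.0) * not_covered[sq]
            for sq in importance
        )
        return (1 - lam) * relevance[doc] + lam * diversity

    while remaining and len(ranking) < limit:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)
        # Novelty update: sub-queries covered by the chosen document count for less.
        for sq in not_covered:
            not_covered[sq] *= 1 - coverage[sq].get(best, 0.0)
    return ranking


# Hypothetical ambiguous query with two sub-queries.
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.4}
imp = {"car": 0.6, "animal": 0.4}
cov = {"car": {"d1": 0.9, "d2": 0.8}, "animal": {"d3": 0.9}}

diverse = xquad_rerank(rel, imp, cov, lam=0.8)
plain = xquad_rerank(rel, imp, cov, lam=0.0)
print(diverse, plain)
```

With `lam=0.0` the sketch reduces to ranking by relevance alone; raising `lam` promotes "d3", the only document covering the second sub-query, above the redundant "d2".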

    Modelling Efficient Novelty-based Search Result Diversification in Metric Spaces

    Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document–document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the last, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
    Affiliations: Gil Costa, Graciela Verónica (Yahoo, Mexico; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico San Luis, Argentina); Santos, Rodrygo L. T. (University of Glasgow, United Kingdom); Macdonald, Craig (University of Glasgow, United Kingdom); Ounis, Iadh (University of Glasgow, United Kingdom)

    An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

    Many evaluation metrics have been defined to evaluate the effectiveness of ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems given a specific task. Axiomatic analysis is an informative mechanism to understand the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated with the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, being suitable in the absence of knowledge about particular features of the scenario under study.
    Comment: Original version: 10 pages. Preprint of full paper to appear at SIGIR'18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, US

    Experiments in Diversifying Flickr Result Sets

    The 2013 MediaEval Retrieving Diverse Social Images Task looked to tackle the problem of search result diversification of Flickr result sets formed from queries about geographic places and landmarks. In this paper we describe our approach of using a min-max similarity diversifier coupled with pre-filters and a reranker. We also demonstrate a number of novel features for measuring similarity to use in the diversification step