    Result Diversification in Search and Recommendation: A Survey

    Diversifying returned results is an important research topic in retrieval systems, as it serves both the varied interests of customers and equal market exposure for providers. Diversity-aware research has received growing attention in recent years, accompanied by a proliferation of literature on methods for promoting diversity in search and recommendation. However, diversity-aware studies in retrieval systems lack systematic organization and remain fragmented. In this survey, we are the first to propose a unified taxonomy for classifying the metrics and approaches of diversification in both search and recommendation, two of the most extensively researched fields of retrieval systems. We begin with a brief discussion of why diversity is important in retrieval systems, followed by a summary of the various diversity concerns in search and recommendation, highlighting their relationships and differences. In the main body, we present a unified taxonomy of diversification metrics and approaches in retrieval systems, from both the search and the recommendation perspectives. In the later part of the survey, we discuss open research questions in diversity-aware search and recommendation, in an effort to inspire future innovations and encourage the implementation of diversity in real-world systems.
    Comment: 20 page
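    As a concrete point of reference for the "approaches" side of such a taxonomy, the sketch below implements greedy Maximal Marginal Relevance (MMR), a classic implicit re-ranking approach that trades off relevance against redundancy with already-selected items. It is an illustration chosen by the editor, not a method taken from this survey; the relevance and pairwise-similarity inputs, the function name, and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy Maximal Marginal Relevance (MMR) re-ranking (illustrative sketch).

    relevance[d]       -- estimated relevance of document d to the query
    similarity[(a, b)] -- similarity between documents a and b; pairs are looked
                          up in either order and missing pairs count as 0.0
    lam                -- trade-off: 1.0 ranks by relevance only, 0.0 by novelty only
    """

    def sim(a: str, b: str) -> float:
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    def mmr_score(d: str, selected: List[str]) -> float:
        # Penalize a candidate by its similarity to the closest already-selected item.
        redundancy = max((sim(d, s) for s in selected), default=0.0)
        return lam * relevance[d] - (1 - lam) * redundancy

    candidates = sorted(relevance)  # sorted for deterministic tie-breaking
    selected: List[str] = []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda d: mmr_score(d, selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```

    With relevance {"a": 0.9, "b": 0.85, "c": 0.5} and "a" and "b" nearly duplicates (similarity 0.95), `mmr_rerank(..., k=2)` returns `["a", "c"]`, whereas `lam=1.0` falls back to the pure relevance order `["a", "b"]`.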

    Intent-aware search result diversification

    Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, these aspects may themselves represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve on state-of-the-art diversification approaches.
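    The explicit, aspect-based strategy described above can be illustrated with an xQuAD-style greedy objective (xQuAD is one of the explicit approaches named elsewhere in this listing). This is a minimal sketch under stated assumptions: the per-aspect relevance scores `aspect_rel` are simply given as input, and the step this paper contributes, learning which retrieval model suits each aspect's intent, is not implemented. All names and toy values are hypothetical.

```python
from typing import Dict, List


def xquad(
    rel: Dict[str, float],
    aspect_weight: Dict[str, float],
    aspect_rel: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy xQuAD-style re-ranking (illustrative sketch).

    rel[d]           -- P(d|q): relevance of d to the original query
    aspect_weight[a] -- P(q_a|q): importance of aspect a
    aspect_rel[a][d] -- P(d|q_a): relevance of d to aspect a; in an intent-aware
                        setting each aspect could be scored by its own retrieval
                        model (assumed precomputed here)
    """
    # Probability that each aspect is not yet covered by the selected documents.
    not_covered = {a: 1.0 for a in aspect_weight}

    def score(d: str) -> float:
        diversity = sum(
            w * aspect_rel.get(a, {}).get(d, 0.0) * not_covered[a]
            for a, w in aspect_weight.items()
        )
        return (1 - lam) * rel[d] + lam * diversity

    candidates = sorted(rel)  # deterministic tie-breaking
    selected: List[str] = []
    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
        # Coverage update: multiply by (1 - P(best|q_a)) for every aspect.
        for a in not_covered:
            not_covered[a] *= 1.0 - aspect_rel.get(a, {}).get(best, 0.0)
    return selected
```

    Setting `lam=0` reduces the sketch to ranking by `rel` alone; larger values increasingly favour documents that cover aspects the current selection has not yet addressed.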

    An in-depth study on diversity evaluation: The importance of intrinsic diversity

    Diversified document ranking has been recognized as an effective strategy for tackling ambiguous and/or underspecified queries. In this paper, we conduct an in-depth study of diversity evaluation that provides insights for assessing the performance of a diversified retrieval system. By casting the widely used diversity metrics (e.g., ERR-IA, α-nDCG and D#-nDCG) into a unified framework based on marginal utility, we analyze how these metrics capture extrinsic diversity and intrinsic diversity. Our analyses show that the prior metrics (ERR-IA, α-nDCG and D#-nDCG) cannot precisely measure intrinsic diversity if we merely feed a set of subtopics into them in the traditional manner (i.e., without fine-grained relevance knowledge per subtopic). Because the redundancy of relevant documents with respect to each specific information need (i.e., subtopic) then cannot be detected and resolved, the overall diversity evaluation may not be reliable. Furthermore, a series of experiments is conducted on a gold-standard collection (English and Chinese) and a set of submitted runs, using as references the intent-square metrics, which extend the diversity metrics by incorporating hierarchical subtopics. The experimental results show that the intent-square metrics disagree with the diversity metrics (ERR-IA and α-nDCG) used in the traditional way on top-ranked runs, and that the average precision correlation scores between the intent-square metrics and the prior diversity metrics (ERR-IA and α-nDCG) are fairly low. These results support our analyses and reveal the previously unknown importance of intrinsic diversity to overall diversity evaluation.
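    To make the "traditional manner" of feeding flat subtopics into a metric concrete, the sketch below computes α-nDCG from judgments that map each document to a set of subtopics. Each subtopic's gain decays by a factor of (1 − α) for every earlier document already judged relevant to it, so redundancy is penalized only at the granularity of the subtopics supplied; redundancy within a subtopic remains invisible, which is the limitation analyzed above. The greedy ideal ranking is the usual approximation, since the exact ideal is NP-hard to compute in general. Function names and toy judgments are hypothetical.

```python
import math
from typing import Dict, List, Set


def _gain(doc: str, qrels: Dict[str, Set[str]], seen: Dict[str, int], alpha: float) -> float:
    # Each subtopic contributes (1 - alpha)^r, where r counts earlier documents
    # already judged relevant to that subtopic.
    return sum((1 - alpha) ** seen.get(s, 0) for s in qrels.get(doc, set()))


def alpha_dcg(ranking: List[str], qrels: Dict[str, Set[str]], alpha: float, depth: int) -> float:
    seen: Dict[str, int] = {}
    total = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        total += _gain(doc, qrels, seen, alpha) / math.log2(rank + 1)
        for s in qrels.get(doc, set()):
            seen[s] = seen.get(s, 0) + 1
    return total


def greedy_ideal(qrels: Dict[str, Set[str]], alpha: float, depth: int) -> List[str]:
    # Greedy approximation of the ideal ranking: repeatedly take the document
    # with the largest marginal gain given what has been placed so far.
    seen: Dict[str, int] = {}
    remaining = sorted(qrels)
    ideal: List[str] = []
    while remaining and len(ideal) < depth:
        best = max(remaining, key=lambda d: _gain(d, qrels, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        for s in qrels[best]:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking: List[str], qrels: Dict[str, Set[str]],
               alpha: float = 0.5, depth: int = 10) -> float:
    ideal = alpha_dcg(greedy_ideal(qrels, alpha, depth), qrels, alpha, depth)
    return alpha_dcg(ranking, qrels, alpha, depth) / ideal if ideal > 0 else 0.0
```

    With `qrels = {"d1": {"s1"}, "d2": {"s1"}, "d3": {"s2"}}`, the ranking `["d1", "d3", "d2"]` scores 1.0, while `["d1", "d2", "d3"]` scores about 0.965 because `d2` repeats subtopic `s1` before `s2` is covered.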

    Efficient Diversification of Web Search Results

    In this paper, we analyze the efficiency of various search result diversification methods. While the efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have rarely been addressed. We therefore propose a unified framework for studying the performance and feasibility of result diversification solutions. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this end, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare three different diversification methods on a standard test set: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the third is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors using the standard TREC Web diversification track testbed. Results show that OptSelect runs two orders of magnitude faster than the two other state-of-the-art approaches while achieving comparable diversification effectiveness.
    Comment: VLDB201
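    For context on the efficiency comparison, the sketch below shows the greedy selection rule commonly associated with IA-Select: pick the document that most increases the probability of satisfying still-unsatisfied intents, then discount those intents. It is an editor's illustration of the baseline, not OptSelect and not the paper's code; the input names and toy values are hypothetical. Every greedy step rescans all remaining candidates and all intents, which is the kind of per-query cost that the response-time concerns above are about.

```python
from typing import Dict, List


def ia_select(
    p_cat: Dict[str, float],
    quality: Dict[str, Dict[str, float]],
    k: int,
) -> List[str]:
    """Greedy IA-Select-style selection (illustrative sketch).

    p_cat[c]      -- P(c|q): probability that the query carries intent/category c
    quality[d][c] -- V(d|q,c): probability that document d satisfies intent c
    """
    # U(c|q,S): probability that intent c is still unsatisfied by the selected set S.
    unsatisfied = dict(p_cat)
    candidates = sorted(quality)  # deterministic tie-breaking
    selected: List[str] = []
    while candidates and len(selected) < k:
        # Scans every remaining candidate and every intent: O(k * n * |C|) overall.
        best = max(
            candidates,
            key=lambda d: sum(u * quality[d].get(c, 0.0) for c, u in unsatisfied.items()),
        )
        selected.append(best)
        candidates.remove(best)
        for c in unsatisfied:
            unsatisfied[c] *= 1.0 - quality[best].get(c, 0.0)
    return selected
```

    For a query split 70/30 between two hypothetical intents, two strong documents for the majority intent and one for the minority intent, the sketch picks one majority document and then the minority one, because the majority intent is already mostly satisfied.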

    A Survey on Automatically Mining Facets for Web Queries

    In this paper, we carry out a detailed survey of different facet mining techniques and their advantages and disadvantages. A facet is a word or phrase that summarizes an important aspect of a web query. Researchers have proposed various efficient techniques that substantially improve users' web search experience. Users are satisfied when they find information relevant to their query among the top results. The objectives of this research are: (1) to present an automated solution for deriving query facets by analyzing the textual query; (2) to create a taxonomy of query refinement strategies for efficient results; and (3) to personalize search according to user interests.

    Stochastic Query Covering for Fast Approximate Document Retrieval

    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection such that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that, even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analyses provide strong evidence of the potential value of query covering in diverse application scenarios.
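    The underlying selection problem can be illustrated with the classic greedy heuristic for weighted maximum coverage: under a document budget, repeatedly keep the document that answers the most not-yet-covered query probability mass. This is a swapped-in textbook technique (with the well-known (1 − 1/e) approximation guarantee), not the stochastic algorithms designed in this paper; the notion of a document "answering" a query, the input names, and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def greedy_query_cover(
    query_prob: Dict[str, float],
    answers: Dict[str, Set[str]],
    budget: int,
) -> List[str]:
    """Greedy weighted max-coverage over a query distribution (illustrative sketch).

    query_prob[q] -- probability of query q under an assumed-known query distribution
    answers[q]    -- documents that adequately answer q (hypothetical input)
    budget        -- maximum number of documents to keep in the subset
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    docs = sorted({d for ds in answers.values() for d in ds})

    def gain(d: str) -> float:
        # Probability mass of not-yet-covered queries that d would answer.
        return sum(p for q, p in query_prob.items()
                   if q not in covered and d in answers.get(q, set()))

    while docs and len(chosen) < budget:
        best = max(docs, key=gain)
        if gain(best) <= 0.0:
            break  # no remaining document answers an uncovered query
        chosen.append(best)
        docs.remove(best)
        covered |= {q for q in query_prob if best in answers.get(q, set())}
    return chosen


def covered_mass(query_prob: Dict[str, float], answers: Dict[str, Set[str]],
                 subset: Iterable[str]) -> float:
    # Fraction of the query distribution answerable from the subset alone.
    keep = set(subset)
    return sum(p for q, p in query_prob.items() if answers.get(q, set()) & keep)
```

    With queries weighted 0.5, 0.3 and 0.2, where document `b` answers the first two and `c` the third, a budget of two keeps `["b", "c"]` and covers the whole distribution; the diversification variant studied in the paper would need a different notion of "answering" than this plain coverage.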