536 research outputs found

    Explicit relevance models in intent-oriented information retrieval diversification

    Full text link
    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, http://dx.doi.org/10.1145/2348283.2348297.The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components in particular redundancy assessment are expressed in terms of the probability to observe documents, rather than the probability that the documents be relevant. This has been sometimes described as a view considering the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state of the art algorithms into a single version. The relevance-based approach opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task, or the evaluation metrics, as we illustrate empirically.This work was supported by the national Spanish projects TIN2011-28538-C02-01 and S2009TIC-1542

    Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities

    Get PDF
    Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels to summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of 2 types of heterogeneous objects, which correspond to documents and candidate phrases, using multilevel similarity information. CEDL is composed of 5 main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co‐embedded space that preserves higher‐order, neighbor‐based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model where documents along with their cluster memberships are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases using a ranking scheme using multiple scores based on the extracted co‐embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL is demonstrated qualitatively and quantitatively with experiments using document databases from different application fields

    Web Page Classification and Hierarchy Adaptation

    Get PDF

    DIR 2011: Dutch_Belgian Information Retrieval Workshop Amsterdam

    Get PDF
    • 

    corecore