6,376 research outputs found

    A survey on the use of relevance feedback for information access systems

    Get PDF
    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems

    On Horizontal and Vertical Separation in Hierarchical Text Classification

    Get PDF
    Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers. Our main findings are the followings. First, we analyse the importance of separability on the data representation in the task of classification and based on that, we introduce a "Strong Separation Principle" for optimizing expected effectiveness of classifiers decision based on separation property. Second, we present Hierarchical Significant Words Language Models (HSWLM) which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate that how HSWLM improves the accuracy of classification and how it provides transferable models over time. Although discussions in this paper focus on the classification problem, the models are applicable to any information access tasks on data that has, or can be mapped to, a hierarchical structure.Comment: Full paper (10 pages) accepted for publication in proceedings of ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16

    Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View

    Full text link
    Multimedia collections are more than ever growing in size and diversity. Effective multimedia retrieval systems are thus critical to access these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image/text multimedia objects and we study multimodal information fusion techniques in the context of content based multimedia information retrieval. We focus on graph based methods which have proven to provide state-of-the-art performances. We particularly examine two of such methods : cross-media similarities and random walk based scores. From a theoretical viewpoint, we propose a unifying graph based framework which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph based technique for the combination of visual and textual information. We compare cross-media and random walk based results using three different real-world datasets. From a practical standpoint, our extended empirical analysis allow us to provide insights and guidelines about the use of graph based methods for multimodal information fusion in content based multimedia information retrieval.Comment: An extended version of the paper: Visual and Textual Information Fusion in Multimedia Retrieval using Semantic Filtering and Graph based Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM Transactions on Information System

    Using the quantum probability ranking principle to rank interdependent documents

    Get PDF
    A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity is required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory(PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. And unlike MMR and PT, one of the main advantages of the QPRP is that no parameter estimation/tuning is required; making the QPRP both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements

    A quasi-current representation for information needs inspired by Two-State Vector Formalism

    Get PDF
    Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling session search task that users issue queries continuously in order to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description for users’ current IN in a sense that it does not take the ‘future’ information into consideration. Therefore, to seek a more proper and complete representation for users’ IN, we construct a representation of quasi-current IN inspired by an emerging Two-State Vector Formalism (TSVF). With the enlightenment of the completeness of TSVF, a “two-state vector” derived from the ‘future’ (the current query) and the ‘history’ (the previous query) is employed to describe users’ quasi-current IN in a more complete way. Extensive experiments are conducted on the session tracks of TREC 2013 & 2014, and show that our model outperforms a series of compared IR models

    A Distribution Separation Method Using Irrelevance Feedback Data for Information Retrieval

    Get PDF
    In many research and application areas, such as information retrieval and machine learning, we often encounter dealing with a probability distribution which is mixed by one distribution that is relevant to our task in hand and the other that is irrelevant and we want to get rid of. Thus, it is an essential problem to separate the irrelevant distribution from the mixture distribution. This paper is focused on the application in Information Retrieval, where relevance feedback is a widely used technique to build a refined query model based on a set of feedback documents. However, in practice, the relevance feedback set, even provided by users explicitly or implicitly, is often a mixture of relevant and irrelevant documents. Consequently, the resultant query model (typically a term distribution) is often a mixture rather than a true relevance term distribution, leading to a negative impact on the retrieval performance. To tackle this problem, we recently proposed a Distribution Separation Method (DSM), which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While it achieved a promising performance in an empirical evaluation with simulated explicit irrelevance feedback data, it has not been deployed in the scenario where one should automatically obtain the irrelevance feedback data. In this article, we propose a substantial extension of the basic DSM from two perspectives: developing a further regularization framework and deploying DSM in the automatic irrelevance feedback scenario. Specifically, in order to avoid the output distribution of DSM drifting away from the true relevance distribution when the quality of seed irrelevant distribution (as the input to DSM) is not guaranteed, we propose a DSM regularization framework to constrain the estimation for the relevance distribution. This regularization framework includes three algorithms, each corresponding to a regularization strategy incorporated in the objective function of DSM. In addition, we exploit DSM in automatic (i.e., pseudo) irrelevance feedback, by automatically detecting the seed irrelevant documents via three different document re-ranking methods. We have carried out extensive experiments based on various TREC data sets, in order to systematically evaluate the proposed methods. The experimental results demonstrate the effectiveness of our proposed approaches in comparison with various strong baselines
    • 

    corecore