
    A heuristic information retrieval study : an investigation of methods for enhanced searching of distributed data objects exploiting bidirectional relevance feedback

    A thesis submitted for the degree of Doctor of Philosophy of the University of Luton. The primary aim of this research is to investigate methods of improving the effectiveness of current information retrieval systems. This aim can be achieved by accomplishing numerous supporting objectives. A foundational objective is to introduce a novel bidirectional, symmetrical fuzzy logic theory which may prove valuable to information retrieval, including internet searches of distributed data objects. A further objective is to design, implement and apply the novel theory to an experimental information retrieval system called ANACALYPSE, which automatically computes the relevance of a large number of unseen documents from expert relevance feedback on a small number of documents read. A further objective is to define a methodology used in this work as an experimental information retrieval framework consisting of multiple tables including various formulae which allow a plethora of syntheses of similarity functions, term weights, relative term frequencies, document weights, bidirectional relevance feedback and history adjusted term weights. The evaluation of bidirectional relevance feedback reveals a better correspondence between system ranking of documents and users' preferences than feedback free system ranking. The assessment of similarity functions reveals that the Cosine and Jaccard functions perform significantly better than the DotProduct and Overlap functions. The evaluation of history tracking of the documents visited from a root page reveals better system ranking of documents than tracking free information retrieval. The assessment of stemming reveals that system information retrieval performance remains unaffected, while stop word removal does not appear to be beneficial and can sometimes be harmful. The overall evaluation of the experimental information retrieval system, in comparison to a leading edge commercial information retrieval system and also in comparison to the expert's gold standard of judged relevance according to established statistical correlation methods, reveals enhanced system information retrieval effectiveness.
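
The four similarity functions compared in this abstract can be sketched as follows. The abstract does not give the thesis's own formulae, so the definitions below are the common vector-space forms and are an assumption, as are the example vectors and all function names.

```python
import math

# Documents and queries are sparse term-weight vectors: dict term -> weight.
# These are the usual textbook definitions, not necessarily those of the thesis.

def dot_product(x, y):
    """Inner product over the terms the two vectors share."""
    return sum(w * y[t] for t, w in x.items() if t in y)

def cosine(x, y):
    """Dot product normalised by the product of the vector lengths."""
    norm = math.sqrt(dot_product(x, x) * dot_product(y, y))
    return dot_product(x, y) / norm if norm else 0.0

def jaccard(x, y):
    """Extended (Tanimoto) Jaccard coefficient for weighted vectors."""
    d = dot_product(x, y)
    denom = dot_product(x, x) + dot_product(y, y) - d
    return d / denom if denom else 0.0

def overlap(x, y):
    """Dot product normalised by the smaller squared vector length."""
    denom = min(dot_product(x, x), dot_product(y, y))
    return dot_product(x, y) / denom if denom else 0.0

# Hypothetical binary-weighted query and document.
query = {"relevance": 1.0, "feedback": 1.0}
doc = {"relevance": 1.0, "feedback": 1.0, "fuzzy": 1.0, "logic": 1.0}

for fn in (dot_product, cosine, jaccard, overlap):
    print(fn.__name__, round(fn(query, doc), 4))
```

In this toy case Overlap gives the maximum score because the query is fully contained in the document, while Cosine and Jaccard penalise the document's extra terms. That difference in length normalisation is one plausible reason the functions rank documents differently.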

    A Distribution Separation Method Using Irrelevance Feedback Data for Information Retrieval

    In many research and application areas, such as information retrieval and machine learning, we often have to deal with a probability distribution that is a mixture of one distribution that is relevant to the task at hand and another that is irrelevant and that we want to remove. Thus, it is an essential problem to separate the irrelevant distribution from the mixture distribution. This paper focuses on the application in Information Retrieval, where relevance feedback is a widely used technique to build a refined query model based on a set of feedback documents. However, in practice, the relevance feedback set, whether provided by users explicitly or implicitly, is often a mixture of relevant and irrelevant documents. Consequently, the resultant query model (typically a term distribution) is often a mixture rather than a true relevance term distribution, leading to a negative impact on the retrieval performance. To tackle this problem, we recently proposed a Distribution Separation Method (DSM), which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While it achieved a promising performance in an empirical evaluation with simulated explicit irrelevance feedback data, it has not been deployed in the scenario where one should automatically obtain the irrelevance feedback data. In this article, we propose a substantial extension of the basic DSM from two perspectives: developing a further regularization framework and deploying DSM in the automatic irrelevance feedback scenario. Specifically, in order to avoid the output distribution of DSM drifting away from the true relevance distribution when the quality of the seed irrelevance distribution (as the input to DSM) is not guaranteed, we propose a DSM regularization framework to constrain the estimation for the relevance distribution. This regularization framework includes three algorithms, each corresponding to a regularization strategy incorporated in the objective function of DSM. In addition, we exploit DSM in automatic (i.e., pseudo) irrelevance feedback, by automatically detecting the seed irrelevant documents via three different document re-ranking methods. We have carried out extensive experiments based on various TREC data sets, in order to systematically evaluate the proposed methods. The experimental results demonstrate the effectiveness of our proposed approaches in comparison with various strong baselines.
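
The separation idea described in this abstract can be illustrated with a minimal sketch. It assumes a linear mixture, mixture = lam * relevant + (1 - lam) * irrelevant, with a mixing weight `lam` that is already known. The abstract does not say how DSM estimates that weight or how its regularization works, so this is not the DSM algorithm itself; the function name and the toy distributions are hypothetical.

```python
def separate(mixture, irrelevant, lam):
    """Recover a relevance term distribution from a linear mixture.

    Assumes mixture = lam * relevant + (1 - lam) * irrelevant, with lam known.
    Distributions are dicts mapping term -> probability.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    terms = set(mixture) | set(irrelevant)
    raw = {
        t: (mixture.get(t, 0.0) - (1.0 - lam) * irrelevant.get(t, 0.0)) / lam
        for t in terms
    }
    # A noisy seed distribution can push estimates below zero: clip, then renormalise.
    clipped = {t: max(p, 0.0) for t, p in raw.items()}
    total = sum(clipped.values())
    return {t: p / total for t, p in clipped.items()} if total else clipped

# Toy example: build a mixture from known parts, then separate it again.
relevant = {"feedback": 0.75, "query": 0.25}
irrelevant = {"query": 0.5, "advert": 0.5}
lam = 0.5
mixture = {
    t: lam * relevant.get(t, 0.0) + (1 - lam) * irrelevant.get(t, 0.0)
    for t in set(relevant) | set(irrelevant)
}
recovered = separate(mixture, irrelevant, lam)
print(recovered)
```

With an exact seed distribution and the right `lam`, the relevant distribution is recovered exactly. With an imperfect seed the estimate drifts, which is the situation the regularization framework in the abstract is meant to address.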

    On-line Metasearch, Pooling, and System Evaluation

    This thesis presents a unified method for simultaneous solution of three problems in Information Retrieval: metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or "active sample selection" (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements. The algorithm, Rankhedge for on-line retrieval, metasearch and system evaluation, is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst case performance. Finally, an information-theoretically optimal method for probabilistic "active sampling" is presented with possible application to a broad range of practical and theoretical contexts.
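
For orientation, CombMNZ, one of the benchmark metasearch algorithms named in this abstract, can be sketched as below. This is the baseline only, not Rankhedge. Min-max score normalisation is an assumed choice (the abstract does not specify one), and the runs and document identifiers are hypothetical.

```python
from collections import defaultdict

def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (an assumed normalisation)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def comb_mnz(runs):
    """Fuse runs: sum of normalised scores times the number of runs retrieving the doc."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Two hypothetical systems, each mapping document id -> retrieval score.
runs = [
    {"d1": 4.0, "d2": 3.0, "d3": 0.0},
    {"d2": 8.0, "d3": 7.0, "d4": 0.0},
]
ranking = comb_mnz(runs)
print(ranking)  # ['d2', 'd3', 'd1', 'd4']
```

Here d3 outranks d1 even though d1 is the top result of the first system, because d3 is retrieved by both systems and its summed score is doubled. Rewarding agreement between systems in this way is the core of CombMNZ.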

    Learning on relevance feedback in content-based image retrieval.

    Hoi, Chu-Hong. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 89-103). Abstracts in English and Chinese.
    Contents: Abstract; Acknowledgement.
    Chapter 1. Introduction. 1.1 Content-based Image Retrieval; 1.2 Relevance Feedback; 1.3 Contributions; 1.4 Organization of This Work.
    Chapter 2. Background. 2.1 Relevance Feedback (2.1.1 Heuristic Weighting Methods, 2.1.2 Optimization Formulations, 2.1.3 Various Machine Learning Techniques); 2.2 Support Vector Machines (2.2.1 Setting of the Learning Problem, 2.2.2 Optimal Separating Hyperplane, 2.2.3 Soft-Margin Support Vector Machine, 2.2.4 One-Class Support Vector Machine).
    Chapter 3. Relevance Feedback with Biased SVM. 3.1 Introduction; 3.2 Biased Support Vector Machine; 3.3 Relevance Feedback Using Biased SVM (3.3.1 Advantages of BSVM in Relevance Feedback, 3.3.2 Relevance Feedback Algorithm by BSVM); 3.4 Experiments (3.4.1 Datasets, 3.4.2 Image Representation, 3.4.3 Experimental Results); 3.5 Discussions; 3.6 Summary.
    Chapter 4. Optimizing Learning with SVM Constraint. 4.1 Introduction; 4.2 Related Work and Motivation; 4.3 Optimizing Learning with SVM Constraint (4.3.1 Problem Formulation and Notations, 4.3.2 Learning boundaries with SVM, 4.3.3 OPL for the Optimal Distance Function, 4.3.4 Overall Similarity Measure with OPL and SVM); 4.4 Experiments (4.4.1 Datasets, 4.4.2 Image Representation, 4.4.3 Performance Evaluation, 4.4.4 Complexity and Time Cost Evaluation); 4.5 Discussions; 4.6 Summary.
    Chapter 5. Group-based Relevance Feedback. 5.1 Introduction; 5.2 SVM Ensembles; 5.3 Group-based Relevance Feedback Using SVM Ensembles (5.3.1 (x+1)-class Assumption, 5.3.2 Proposed Architecture, 5.3.3 Strategy for SVM Combination and Group Aggregation); 5.4 Experiments (5.4.1 Experimental Implementation, 5.4.2 Performance Evaluation); 5.5 Discussions; 5.6 Summary.
    Chapter 6. Log-based Relevance Feedback. 6.1 Introduction; 6.2 Related Work and Motivation; 6.3 Log-based Relevance Feedback Using SLSVM (6.3.1 Problem Statement, 6.3.2 Soft Label Support Vector Machine, 6.3.3 LRF Algorithm by SLSVM); 6.4 Experimental Results (6.4.1 Datasets, 6.4.2 Image Representation, 6.4.3 Experimental Setup, 6.4.4 Performance Comparison); 6.5 Discussions; 6.6 Summary.
    Chapter 7. Application: Web Image Learning. 7.1 Introduction; 7.2 A Learning Scheme for Searching Semantic Concepts (7.2.1 Searching and Clustering Web Images, 7.2.2 Learning Semantic Concepts with Relevance Feedback); 7.3 Experimental Results (7.3.1 Dataset and Features, 7.3.2 Performance Evaluation); 7.4 Discussions; 7.5 Summary.
    Chapter 8. Conclusions and Future Work. 8.1 Conclusions; 8.2 Future Work.
    Chapter A. List of Publications.
    Bibliography.

    Biased classification for relevance feedback in content-based image retrieval.

    Peng, Xiang. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 98-115). Abstracts in English and Chinese.
    Contents: Abstract; Acknowledgement.
    Chapter 1. Introduction. 1.1 Problem Statement; 1.2 Major Contributions; 1.3 Thesis Outline.
    Chapter 2. Background Study. 2.1 Content-based Image Retrieval (2.1.1 Image Representation, 2.1.2 High Dimensional Indexing, 2.1.3 Image Retrieval Systems Design); 2.2 Relevance Feedback (2.2.1 Self-Organizing Map in Relevance Feedback, 2.2.2 Decision Tree in Relevance Feedback, 2.2.3 Bayesian Classifier in Relevance Feedback, 2.2.4 Nearest Neighbor Search in Relevance Feedback, 2.2.5 Support Vector Machines in Relevance Feedback); 2.3 Imbalanced Classification; 2.4 Active Learning (2.4.1 Uncertainty-based Sampling, 2.4.2 Error Reduction, 2.4.3 Batch Selection); 2.5 Convex Optimization (2.5.1 Overview of Convex Optimization, 2.5.2 Linear Program, 2.5.3 Quadratic Program, 2.5.4 Quadratically Constrained Quadratic Program, 2.5.5 Cone Program, 2.5.6 Semi-definite Program).
    Chapter 3. Imbalanced Learning with BMPM for CBIR. 3.1 Research Motivation; 3.2 Background Review (3.2.1 Relevance Feedback for CBIR, 3.2.2 Minimax Probability Machine, 3.2.3 Extensions of Minimax Probability Machine); 3.3 Relevance Feedback using BMPM (3.3.1 Model Definition, 3.3.2 Advantages of BMPM in Relevance Feedback, 3.3.3 Relevance Feedback Framework by BMPM); 3.4 Experimental Results (3.4.1 Experiment Datasets, 3.4.2 Performance Evaluation, 3.4.3 Discussions); 3.5 Summary.
    Chapter 4. BMPM Active Learning for CBIR. 4.1 Problem Statement and Motivation; 4.2 Background Review; 4.3 Relevance Feedback by BMPM Active Learning (4.3.1 Active Learning Concept, 4.3.2 General Approaches for Active Learning, 4.3.3 Biased Minimax Probability Machine, 4.3.4 Proposed Framework); 4.4 Experimental Results (4.4.1 Experiment Setup, 4.4.2 Performance Evaluation); 4.5 Summary.
    Chapter 5. Large Scale Learning with BMPM. 5.1 Introduction (5.1.1 Motivation, 5.1.2 Contribution); 5.2 Background Review (5.2.1 Second Order Cone Program, 5.2.2 General Methods for Large Scale Problems, 5.2.3 Biased Minimax Probability Machine); 5.3 Efficient BMPM Training (5.3.1 Proposed Strategy, 5.3.2 Kernelized BMPM and Its Solution); 5.4 Experimental Results (5.4.1 Experimental Testbeds, 5.4.2 Experimental Settings, 5.4.3 Performance Evaluation); 5.5 Summary.
    Chapter 6. Conclusion and Future Work. 6.1 Conclusion; 6.2 Future Work.
    Chapter A. List of Symbols and Notations.
    Chapter B. List of Publications.
    Bibliography.

    Relevance-based language modelling for recommender systems

    This is the author’s version of a work that was accepted for publication in Journal Information Processing and Management: an International Journal. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal Information Processing and Management: an International Journal, 49, 4, (2013) DOI: 10.1016/j.ipm.2013.03.001. Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, the field of recommender systems is a fertile research area where users are provided with personalised recommendations in several applications. In this paper, we propose an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user. We also propose a probabilistic clustering technique to perform the neighbour selection process as a way to achieve a better approximation of the set of relevant items in the pseudo relevance feedback process. These techniques, although well known in the Information Retrieval field, have not been applied yet to recommender systems, and, as the empirical evaluation results show, both proposals outperform individually several baseline methods. Furthermore, by combining both approaches even larger effectiveness improvements are achieved. This work was funded by Secretaría de Estado de Investigación, Desarrollo e Innovación from the Spanish Government under Projects TIN2012-33867 and TIN2011-28538-C02.

    Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

    The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: 1. how to create a simple, language-independent corpus-based stemmer, 2. how to identify sub-words and which types of sub-words are suitable as indexing units, and 3. how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in case of few language-specific resources. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best, for Bengali and Marathi, overlapping 3-grams obtain the best result, and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
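
Two of the sub-word indexing units mentioned in this abstract, overlapping character n-grams and fixed-length word prefixes, can be sketched as follows. The abstract does not describe the paper's tokenisation, so the details here are assumptions: n-grams are taken within a single word, words no longer than n are kept whole, and characters are Python code points (which may split combining marks in Indic scripts). The function names are hypothetical.

```python
def overlapping_ngrams(word, n):
    """Return all overlapping character n-grams inside one word."""
    if len(word) <= n:
        return [word]  # assumed handling of short words: keep them whole
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def prefix(word, k):
    """Return the k-prefix of a word (the whole word if it is shorter than k)."""
    return word[:k]

tokens = "retrieval experiments".split()
trigrams = [g for w in tokens for g in overlapping_ngrams(w, 3)]
prefixes = [prefix(w, 4) for w in tokens]

print(trigrams)
print(prefixes)  # ['retr', 'expe']
```

The two words yield sixteen 3-grams, which matches the abstract's remark that sub-word identification increases the number of index terms while decreasing their average length. Prefix indexing instead keeps one shorter term per word.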

    A survey on the use of relevance feedback for information access systems

    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems.

    Evaluating implicit feedback models using searcher simulations

    In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate these models we used searcher simulations that afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey's rule of conditioning outperformed other models under investigation.