77,664 research outputs found

    Relevance feedback for best match term weighting algorithms in information retrieval

    Personalisation in full text retrieval or full text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback uses the user's judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the first based on statistical language models and the second based on the binary independence probabilistic model. The paper shows how closely the two models' approaches to relevance feedback resemble each other, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms on the TREC collection. The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback.

    Using Language Models for Information Retrieval

    Because of the world wide web, information retrieval systems are now used by millions of untrained users all over the world. The search engines that perform the information retrieval tasks often retrieve thousands of potentially interesting documents for a query. The documents should be ranked in decreasing order of relevance in order to be useful to the user. This book describes a mathematical model of information retrieval based on the use of statistical language models. The approach uses simple document-based unigram models to compute for each document the probability that it generates the query. This probability is used to rank the documents. The study makes the following research contributions:
    * The development of a model that integrates term weighting, relevance feedback and structured queries.
    * The development of a model that supports multiple representations of a request or information need by integrating a statistical translation model.
    * The development of a model that supports multiple representations of a document, for instance by allowing proximity searches or searches for terms from a particular record field (e.g. a search for terms from the title).
    * A mathematical interpretation of stop word removal and stemming.
    * A mathematical interpretation of operators for mandatory terms, wildcards and synonyms.
    * A practical comparison of a language model-based retrieval system with similar systems that are based on well-established models and term weighting algorithms in a controlled experiment.
    * The application of the model to cross-language information retrieval and adaptive information filtering, and the evaluation of two prototype systems in a controlled experiment.
    Experimental results on three standard tasks show that the language model-based algorithms work as well as, or better than, today's top-performing retrieval algorithms. The standard tasks investigated are ad-hoc retrieval (when there are no previously retrieved documents to guide the search), retrospective relevance weighting (find the optimum model for a given set of relevant documents), and ad-hoc retrieval using manually formulated Boolean queries. The application to cross-language retrieval and adaptive filtering shows the practical use of structured queries and relevance feedback, respectively.
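
The ranking idea in this abstract (score each document by the probability that its unigram model generates the query) can be sketched roughly as follows. This is an illustrative sketch, not the book's implementation: the function names, the toy data layout, and the choice of linear interpolation with a collection model (weight `lam`) to avoid zero probabilities are all assumptions made here.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Log-probability that a document's unigram model generates the query.

    Assumption: document probabilities are linearly interpolated with
    collection-wide probabilities (weight `lam` on the document model).
    """
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = collection_counts[term] / collection_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document indices ordered by decreasing query likelihood."""
    collection_counts = Counter(t for d in docs for t in d)
    collection_len = sum(len(d) for d in docs)
    scored = [
        (query_log_likelihood(query, d, collection_counts, collection_len, lam), i)
        for i, d in enumerate(docs)
    ]
    # Sort by score (descending), breaking ties by document index.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Calling `rank(["language", "model"], docs)` on a list of tokenised documents returns the document indices, best match first. Working in log space avoids numerical underflow for longer queries.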

    A study of relevance feedback in vector space model

    Information Retrieval is the science of searching a huge set of documents for information or documents that satisfy an information need. It has been an active field of research since the early 19th century, and different models of retrieval have come into existence to cater to the information need. This thesis starts with understanding some of the basic information retrieval models, followed by the implementation of one of the most popular statistical retrieval models, the Vector Space Model. This model ranks the documents in the collection based on a similarity measure calculated between the query and each document. The user specifies the information need, more commonly known as a query, using the visual interface provided. The given query is then processed and the results are displayed to the user in ranked order. We then focus on relevance feedback, a technique that modifies the user query based on the characteristics of the document collection to improve the results. In this thesis, we explore different types and models of relevance feedback that can be applied to the Vector Space Model and how they affect the performance of the model.
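
To make the two ideas in this abstract concrete, here is a minimal sketch of cosine-similarity matching in a vector space model, followed by one classic form of query modification. The abstract does not say which feedback formulas the thesis uses; the Rocchio-style update below, the function names, and the default weights `alpha`, `beta` and `gamma` are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update (an assumed choice, not taken from the thesis).

    Moves the query vector towards the centroid of the documents judged
    relevant and away from the centroid of those judged non-relevant.
    """
    updated = {t: alpha * w for t, w in query.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                updated[t] = updated.get(t, 0.0) + coeff * w / len(docs)
    # Negative weights are commonly clipped; zero weights are dropped too.
    return {t: w for t, w in updated.items() if w > 0.0}
```

For example, a query `{"jaguar": 1.0}` is equally similar to a document about cars and one about cats; after `rocchio(query, [car_doc], [cat_doc])` the updated query is closer to the car document.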

    Probabilistic collaborative filtering with negative cross entropy

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in RecSys '13 Proceedings of the 7th ACM conference on Recommender systems, http://dx.doi.org/10.1145/2507157.2507191. Relevance-Based Language Models are an effective IR approach which explicitly introduces the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models have been shown to achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. In this paper we propose a novel adaptation of this language modeling approach to rating-based Collaborative Filtering. In a memory-based approach, we apply the model to the formation of user neighbourhoods, and the generation of recommendations based on such neighbourhoods. We report experimental results where our method outperforms other standard memory-based algorithms in terms of ranking precision. This work was funded by Secretaría de Estado de Investigación, Desarrollo e Innovación from the Spanish Government under projects TIN2012-33867 and TIN2011-28538-C02

    Approximating true relevance model in relevance feedback.

    Relevance is an essential concept in information retrieval (IR) and relevance estimation is a fundamental IR task. It involves not only document relevance estimation, but also estimation of the user's information need. The relevance-based language model aims to estimate a relevance model (i.e., a relevant query term distribution) from relevance feedback documents. The true relevance model should be generated from truly relevant documents. The ideal estimation of the true relevance model is expected to be not only effective in terms of mean retrieval performance (e.g., Mean Average Precision) over all the queries, but also stable in the sense that the performance is consistent across individual queries. In practice, however, when approximating the true relevance model, improving retrieval effectiveness often sacrifices retrieval stability, and vice versa. In this thesis, we propose to explore and analyze this effectiveness-stability tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistical estimation. We first formulate the bias, the variance and the trade-off between them for retrieval performance as well as for query model estimation. We then analytically and empirically study a number of factors (e.g., query model complexity, query model combination, document weight smoothness and irrelevant document removal) that can affect the bias and variance. Our study shows that the proposed bias-variance trade-off analysis can serve as an analytical framework for query model estimation. We then investigate in depth two key factors in query model estimation, document weight smoothness and removal of irrelevant documents, by proposing novel methods for document weight smoothing and irrelevance distribution separation, respectively. Systematic experimental evaluation on TREC collections shows that the proposed methods can improve both retrieval effectiveness and retrieval stability of query model estimation. In addition to the above main contributions, we also carry out initial exploration in two further directions: the formulation of bias-variance in personalization, and viewing query model estimation from a novel theoretical angle (i.e., quantum theory) that has partially inspired our research.

    Implementation of an Information Retrieval System (ANIRS) with Ranking and Browsing Capabilities

    This report describes an implementation of a cluster-based information retrieval system with statistical ranking facilities, ANIRS. ANIRS uses the vector space model to represent the document database. In this model, the database is defined by a document-by-term matrix, D. In this matrix, each row represents the terms in a single document and each column represents the documents that contain a single term. In ANIRS, two matching methodologies are allowed: a full database search and a cluster-based search. The system uses a natural language query interface. It incorporates suffix stripping for term conglomeration. Two methods of query refinement are used: relevance feedback and document seed searching. Cluster browsing, the ability to look at all the documents in a single cluster, is also implemented.
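
The document-by-term matrix described above can be illustrated with a small sketch. The report does not specify how ANIRS stores or weights the matrix, so the raw term counts, the dense list-of-lists layout, and the function names below are assumptions for illustration only.

```python
def build_document_term_matrix(docs):
    """Build a dense document-by-term matrix of raw term counts.

    Row i corresponds to document i; column j corresponds to the j-th
    term of the sorted vocabulary.
    """
    vocabulary = sorted({term for doc in docs for term in doc})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc:
            matrix[i][column[term]] += 1
    return vocabulary, matrix


def documents_containing(term, vocabulary, matrix):
    """Read one column: indices of the documents that contain `term`."""
    if term not in vocabulary:
        return []
    j = vocabulary.index(term)
    return [i for i, row in enumerate(matrix) if row[j] > 0]
```

Reading along a row gives the terms of one document; reading down a column gives the documents that contain one term, which is the property the report describes.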

    Content-based image retrieval: reading one's mind and helping people share.

    Sia Ka Cheung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.
    Abstract
    Acknowledgement
    Chapter 1 Introduction
        1.1 Problem Statement
        1.2 Contributions
        1.3 Thesis Organization
    Chapter 2 Background
        2.1 Content-Based Image Retrieval
            2.1.1 Feature Extraction
            2.1.2 Indexing and Retrieval
        2.2 Relevance Feedback
            2.2.1 Weight Updating
            2.2.2 Bayesian Formulation
            2.2.3 Statistical Approaches
            2.2.4 Inter-query Feedback
        2.3 Peer-to-Peer Information Retrieval
            2.3.1 Distributed Hash Table Techniques
            2.3.2 Routing Indices and Shortcuts
            2.3.3 Content-Based Retrieval in P2P Systems
    Chapter 3 Parameter Estimation-Based Relevance Feedback
        3.1 Parameter Estimation of Target Distribution
            3.1.1 Motivation
            3.1.2 Model
            3.1.3 Relevance Feedback
            3.1.4 Maximum Entropy Display
        3.2 Self-Organizing Map Based Inter-Query Feedback
            3.2.1 Motivation
            3.2.2 Initialization and Replication of SOM
            3.2.3 SOM Training for Inter-query Feedback
            3.2.4 Target Estimation and Display Set Selection for Intra-query Feedback
        3.3 Experiment
            3.3.1 Study of Parameter Estimation Method Using Synthetic Data
            3.3.2 Performance Study in Intra- and Inter-Query Feedback
        3.4 Conclusion
    Chapter 4 Distributed COntent-based Visual Information Retrieval
        4.1 Introduction
        4.2 Peer Clustering
            4.2.1 Basic Version
            4.2.2 Single Cluster Version
            4.2.3 Multiple Clusters Version
        4.3 Firework Query Model
        4.4 Implementation and System Architecture
            4.4.1 Gnutella Message Modification
            4.4.2 Architecture of DISCOVIR
            4.4.3 Flow of Operations
        4.5 Experiments
            4.5.1 Simulation Model of the Peer-to-Peer Network
            4.5.2 Number of Peers
            4.5.3 TTL of Query Message
            4.5.4 Effects of Data Resolution on Query Efficiency
            4.5.5 Discussion
        4.6 Conclusion
    Chapter 5 Future Works and Conclusion
    Chapter A Derivation of Update Equation
    Chapter B An Efficient Discovery of Signatures
    Bibliography

    Extending information retrieval system model to improve interactive web searching.

    The research set out with the broad objective of developing new tools to support Web information searching. A survey showed that a substantial number of interactive search tools were being developed, but that little work had been done on how these new developments fitted into the general aim of helping people find information. As a result, it proved difficult to compare and analyse how tools help and affect users and where they belong in a general scheme of information search tools. A key reason for the lack of better information searching tools was identified in the ill-suited nature of existing information retrieval system models. The traditional information retrieval model is extended by synthesising work in information retrieval and information seeking research. The purpose of this new holistic search model is to assist information system practitioners in identifying, hypothesising, designing and evaluating Web information searching tools. Using the model, a term relevance feedback tool called ‘Tag and Keyword’ (TKy) was developed in a Web browser, and it was hypothesised that it could improve query reformulation and reduce unnecessary browsing. The tool was evaluated in a laboratory experiment, and quantitative analysis showed statistically significant increases in query reformulations and reductions in Web browsing (per query). Subjects were interviewed after the experiment, and qualitative analysis revealed that they found the tool useful and that it saved time. Interestingly, exploratory analysis of the collected data identified three different ways in which subjects had used the TKy tool. The research developed a holistic search model for Web searching and demonstrated that it can be used to hypothesise, design and evaluate information searching tools. Information system practitioners using it can better understand the context in which their search tools are developed and how these relate to users’ search processes and other search tools.

    Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval

    Query Expansion using Pseudo Relevance Feedback is a useful and popular technique for reformulating the query. In our proposed query expansion method, we assume that relevant information can be found within a document near the central idea. The document is normally divided into sections, paragraphs and lines. The proposed method tries to extract keywords that are closer to the central theme of the document. The expansion terms are obtained by equi-frequency partition of the documents obtained from pseudo relevance feedback and by using tf-idf scores. The idf factor is calculated over the number of partitions in the documents. The group of words for query expansion is selected using the following approaches: the highest score, the average score, and the group of words that has the maximum number of keywords. As each query behaved differently for different methods, the effect of these methods in selecting the words for query expansion is investigated. Building on this initial study, we plan to extend the experiment in the future to develop a rule-based statistical model that automatically selects the best group of words, incorporating the tf-idf scoring and the three approaches explained here. The experiments were performed on the FIRE 2011 Adhoc Hindi and English test collections on 50 queries each, using Terrier as the retrieval engine.
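
A simplified sketch of the partition-based scoring described above, under several assumptions that are not in the abstract: partitions are taken to be fixed-size runs of tokens (`size`), each term is scored by its best tf-idf value over all partitions with idf computed over partitions, and only the "highest score" selection rule is shown. The function names are hypothetical, and this is not the authors' Terrier-based implementation.

```python
import math
from collections import Counter


def partition(tokens, size):
    """Split a token list into consecutive partitions of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def expansion_terms(feedback_docs, query, size=4, top_n=3):
    """Pick expansion terms from pseudo-relevant documents by tf-idf.

    Assumptions: idf is computed over partitions rather than documents,
    and a term's score is its highest tf-idf in any single partition.
    """
    parts = [p for doc in feedback_docs for p in partition(doc, size)]
    # Partition frequency: number of partitions in which each term occurs.
    part_freq = Counter(t for p in parts for t in set(p))
    scores = Counter()
    for p in parts:
        for term, tf in Counter(p).items():
            tfidf = tf * math.log(len(parts) / part_freq[term])
            scores[term] = max(scores[term], tfidf)
    # Exclude terms already in the query; break score ties alphabetically.
    candidates = [(s, t) for t, s in scores.items() if t not in query]
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [t for _, t in candidates[:top_n]]
```

Here `feedback_docs` stands for the tokenised top-ranked documents from an initial retrieval run, and the returned terms would be appended to the original query before a second run.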