36,892 research outputs found
Using Language Models for Information Retrieval
Because of the world wide web, information retrieval systems are now used by millions of untrained users all over the world. The search engines that perform the information retrieval tasks, often retrieve thousands of potentially interesting documents to a query. The documents should be ranked in decreasing order of relevance in order to be useful to the user. This book describes a mathematical model of information retrieval based on the use of statistical language models. The approach uses simple document-based unigram models to compute for each document the probability that it generates the query. This probability is used to rank the documents. The study makes the following research contributions. * The development of a model that integrates term weighting, relevance feedback and structured queries. * The development of a model that supports multiple representations of a request or information need by integrating a statistical translation model. * The development of a model that supports multiple representations of a document, for instance by allowing proximity searches or searches for terms from a particular record field (e.g. a search for terms from the title). * A mathematical interpretation of stop word removal and stemming. * A mathematical interpretation of operators for mandatory terms, wildcards and synonyms. * A practical comparison of a language model-based retrieval system with similar systems that are based on well-established models and term weighting algorithms in a controlled experiment. * The application of the model to cross-language information retrieval and adaptive information filtering, and the evaluation of two prototype systems in a controlled experiment. Experimental results on three standard tasks show that the language model-based algorithms work as well as, or better than, today's top-performing retrieval algorithms. The standard tasks investigated are ad-hoc retrieval (when there are no previously retrieved documents to guide the search), retrospective relevance weighting (find the optimum model for a given set of relevant documents), and ad-hoc retrieval using manually formulated Boolean queries. The application to cross-language retrieval and adaptive filtering shows the practical use of respectively structured queries, and relevance feedback
Overview of the 2005 cross-language image retrieval track (ImageCLEF)
The purpose of this paper is to outline efforts from the 2005 CLEF crosslanguage image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore
the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: a ad-hoc retrieval from an historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main fndings
Formal models, usability and related work in IR (editorial for special edition)
The Glasgow IR group has carried out both theoretical and empirical work, aimed at giving end users efficient and effective access to large collections of multimedia data
Machine Learning of User Profiles: Representational Issues
As more information becomes available electronically, tools for finding
information of interest to users becomes increasingly important. The goal of
the research described here is to build a system for generating comprehensible
user profiles that accurately capture user interest with minimum user
interaction. The research described here focuses on the importance of a
suitable generalization hierarchy and representation for learning profiles
which are predictively accurate and comprehensible. In our experiments we
evaluated both traditional features based on weighted term vectors as well as
subject features corresponding to categories which could be drawn from a
thesaurus. Our experiments, conducted in the context of a content-based
profiling system for on-line newspapers on the World Wide Web (the IDD News
Browser), demonstrate the importance of a generalization hierarchy and the
promise of combining natural language processing techniques with machine
learning (ML) to address an information retrieval (IR) problem.Comment: 6 page
An evaluation resource for geographic information retrieval
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation
Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource
encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic
information retrieval requires an evaluation resource which represents realistic information needs and which is geographically
challenging. Some experimental results and analysis are reported
- …