20,148 research outputs found

    Relevance feedback for best match term weighting algorithms in information retrieval

    Get PDF
    Personalisation in full text retrieval or full text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback inputs the user's judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the first based on statistical language models and the second based on the binary independence probabilistic model. The paper shows the resemblance of the approaches to relevance feedback of these models, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms on the TREC collection. The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback

    A probabilistic justification for using tf.idf term weighting in information retrieval

    Get PDF
    This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf.idf term weighting. The paper shows that the new probabilistic interpretation of tf.idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the TREC collection shows that the linguistically motivated weighting algorithm outperforms the popular BM25 weighting algorithm

    A survey on the use of relevance feedback for information access systems

    Get PDF
    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems

    Beyond English text: Multilingual and multimedia information retrieval.

    Get PDF
    Non

    Vocal Access to a Newspaper Archive: Design Issues and Preliminary Investigation

    Get PDF
    This paper presents the design and the current prototype implementation of an interactive vocal Information Retrieval system that can be used to access articles of a large newspaper archive using a telephone. The results of preliminary investigation into the feasibility of such a system are also presented

    Information Retrieval Models

    Get PDF
    Many applications that handle information on the internet would be completely\ud inadequate without the support of information retrieval technology. How would\ud we find information on the world wide web if there were no web search engines?\ud How would we manage our email without spam filtering? Much of the development\ud of information retrieval technology, such as web search engines and spam\ud filters, requires a combination of experimentation and theory. Experimentation\ud and rigorous empirical testing are needed to keep up with increasing volumes of\ud web pages and emails. Furthermore, experimentation and constant adaptation\ud of technology is needed in practice to counteract the effects of people that deliberately\ud try to manipulate the technology, such as email spammers. However,\ud if experimentation is not guided by theory, engineering becomes trial and error.\ud New problems and challenges for information retrieval come up constantly.\ud They cannot possibly be solved by trial and error alone. So, what is the theory\ud of information retrieval?\ud There is not one convincing answer to this question. There are many theories,\ud here called formal models, and each model is helpful for the development of\ud some information retrieval tools, but not so helpful for the development others.\ud In order to understand information retrieval, it is essential to learn about these\ud retrieval models. In this chapter, some of the most important retrieval models\ud are gathered and explained in a tutorial style

    Disambiguation strategies for cross-language information retrieval

    Get PDF
    This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results? The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    Entity Ranking on Graphs: Studies on Expert Finding

    Get PDF
    Todays web search engines try to offer services for finding various information in addition to simple web pages, like showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models
    corecore