11,419 research outputs found

    Probabilistic ranking in the local search

    Get PDF
    This paper presents a development and an implementation of probabilistic methods of information retrieval in blogs. The actual model unites a linguistically motivated probabilistic model of information retrieval for estimating similarity of documents and query and latent semantic indexing aimed for retrieving latent factors on count data. Definition of semantic extension is based on relevance feedback. The implementation of given model implies a development of the search plugin for a blog

    ON THE USE OF THE DEMPSTER SHAFER MODEL IN INFORMATION INDEXING AND RETRIEVAL APPLICATIONS

    Get PDF
    The Dempster Shafer theory of evidence concerns the elicitation and manipulation of degrees of belief rendered by multiple sources of evidence to a common set of propositions. Information indexing and retrieval applications use a variety of quantitative means - both probabilistic and quasi-probabilistic - to represent and manipulate relevance numbers and index vectors. Recently, several proposals were made to use the Dempster Shafes model as a relevance calculus in such applications. The paper provides a critical review of these proposals, pointing at several theoretical caveats and suggesting ways to resolve them. The methodology is based on expounding a canonical indexing model whose relevance measures and combination mechanisms are shown to be isomorphic to Shafer's belief functions and to Dempster's rule, respectively. Hence, the paper has two objectives: (i) to describe and resolve some caveats in the way the Dempster Shafer theory is applied to information indexing and retrieval, and (ii) to provide an intuitive interpretation of the Dempster Shafer theory, as it unfolds in the simple context of a canonical indexing model.Information Systems Working Papers Serie

    Information Retrieval Models

    Get PDF
    Many applications that handle information on the internet would be completely\ud inadequate without the support of information retrieval technology. How would\ud we find information on the world wide web if there were no web search engines?\ud How would we manage our email without spam filtering? Much of the development\ud of information retrieval technology, such as web search engines and spam\ud filters, requires a combination of experimentation and theory. Experimentation\ud and rigorous empirical testing are needed to keep up with increasing volumes of\ud web pages and emails. Furthermore, experimentation and constant adaptation\ud of technology is needed in practice to counteract the effects of people that deliberately\ud try to manipulate the technology, such as email spammers. However,\ud if experimentation is not guided by theory, engineering becomes trial and error.\ud New problems and challenges for information retrieval come up constantly.\ud They cannot possibly be solved by trial and error alone. So, what is the theory\ud of information retrieval?\ud There is not one convincing answer to this question. There are many theories,\ud here called formal models, and each model is helpful for the development of\ud some information retrieval tools, but not so helpful for the development others.\ud In order to understand information retrieval, it is essential to learn about these\ud retrieval models. In this chapter, some of the most important retrieval models\ud are gathered and explained in a tutorial style

    Beyond English text: Multilingual and multimedia information retrieval.

    Get PDF
    Non

    Interactive retrieval of video using pre-computed shot-shot similarities

    Get PDF
    A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from low-level feature representation. Only statistically significant estimates are picked out, the rest are replaced by an appropriate constant allowing efficient access at search time without loss of search quality and leading to improvement in most experiments. With time, these probability estimates are updated from the relevance judgment of users performing searches, resulting in further substantial increases in mean average precision

    Miracle’s 2005 Approach to Monolingual Information Retrieval

    Full text link
    This paper presents the 2005 Miracle’s team approach to Monolingual Information Retrieval. The goal for the experiments in this year was twofold: continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extracting, paragraph extracting, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query

    The relationship between IR and multimedia databases

    Get PDF
    Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.\ud \ud Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.\ud \ud Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.\ud \ud First, we introduce a concept layer to enable reasoning over low-level concepts in the database.\ud \ud Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.\ud \ud Third, we add the functionality to process the users' relevance feedback.\ud \ud We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.\ud \ud We conclude with an outline for implementation of miRRor on top of the Monet extensible database system

    Probabilistic models of information retrieval based on measuring the divergence from randomness

    Get PDF
    We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose--Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document--query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model

    Towards an Information Retrieval Theory of Everything

    Get PDF
    I present three well-known probabilistic models of information retrieval in tutorial style: The binary independence probabilistic model, the language modeling approach, and Google's page rank. Although all three models are based on probability theory, they are very different in nature. Each model seems well-suited for solving certain information retrieval problems, but not so useful for solving others. So, essentially each model solves part of a bigger puzzle, and a unified view on these models might be a first step towards an Information Retrieval Theory of Everything
    • 

    corecore