
    PENG: integrated search of distributed news archives

    The PENG system is intended to provide an integrated and personalized environment for news professionals, offering functionality for filtering, distributed retrieval, and a flexible interface environment for the display and manipulation of news materials. In this paper we review the progress and results of the PENG system to date, and describe in detail the document filtering part of the system, which is designed to gather documents and filter them according to user profiles. The current architecture will be described, along with some of the main issues which have so far been found in its development

    Towards better measures: evaluation of estimated resource description quality for distributed IR

    An open problem for Distributed Information Retrieval systems (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of these currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different QBS algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work

    An evaluation of resource description quality measures

    An open problem for Distributed Information Retrieval is how to represent large document repositories (known as resources) efficiently. To facilitate resource selection, estimated descriptions of each resource are required, especially in non-cooperative distributed environments. Accurate and efficient resource description estimation is required, as it can have an effect on resource selection and, as a consequence, on retrieval quality. Query-Based Sampling (QBS) has been proposed as a novel solution for resource estimation, with further techniques developed thereafter. However, determining whether one QBS technique generates better resource descriptions than another is still an unresolved issue. The initial metrics tested and deployed for measuring resource description quality were the Collection Term Frequency ratio (CTF) and the Spearman Rank Correlation Coefficient (SRCC). The former provides an indication of the percentage of terms seen, whilst the latter measures the term ranking order, although neither considers the term frequency, which is important for resource selection. We re-examine this problem and consider measuring the quality of a resource description in the context of resource selection, where an estimate of the probability of a term given the resource is typically required. We believe a natural measure for comparing the estimated resource against the actual resource is the Kullback-Leibler Divergence (KL) measure. KL addresses the concerns put forward previously, by not over-representing low frequency terms, and also considering term order. In this paper, we re-assess the two previous measures alongside KL. Our preliminary investigation revealed that the former metrics display contradictory results, whilst KL suggested that a different QBS technique than the one previously prescribed would provide better estimates. This is a significant result, because it now remains unclear as to which technique will consistently provide better resource descriptions. The remainder of this paper details the three measures and the experimental analysis of our preliminary study, and outlines our points of concern along with further research directions
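
    The measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the additive smoothing constant, and the direction of the divergence (actual resource against estimate) are all assumptions made for illustration, and the CTF ratio is computed as the share of the actual resource's term occurrences covered by the terms seen in the estimate.

```python
import math
from collections import Counter


def term_distribution(counts, vocabulary, smoothing=0.01):
    """Turn raw term counts into P(term | resource) over a shared vocabulary.

    Additive smoothing (an assumption of this sketch) keeps every
    probability non-zero so the logarithm below is always defined.
    """
    total = sum(counts.get(t, 0) for t in vocabulary) + smoothing * len(vocabulary)
    return {t: (counts.get(t, 0) + smoothing) / total for t in vocabulary}


def kl_divergence(actual_counts, estimated_counts, smoothing=0.01):
    """KL divergence of the estimated description from the actual resource.

    Lower values mean the estimate is closer to the actual term distribution.
    """
    vocabulary = set(actual_counts) | set(estimated_counts)
    p = term_distribution(actual_counts, vocabulary, smoothing)
    q = term_distribution(estimated_counts, vocabulary, smoothing)
    return sum(p[t] * math.log(p[t] / q[t]) for t in vocabulary)


def ctf_ratio(actual_counts, estimated_counts):
    """Fraction of the actual term occurrences whose term appears in the estimate."""
    total = sum(actual_counts.values())
    covered = sum(c for t, c in actual_counts.items() if t in estimated_counts)
    return covered / total


# Hypothetical toy data: a full resource and a small sample drawn from it.
actual = Counter({"retrieval": 50, "query": 30, "sampling": 15, "divergence": 5})
estimate = Counter({"retrieval": 6, "query": 3, "sampling": 1})

print("CTF ratio:", ctf_ratio(actual, estimate))
print("KL divergence:", kl_divergence(actual, estimate))
```

    In this toy case the CTF ratio only records which terms were seen, whereas the KL value also reflects how far the estimated term probabilities are from the actual ones, which is the property the abstract argues matters for resource selection.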

    Adaptive query-based sampling for distributed IR

    No abstract available

    Sense resolution properties of logical imaging

    The evaluation of an implication by Imaging is a logical technique developed in the framework of modal logic. Its interpretation in the context of a “possible worlds” semantics is very appealing for IR. In 1994, Crestani and Van Rijsbergen proposed an interpretation of Imaging in the context of IR based on the assumption that “a term is a possible world”. This approach enables the exploitation of term-term relationships, which are estimated using an information theoretic measure. Recent analyses of the probability kinematics of Logical Imaging in IR have suggested that this technique has some interesting sense resolution properties. In this paper we will present this new line of research and we will relate it to more classical research into word senses

    The troubles with using a logical model of IR on a large collection of documents

    This is a paper of two halves. First, a description of a logical model of IR known as imaging will be presented. Unfortunately, due to constraints of time and computing resources, this model was not implemented in time for this round of TREC. Therefore this paper's second half describes the more conventional IR model and system used to generate the Glasgow IR result set (glair1)

    Logical and uncertainty models for information access: current trends

    The current trends of research in information access that emerged from the 1999 Workshop on Logical and Uncertainty Models for Information Systems (LUMIS'99) are briefly reviewed in this paper. We believe that some of these issues will be central to future research on the theory and applications of logical and uncertainty models for information access

    A multi-layered Bayesian network model for structured document retrieval

    New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented

    Retrieval of Spoken Documents: First Experiences (Research Report TR-1997-34)

    We report on our first experiences in dealing with the retrieval of spoken documents. While lacking the tools and know-how for performing speech recognition on the spoken documents, we tried to make the best possible use of our knowledge of probabilistic indexing and retrieval of textual documents. The techniques we used and the results we obtained are encouraging, motivating further experimentation in this new area of research
