    PARTIAL COORDINATION: A PRELIMINARY EVALUATION AND FAILURE ANALYSIS

    Partial coordination is a new method for cataloging documents for subject access. It is especially designed to enhance the precision of document searches in online environments. This paper reports a preliminary evaluation of partial coordination, which shows promising results compared with full-text retrieval. We also report the difficulties in empirically evaluating the effectiveness of automatic full-text retrieval in contrast to mixed methods, such as partial coordination, which combine human cataloging with computerized retrieval. Based on our study, we propose that research in this area would substantially benefit from a common framework for failure analysis and a common data set. This would allow information retrieval researchers adapting "library style" cataloging to large electronic document collections, as well as those developing automated or mixed methods, to directly compare their proposals for indexing and retrieval. The paper concludes by suggesting guidelines for constructing such a testbed. (Information Systems Working Papers Series)
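
    The evaluation the abstract refers to rests on standard retrieval measures. As a minimal illustration, the sketch below computes precision and recall for a single query; the document identifiers and relevance judgments are hypothetical, not data from the paper.

```python
# Minimal sketch: precision and recall for one query, the core measures
# behind the kind of evaluation the paper reports. All identifiers below
# are hypothetical.

def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a single retrieval run."""
    hits = retrieved & relevant  # relevant documents actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run against hypothetical relevance judgments.
retrieved = {"d01", "d04", "d07", "d09"}
relevant = {"d04", "d07", "d12"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```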

    Full-Text Retrieval: Systems and Files

    Much of the development in the first 30 years of library automation has been in solving the problem of identifying relevant sources. Automation of the library's card catalog provides a finding tool for the library's collections. The books, journals, films, and other materials located through the catalog still mostly reside in their original form, with no direct connection to the automated finding tool. Most of the early development in electronic publishing was also aimed solely at identifying information sources. Secondary publishers, notably publishers of indexing/abstracting serials, were the first to provide their resources in electronic form. Throughout the 1970s and much of the 1980s, indexing/abstracting (bibliographic) databases were predominant in the online database world. The first CD-ROM databases for libraries were many of these same bibliographic files. Bibliographic databases were traditionally, and still are, the most widely used type of electronic resource in libraries. Starting in the mid-1980s, owing to great increases in disk storage capacities and better document conversion techniques, full texts of certain types of documents became more widely available. In the 1990s, full-text databases (files) are the most rapidly growing type of commercially available database. Better text-retrieval software is leading to more locally created full-text databases as well. Perhaps in this decade we will at last electronically solve the document delivery problem as well as the document location problem.

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues a coherent research agenda is proposed.

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. (Comment: Accepted for publication in ACM Computing Surveys.)
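
    As an illustration of the inductive process the survey describes, the sketch below builds a classifier from a handful of preclassified documents. It uses scikit-learn, which the paper does not reference, and toy data; it is a minimal instance of the document representation / classifier construction / evaluation pipeline, not the survey's own method.

```python
# Minimal sketch of inductive text categorization: a classifier is built
# automatically from preclassified documents. scikit-learn and the toy
# corpus are illustrative assumptions, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Preclassified training documents; the document representation (here a
# simple TF-IDF bag of words) is learned from these.
train_docs = [
    "interest rates and stock markets fell today",
    "the central bank raised its key rate",
    "the team won the championship final",
    "a last-minute goal decided the match",
]
train_labels = ["finance", "finance", "sports", "sports"]

# Classifier construction: vectorizer + multinomial naive Bayes.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

# Classifier evaluation would use a held-out test set; here one new document.
print(model.predict(["bond yields rose after the rate decision"]))  # ['finance']
```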

    The Computer as a Tool for Legal Research


    Automated speech and audio analysis for semantic access to multimedia

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting, and speaker classification. The applicability of the techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.
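
    One of the listed techniques, keyword spotting, can be illustrated with a minimal sketch: scanning a time-stamped transcript (as produced by a speech recognizer) for query terms. The transcript format and keywords below are hypothetical; production systems typically search recognition lattices rather than a single one-best transcript.

```python
# Minimal sketch of keyword spotting over a time-stamped ASR transcript.
# The (start_time, word) format and the keyword set are hypothetical.
transcript = [
    (0.0, "good"), (0.4, "evening"), (1.1, "the"),
    (1.3, "election"), (1.9, "results"), (2.5, "are"), (2.7, "in"),
]
keywords = {"election", "results"}

# Collect (start_time, word) for every keyword hit, enabling the kind of
# jump-to-segment browsing the broadcast-news demonstrators support.
hits = [(t, w) for t, w in transcript if w.lower() in keywords]
for t, w in hits:
    print(f"{t:5.1f}s  {w}")
```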

    Weak signal identification with semantic web mining

    We investigate the automated identification of weak signals, in the sense of Ansoff, to improve strategic planning and technological forecasting. The literature shows that weak signals can be found in an organization's environment and that they appear in different contexts. We use internet information to represent the organization's environment, selecting those websites that are related to a given hypothesis. In contrast to related research, a methodology is provided that uses latent semantic indexing (LSI) for the identification of weak signals. This improves existing knowledge-based approaches because LSI considers aspects of meaning and is thus able to identify similar textual patterns in different contexts. A new weak-signal maximization approach is introduced that replaces the prediction modeling approach commonly used in LSI. It enables the calculation of the largest number of relevant weak signals represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in the field of on-site medical oxygen production, supporting the planning of research and development (R&D) for a medical oxygen supplier. As a result, it is shown that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis, helping strategic planners to react ahead of time.
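
    As a rough illustration of the LSI step, the sketch below applies a truncated SVD to a TF-IDF representation of a small web-text corpus. scikit-learn and the example documents are illustrative assumptions; the paper's weak-signal maximization over SVD dimensions is not reproduced here.

```python
# Minimal sketch of latent semantic indexing (LSI) via truncated SVD,
# the core technique the methodology builds on. Corpus is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Website texts standing in for the organization's environment.
corpus = [
    "portable oxygen concentrator approved for home care",
    "new membrane improves on-site gas separation efficiency",
    "hospital logistics for cylinder-based oxygen supply",
    "startup demonstrates modular on-site oxygen production",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(corpus)

# Each SVD dimension groups co-occurring terms, so similar textual
# patterns surface across different contexts; weak-signal analysis would
# then inspect the resulting dimensions rather than raw term matches.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_coords = svd.fit_transform(X)
print(doc_coords.round(2))  # each row: one document in LSI space
```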