12,499 research outputs found

    Using Semantic Technologies in Digital Libraries - A Roadmap to Quality Evaluation

    Abstract. In digital libraries semantic techniques are often deployed to reduce the expensive manual overhead for indexing documents, maintaining metadata, or caching for future search. However, using such techniques may cause a decrease in a collection’s quality due to their statistical nature. Since data quality is a major concern in digital libraries, it is important to be able to measure the (loss of) quality of metadata automatically generated by semantic techniques. In this paper we present a user study based on a typical semantic technique use

    Towards improved performance and interoperability in distributed and physical union catalogues

    Purpose of this paper: This paper details research undertaken to determine the key differences in the performance of certain centralised (physical) and distributed (virtual) bibliographic catalogue services, and to suggest strategies for improving interoperability and performance in, and between, physical and virtual models. Design/methodology/approach: Methodically defined searches of a centralised catalogue service and selected distributed catalogues were conducted using the Z39.50 information retrieval protocol, allowing search types to be semantically defined. The methodology also entailed the use of two workshops comprising systems librarians and cataloguers to inform suggested strategies for improving performance and interoperability within both environments. Findings: Technical interoperability was permitted easily between centralised and distributed models; however, the various individual configurations permitted only limited semantic interoperability. Significant prescription in cataloguing and indexing guidelines, greater participation in the Program for Collaborative Cataloging (PCC), consideration of future 'FRBR' migration, and greater disclosure to end users are some of the suggested strategies to improve performance and semantic interoperability. Practical implications: This paper informs the LIS research community and union catalogue administrators, but also has numerous practical implications for those establishing distributed systems based on Z39.50 and SRW, as well as those establishing centralised systems. Originality/value: The paper moves the discussion of Z39.50-based systems away from anecdotal evidence, provides recommendations based on testing, and is intimately informed by the UK cataloguing and systems librarian community.

    Fusion architectures for automatic subject indexing under concept drift: Analysis and empirical results on short texts

    Indexing documents with controlled vocabularies enables a wealth of semantic applications for digital libraries. Due to the rapid growth of scientific publications, machine learning-based methods are required that assign subject descriptors automatically. While stability of the generative processes behind the underlying data is often tacitly assumed, it is violated in practice. Addressing this problem, this article studies explicit and implicit concept drift, that is, settings with new descriptor terms and new types of documents, respectively. First, the existence of concept drift in automatic subject indexing is discussed in detail and demonstrated by example. Subsequently, architectures for automatic indexing are analyzed in this regard, highlighting individual strengths and weaknesses. The results of the theoretical analysis justify research on fusion of different indexing approaches, with special consideration of information sharing among descriptors. Experimental results on titles and author keywords in the domain of economics underline the relevance of the fusion methodology, especially under concept drift. Fusion approaches outperformed non-fusion strategies on the tested data sets, which comprised shifts in priors of descriptors as well as covariates. These findings can help researchers and practitioners in digital libraries to choose appropriate methods for automatic subject indexing, as is finally shown by a recent case study.

    Cross-concordances: terminology mapping and its effectiveness for information retrieval

    The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which concluded in 2007. The task of this initiative was to organize, create and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists), centred on the social sciences but quickly extending to other subject areas. 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation effort was conducted to test and measure the effectiveness of the vocabulary mappings in an information system environment. The paper reports on the cross-concordance work and evaluation results. Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200

    Visual Information Retrieval in Digital Libraries

    The emergence of information highways and multimedia computing has resulted in redefining the concept of libraries. It is widely believed that in the next few years, a significant portion of information in libraries will be in the form of multimedia electronic documents. Many approaches are being proposed for storing, retrieving, assimilating, harvesting, and prospecting information from these multimedia documents. Digital libraries are expected to allow users to access information independent of the locations and types of data sources and will provide a unified picture of information. In this paper, we discuss requirements of these emerging information systems and present query methods and data models for these systems. Finally, we briefly present a few examples of approaches that provide a preview of how things will be done in the digital libraries in the near future.

    Template Mining for Information Extraction from Digital Documents
