2,006 research outputs found

    Federating Heterogeneous Digital Libraries by Metadata Harvesting

    Get PDF
    This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs) by metadata harvesting. The objective of federation is to provide high-level services (e.g. transparent search across all DLs) on the collective metadata from different digital libraries. There are two main approaches to federate DLs: distributed searching approach and harvesting approach. As the distributed searching approach replies on executing queries to digital libraries in real time, it has problems with scalability. The difficulty of creating a distributed searching service for a large federation is the motivation behind Open Archives Initiatives Protocols for Metadata Harvesting (OAI-PMH). OAI-PMH supports both data providers (repositories, archives) and service providers. Service providers develop value-added services based on the information collected from data providers. Data providers are simply collections of harvestable metadata. This dissertation examines the application of the metadata harvesting approach in DL federations. It addresses the following problems: (1) Whether or not metadata harvesting provides a realistic and scalable solution for DL federation. (2) What is the status of and problems with current data provider implementations, and how to solve these problems. (3) How to synchronize data providers and service providers. (4) How to build different types of federation services over harvested metadata. (5) How to create a scalable and reliable infrastructure to support federation services. The work done in this dissertation is based on OAI-PMH, and the results have influenced the evolution of OAI-PMH. However, the results are not limited to the scope of OAI-PMH. Our approach is to design and build key services for metadata harvesting and to deploy them on the Web. Implementing a publicly available service allows us to demonstrate how these approaches are practical. The problems posed above are evaluated by performing experiments over these services. To summarize the results of this thesis, we conclude that the metadata harvesting approach is a realistic and scalable approach to federate heterogeneous DLs. We present two models of building federation services: a centralized model and a replicated model. Our experiments also demonstrate that the repository synchronization problem can be addressed by push, pull, and hybrid push/pull models; each model has its strengths and weaknesses and fits a specific scenario. Finally, we present a scalable and reliable infrastructure to support the applications of metadata harvesting

    SEAD Virtual Archive: Building a Federation of Institutional Repositories for Long Term Data Preservation

    Get PDF
    Major research universities are grappling with their response to the deluge of scientific data emerging through research by their faculty. Many are looking to their libraries and the institutional repository as a solution. Scientific data introduces substantial challenges that the document-based institutional repository may not be suited to deal with. The Sustainable Environment - Actionable Data (SEAD) Virtual Archive specifically addresses the challenges of ā€œlong tailā€ scientific data. In this paper, we propose requirements, policy and architecture to support not only the preservation of scientific data today using institutional repositories, but also its rich access and use into the future

    Optimising metadata workflows in a distributed information environment

    Get PDF
    The different purposes present within a distributed information environment create the potential for repositories to enhance their metadata by capitalising on the diversity of metadata available for any given object. This paper presents three conceptual reference models required to achieve this optimisation of metadata workflow: the ecology of repositories, the object lifecycle model, and the metadata lifecycle model. It suggests a methodology for developing the metadata lifecycle model, and illustrates how it might be used to enhance metadata within a network of repositories and services

    Servicing the federation : the case for metadata harvesting

    Get PDF
    The paper presents a comparative analysis of data harvesting and distributed computing as complementary models of service delivery within large-scale federated digital libraries. Informed by requirements of flexibility and scalability of federated services, the analysis focuses on the identification and assessment of model invariants. In particular, it abstracts over application domains, services, and protocol implementations. The analytical evidence produced shows that the harvesting model offers stronger guarantees of satisfying the identified requirements. In addition, it suggests a first characterisation of services based on their suitability to either model and thus indicates how they could be integrated in the context of a single federated digital library

    Target and (Astro-)WISE technologies - Data federations and its applications

    Full text link
    After its first implementation in 2003 the Astro-WISE technology has been rolled out in several European countries and is used for the production of the KiDS survey data. In the multi-disciplinary Target initiative this technology, nicknamed WISE technology, has been further applied to a large number of projects. Here, we highlight the data handling of other astronomical applications, such as VLT-MUSE and LOFAR, together with some non-astronomical applications such as the medical projects Lifelines and GLIMPS, the MONK handwritten text recognition system, and business applications, by amongst others, the Target Holding. We describe some of the most important lessons learned and describe the application of the data-centric WISE type of approach to the Science Ground Segment of the Euclid satellite.Comment: 9 pages, 5 figures, Proceedngs IAU Symposium No 325 Astroinformatics 201

    Lessons Learned with Arc, an OAI-PMH Service Provider

    Get PDF
    Web-based digital libraries have historically been built in isolation utilizing different technologies, protocols, and metadata. These differences hindered the development of digital library services that enable users to discover information from multiple libraries through a single unified interface. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a major, international effort to address technical interoperability among distributed repositories. Arc debuted in 2000 as the first end-user OAI-PMH service provider. Since that time, Arc has grown to include nearly 7,000,000 metadata records. Arc has been deployed in a number of environments and has served as the basis for many other OAI-PMH projects, including Archon, Kepler, NCSTRL, and DP9. In this article we review the history of OAI-PMH and Arc, as well as some of the lessons learned while developing Arc and related OAI-PMH services. Reprinted by permission of the publisher

    Digital Library Federation (DLF) Aquifer Project

    Get PDF
    In January 2005, the Digital Library Federation (DLF) renewed its commitment to developing a distributed open digital library by assigning a full time project director to lead the initiative. Called DLF Aquifer to symbolize the pooling of content into a community resource, and the piping or siphoning of content to meet specific needs, the collaboration amongst a subset of DLF member libraries has produced standards, reports, and prototypes over the past year. The work was accomplished through four working groups focused on services, technology/architecture, metadata, and collections, with additional participation by library staff at DLF Aquifer institutions. One prototype was created through a collaboration that extended beyond DLF membership. This project update highlights "Phase I" DLF Aquifer deliverables and provides a brief overview of future plans. Additional details, including the DLF Aquifer business plan, working group rosters, and mode of operation can be found on the DLF Aquifer web site. A list of participating institutions is included here, as an appendix

    Contexts and Contributions: Building the Distributed Library

    Get PDF
    This report updates and expands on A Survey of Digital Library Aggregation Services, originally commissioned by the DLF as an internal report in summer 2003, and released to the public later that year. It highlights major developments affecting the ecosystem of scholarly communications and digital libraries since the last survey and provides an analysis of OAI implementation demographics, based on a comparative review of repository registries and cross-archive search services. Secondly, it reviews the state-of-practice for a cohort of digital library aggregation services, grouping them in the context of the problem space to which they most closely adhere. Based in part on responses collected in fall 2005 from an online survey distributed to the original core services, the report investigates the purpose, function and challenges of next-generation aggregation services. On a case-by-case basis, the advances in each service are of interest in isolation from each other, but the report also attempts to situate these services in a larger context and to understand how they fit into a multi-dimensional and interdependent ecosystem supporting the worldwide community of scholars. Finally, the report summarizes the contributions of these services thus far and identifies obstacles requiring further attention to realize the goal of an open, distributed digital library system

    Federated Searching Interface Techniques for Heterogeneous OAI Repositories

    Get PDF
    Federating repositories by harvesting heterogeneous collections with varying degrees of metadata richness poses a number of challenging issues: (1) how to address the lack of uniform control for various metadata fields in terms of building a rich unified search interface, and (2) how easily new collections and freshly harvested data in existing repositories can be incorporated into the federation supporting a unified interface? This paper focuses on the approaches taken to address these issues in Arc, an Open Archives Initiative compliant federated digital library. At present Arc contains over 1M metadata records from 75 data providers from various subject domains. Analysis of these heterogeneous collections indicates that controlled vocabularies and values are widely used in most repositories. Usage is extremely variable, however. In Arc we solve the problem by implementing an advanced searching interface that allows users to search and select in specific fields with data we construct from the harvested metadata, and also by an interactive search for the subject field. As the metadata records are incrementally harvested we address how to build these services over frequently added new collections and harvested data. The initial result is promising, showing the benefits of immediate feedback to the user in enhancing the search experience as well as in increasing the precision of the user\u27s search

    BlogForever D3.2: Interoperability Prospects

    Get PDF
    This report evaluates the interoperability prospects of the BlogForever platform. Therefore, existing interoperability models are reviewed, a Delphi study to identify crucial aspects for the interoperability of web archives and digital libraries is conducted, technical interoperability standards and protocols are reviewed regarding their relevance for BlogForever, a simple approach to consider interoperability in specific usage scenarios is proposed, and a tangible approach to develop a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories is presented
    • ā€¦
    corecore