5 research outputs found

    Parallelising Harvesting

    Metadata harvesting has become a common technique for transferring a stream of data from one metadata repository or digital library system to another. As collections of metadata, and their associated digital objects, grow in size, the ingest of these items at the destination archive can take a significant amount of time, depending on the type of indexing or post-processing required. This paper discusses an approach to parallelising the post-processing of data in a small cluster of machines or a multi-processor environment, without increasing the burden on the source data provider. Performance tests carried out on varying architectures indicate that the technique is promising for some scenarios and can be extended to more computationally intensive ingest procedures. In general, the technique presents a new approach to constructing harvest-based distributed or component-based digital libraries, with better scalability than before.
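    The following is a minimal sketch, in Python, of the pattern this abstract describes: a single sequential harvest stream (so the data provider still serves only one client) feeding a pool of local worker processes that perform the post-processing. The harvest source and the indexing step are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical illustration: sequential harvest, parallel post-processing.
from multiprocessing import Pool

def harvest_records():
    """Stand-in for a sequential (e.g. OAI-PMH style) harvest from the source repository."""
    for i in range(1000):
        yield {"id": f"oai:example.org:{i}", "title": f"Record {i}"}

def post_process(record):
    """Stand-in for the expensive ingest work: parsing, transformation, indexing."""
    tokens = record["title"].lower().split()
    return record["id"], tokens

if __name__ == "__main__":
    # The pool parallelises ingest locally; the same pattern extends to a small
    # cluster by replacing Pool with a distributed task queue.
    with Pool(processes=4) as pool:
        for record_id, tokens in pool.imap_unordered(post_process, harvest_records()):
            pass  # e.g. write the index entry to the destination archive
```

    Because only the harvesting loop talks to the source repository, adding workers increases ingest throughput at the destination without changing the load seen by the data provider.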

    Utility-based high performance digital library systems

    Many practical digital library systems have had to deal with the scalability of data collections and/or service provision. Early attempts at enabling this scalability focused on data and services closely coupled with, or tightly integrated into, various high performance computing platforms, which inevitably resulted in compromises and very specific solutions. This paper presents an analysis of current high performance systems and argues that utility computing can subsume existing models and better meet the needs of generic scalable digital library systems.

    Lightweight component-based scalability

    Digital libraries and information management systems are increasingly being developed according to component models with well-defined APIs and often with Web-accessible interfaces. In parallel with metadata access and harvesting, Web 2.0 mashups have demonstrated the flexibility of developing systems as independent distributed components. It can be argued that such distributed components can also enable scalable service provision in medium to large systems. To test this premise, this article discusses how an existing component framework was modified to include support for scalability. A set of lightweight services and extensions was created to migrate and replicate services as the load changes. Experiments with the prototype system confirm that it can be an effective enabler of transparent and efficient scalability, without the need to resort to complex middleware or substantial system re-engineering. Finally, specific problem areas have been identified as future avenues for exploration at the crucial intersection of digital libraries and high-performance computing.
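    As a rough illustration of the migrate-and-replicate idea, the sketch below shows a controller that watches the request backlog for one component and starts or retires replicas as the load changes. The queue, thresholds, and worker are assumptions made for the example and are not taken from the framework described in the article.

```python
# Hypothetical illustration: load-driven replication of a lightweight component.
import queue
import threading
import time

requests = queue.Queue()  # incoming work for one logical service component

def worker(stop):
    """One replica of the service; drains requests until told to stop."""
    while not stop.is_set():
        try:
            requests.get(timeout=0.5)
        except queue.Empty:
            continue
        time.sleep(0.01)   # stand-in for actually servicing the request
        requests.task_done()

def spawn_replica():
    stop = threading.Event()
    thread = threading.Thread(target=worker, args=(stop,), daemon=True)
    thread.start()
    return thread, stop

def controller(max_replicas=8, high=50, low=5):
    """Replicate when the backlog is high; retire surplus replicas when it drops."""
    replicas = [spawn_replica()]              # always keep at least one replica
    while True:
        backlog = requests.qsize()
        if backlog > high and len(replicas) < max_replicas:
            replicas.append(spawn_replica())
        elif backlog < low and len(replicas) > 1:
            _, stop = replicas.pop()
            stop.set()                        # let the surplus replica wind down
        time.sleep(1)
```

    In the prototype the abstract refers to, replicas would be separate service instances, possibly on other machines; the threads here only illustrate the control loop that drives replication and retirement.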