3,507 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Servicing the federation : the case for metadata harvesting

    Get PDF
    The paper presents a comparative analysis of data harvesting and distributed computing as complementary models of service delivery within large-scale federated digital libraries. Informed by requirements of flexibility and scalability of federated services, the analysis focuses on the identification and assessment of model invariants. In particular, it abstracts over application domains, services, and protocol implementations. The analytical evidence produced shows that the harvesting model offers stronger guarantees of satisfying the identified requirements. In addition, it suggests a first characterisation of services based on their suitability to either model and thus indicates how they could be integrated in the context of a single federated digital library

    A Generic Alerting Service for Digital Libraries

    Get PDF
    Users of modern digital libraries (DLs) can keep themselves up-to-date by searching and browsing their favorite collections, or more conveniently by resorting to an alerting service. The alerting service notifies its clients about new or changed documents. Proprietary and mediating alerting services fail to fluidly integrate information from differing collections. This paper analyses the conceptual requirements of this much-sought after service for digital libraries. We demonstrate that the differing concepts of digital libraries and its underlying technical design has extensive influence (a) the expectations, needs and interests of users regarding an alerting service, and (b) on the technical possibilities of the implementation of the service. Our findings will show that the range of issues surrounding alerting services for digital libraries, their design and use is greater than one may anticipate. We also show that, conversely, the requirements for an alerting service have considerable impact on the concepts of DL design. Our findings should be of interest for librarians as well as system designers. We highlight and discuss the far-reaching implications for the design of, and interaction with, libraries. This paper discusses the lessons learned from building such a distributed alerting service. We present our prototype implementation as a proof-of-concept for an alerting service for open DL software

    SEAD Virtual Archive: Building a Federation of Institutional Repositories for Long Term Data Preservation

    Get PDF
    Major research universities are grappling with their response to the deluge of scientific data emerging through research by their faculty. Many are looking to their libraries and the institutional repository as a solution. Scientific data introduces substantial challenges that the document-based institutional repository may not be suited to deal with. The Sustainable Environment - Actionable Data (SEAD) Virtual Archive specifically addresses the challenges of “long tail” scientific data. In this paper, we propose requirements, policy and architecture to support not only the preservation of scientific data today using institutional repositories, but also its rich access and use into the future

    Prototyping Incentive-based Resource Assignment for Clouds in Community Networks

    Get PDF
    Wireless community networks are a successful example of a collective where communities operate ICT infrastructure and provide IP connectivity based on the principle of reciprocal resource sharing of network bandwidth. This sharing, however, has not extended to computing and storage resources, resulting in very few applications and services which are currently deployed within community networks. Cloud computing, as in today's Internet, has made it common to consume resources provided by public clouds providers, but such cloud infrastructures have not materialized within community networks. We analyse in this paper socio-technical characteristics of community networks in order to derive scenarios for community clouds. Based on an architecture for such a community cloud, we implement a prototype for the incentive-driven resource assignment component, deploy it in a testbed of community network nodes, and evaluate its behaviour experimentally. Our evaluation gives insight into how the deployed prototype components regulate the consumption of cloud resources taking into account the users' contributions, and how this regulation affects the system usage. Our results suggest a further integration of this regulation component into current cloud management platforms in order to open them up for the operation of an ecosystem of community cloud

    A Query Integrator and Manager for the Query Web

    Get PDF
    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
    corecore