
    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. We also hope that the proposed taxonomy and mapping offer an easy way for new practitioners to understand this complex area of research.
    Comment: 46 pages, 16 figures, Technical Report
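    To make the mapping step concrete, here is a minimal Python sketch of how surveyed systems could be recorded against the taxonomy's four dimensions and scanned for gaps. The dimension names come from the abstract; the category values and the gap_analysis helper are illustrative assumptions, not the paper's actual classification.

        from dataclasses import dataclass

        @dataclass
        class DataGridProfile:
            # One surveyed system, classified along the four taxonomy
            # dimensions named in the abstract.
            name: str
            architecture: str   # e.g. "hierarchical", "federated" (assumed values)
            transport: str      # e.g. "GridFTP", "HTTP"
            replication: str    # e.g. "static", "dynamic"
            scheduling: str     # e.g. "data-aware", "compute-centric"

        def gap_analysis(profiles, dimension, known_categories):
            # Categories along one dimension that no surveyed system
            # occupies -- candidate areas for future exploration.
            observed = {getattr(p, dimension) for p in profiles}
            return known_categories - observed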

    Partout: A Distributed Engine for Efficient RDF Processing

    Full text link
    The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications, some already managing more than a trillion triples. Confronted with such huge amounts of data and their continued growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for efficient RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log, allocating the fragments to nodes in a cluster, and finding the optimal configuration. Partout can efficiently handle updates, and its query optimizer produces efficient execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach over state-of-the-art approaches for partitioning and distributed SPARQL query processing.
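    The fragmentation-and-allocation pipeline can be illustrated with a short Python sketch. This is a simplified stand-in: Partout's actual horizontal fragmentation and cost-based allocation are not spelled out in the abstract, so the co-occurrence heuristic and greedy balancing below are assumptions made for illustration.

        from collections import defaultdict
        from itertools import combinations

        def build_fragments(query_log):
            # query_log: iterable of queries, each a set of triple-pattern ids.
            # Patterns that co-occur in any logged query are merged into the
            # same fragment (union-find), so data answering one query tends
            # to live together on one node.
            parent = {}
            def find(x):
                parent.setdefault(x, x)
                while parent[x] != x:
                    parent[x] = parent[parent[x]]  # path halving
                    x = parent[x]
                return x
            for query in query_log:
                for a, b in combinations(sorted(query), 2):
                    parent[find(a)] = find(b)
            fragments = defaultdict(set)
            for query in query_log:
                for p in query:
                    fragments[find(p)].add(p)
            return list(fragments.values())

        def allocate(fragments, num_nodes):
            # Greedy balancing: place the largest fragment on the currently
            # least-loaded node, so pattern load spreads across the cluster.
            load = [0] * num_nodes
            placement = defaultdict(list)
            for frag in sorted(fragments, key=len, reverse=True):
                node = load.index(min(load))
                placement[node].append(frag)
                load[node] += len(frag)
            return dict(placement)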

    Mediation of Lazy Update Propagation in a Replicated Database over a Decentralized P2P Architecture

    When data is replicated over a decentralized Peer-to-Peer (P2P) network, transactions broadcasting updates from different peers run simultaneously, so a destination peer's replica can be updated concurrently, which causes transaction and data conflicts. Moreover, during data migration, connectivity interruptions and network overload corrupt running transactions, so destination peers can end up with duplicated, incorrect, or missing data, leaving replicas inconsistent. Different methodological approaches have been combined to solve these problems: the audit log technique to capture the changes made to data; the algorithmic method to design and analyse algorithms; and the statistical method to analyse the performance of the new algorithms and to build models predicting execution time from other parameters. A graphical user interface prototype implementing these new algorithms has been developed in C# to obtain a database synchronizer-mediator. A series of experiments showed that the new algorithms were effective, confirming the hypothesis that "the execution time of replication and reconciliation transactions depends entirely on independent factors."
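    As a rough illustration of the audit-log idea, the Python sketch below records each change as a log entry and lets a mediator merge per-peer logs, collapsing duplicates and resolving concurrent updates with a last-writer-wins rule. The entry fields and the tie-breaking policy are assumptions for illustration; the paper's actual algorithms are not reproduced here.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class AuditEntry:
            peer_id: str
            row_key: str
            op: str          # "insert", "update", or "delete"
            value: object    # assumed hashable, so entries can be de-duplicated
            timestamp: float # logical or wall-clock time of the change

        def reconcile(logs):
            # Merge per-peer audit logs into one replica state. Exact
            # duplicates (the same entry replayed twice) collapse via the
            # set; conflicting updates to the same row resolve by newest
            # timestamp, breaking ties on peer_id so that every replica
            # applying this procedure converges to the same state.
            merged = sorted({e for log in logs for e in log},
                            key=lambda e: (e.timestamp, e.peer_id))
            state = {}
            for e in merged:
                if e.op == "delete":
                    state.pop(e.row_key, None)
                else:
                    state[e.row_key] = e.value
            return state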