A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing.
Mediation of Lazy Update Propagation in a Replicated Database over a Decentralized P2P Architecture
While replicating data over a decentralized Peer-to-Peer (P2P) network, transactions broadcasting updates from different peers run simultaneously, so a destination peer's replica can be updated concurrently, which always causes transaction and data conflicts. Moreover, during data migration, connectivity interruptions and network overload corrupt running transactions, so destination peers can end up with duplicated, improper, or missing data, and replicas remain inconsistent. Different methodological approaches have been combined to solve these problems: the audit log technique to capture the changes made to data; the algorithmic method to design and analyse algorithms; and the statistical method to analyse the performance of the new algorithms and to design prediction models of the execution time based on other parameters. A Graphical User Interface software prototype has been designed in C# to implement these new algorithms and obtain a database synchronizer-mediator. A series of experiments showed that the new algorithms were effective. Thus the hypothesis that “The execution time of replication and reconciliation transactions totally depends on independent factors” has been confirmed.