5,608 research outputs found
A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance
This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics
Distributed Semantic Web Data Management in HBase and MySQL Cluster
Various computing and data resources on the Web are being enhanced with
machine-interpretable semantic descriptions to facilitate better search,
discovery and integration. This interconnected metadata constitutes the
Semantic Web, whose volume can potentially grow the scale of the Web. Efficient
management of Semantic Web data, expressed using the W3C's Resource Description
Framework (RDF), is crucial for supporting new data-intensive,
semantics-enabled applications. In this work, we study and compare two
approaches to distributed RDF data management based on emerging cloud computing
technologies and traditional relational database clustering technologies. In
particular, we design distributed RDF data storage and querying schemes for
HBase and MySQL Cluster and conduct an empirical comparison of these approaches
on a cluster of commodity machines using datasets and queries from the Third
Provenance Challenge and Lehigh University Benchmark. Our study reveals
interesting patterns in query evaluation, shows that our algorithms are
promising, and suggests that cloud computing has a great potential for scalable
Semantic Web data management.Comment: In Proc. of the 4th IEEE International Conference on Cloud Computing
(CLOUD'11
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Distributed Semantic Web data management in HBase and MySQL cluster
Various computing and data resources on the Web are being enhanced with machine-interpretable semantic descriptions to facilitate better search, discovery and integration. This interconnected metadata constitutes the Semantic Web, whose volume can potentially grow the scale of the Web. Efficient management of Semantic Web data, expressed using the W3C\u27s Resource Description Framework (RDF), is crucial for supporting new data-intensive, semantics-enabled applications. In this work, we study and compare two approaches to distributed RDF data management based on emerging cloud computing technologies and traditional relational database clustering technologies. In particular, we design distributed RDF data storage and querying schemes for HBase and MySQL Cluster and conduct an empirical comparison of these approaches on a cluster of commodity machines using datasets and queries from the Third Provenance Challenge and Lehigh University Benchmark. Our study reveals interesting patterns in query evaluation, shows that our algorithms are promising, and suggests that cloud computing has a great potential for scalable Semantic Web data management
- …