1,502 research outputs found
EAGLE—A Scalable Query Processing Engine for Linked Sensor Data
Recently, many approaches have been proposed to manage sensor data using semantic web technologies for effective heterogeneous data integration. However, our empirical observations revealed that these solutions primarily focused on semantic relationships and unfortunately paid less attention to spatio–temporal correlations. Most semantic approaches do not have spatio–temporal support. Some of them have attempted to provide full spatio–temporal support, but have poor performance for complex spatio–temporal aggregate queries. In addition, while the volume of sensor data is rapidly growing, the challenge of querying and managing the massive volumes of data generated by sensing devices still remains unsolved. In this article, we introduce EAGLE, a spatio–temporal query engine for querying sensor data based on the linked data model. The ultimate goal of EAGLE is to provide an elastic and scalable system which allows fast searching and analysis with respect to the relationships of space, time and semantics in sensor data. We also extend SPARQL with a set of new query operators in order to support spatio–temporal computing in the linked sensor data context.EC/H2020/732679/EU/ACTivating InnoVative IoT smart living environments for AGEing well/ACTIVAGEEC/H2020/661180/EU/A Scalable and Elastic Platform for Near-Realtime Analytics for The Graph of Everything/SMARTE
Compressed k2-Triples for Full-In-Memory RDF Engines
Current "data deluge" has flooded the Web of Data with very large RDF
datasets. They are hosted and queried through SPARQL endpoints which act as
nodes of a semantic net built on the principles of the Linked Data project.
Although this is a realistic philosophy for global data publishing, its query
performance is diminished when the RDF engines (behind the endpoints) manage
these huge datasets. Their indexes cannot be fully loaded in main memory, hence
these systems need to perform slow disk accesses to solve SPARQL queries. This
paper addresses this problem by a compact indexed RDF structure (called
k2-triples) applying compact k2-tree structures to the well-known
vertical-partitioning technique. It obtains an ultra-compressed representation
of large RDF graphs and allows SPARQL queries to be full-in-memory performed
without decompression. We show that k2-triples clearly outperforms
state-of-the-art compressibility and traditional vertical-partitioning query
resolution, remaining very competitive with multi-index solutions.Comment: In Proc. of AMCIS'201
Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. For achieving fair
experimental results, we are using Apache Spark as a common parallel computing
framework by rewriting the concerned algorithms using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication
and to some extent to query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.Comment: 16 pages, 3 figure
- …