41 research outputs found
The Odyssey Approach for Optimizing Federated SPARQL Queries
Answering queries over a federation of SPARQL endpoints requires combining
data from more than one data source. Optimizing queries in such scenarios is
particularly challenging not only because of (i) the large variety of possible
query execution plans that correctly answer the query but also because (ii)
there is only limited access to statistics about schema and instance data of
remote sources. To overcome these challenges, most federated query engines rely
on heuristics to reduce the space of possible query execution plans or on
dynamic programming strategies to produce optimal plans. Nevertheless, these
plans may still exhibit a high number of intermediate results or high execution
times because of heuristics and inaccurate cost estimations. In this paper, we
present Odyssey, an approach that uses statistics that allow for a more
accurate cost estimation for federated queries and therefore enables Odyssey to
produce better query execution plans. Our experimental results show that
Odyssey produces query execution plans that are better in terms of data
transfer and execution time than state-of-the-art optimizers. Our experiments
using the FedBench benchmark show execution time gains of at least 25 times on
average.Comment: 16 pages, 10 figure
Context Aware Source Selection for Linked Data
The traditional Web is evolving into the Web of Data, which gathers huge collections of structured data over distributed, heterogeneous data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection allows the identification of the sources that most likely contain relevant content. Due to the semantic heterogeneity of the Web of Data, however, it is not always easy to assess relevancy. Context information might help in interpreting the user\u2019s information needs. In this paper, we discuss how context information can be exploited to improve source selection
How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?
The current de-facto way to query the Web of Data is through the SPARQL
protocol, where a client sends queries to a server through a SPARQL endpoint.
Contrary to an HTTP server, providing and maintaining a robust and reliable
endpoint requires a significant effort that not all publishers are willing or
able to make. An alternative query evaluation method is through link traversal,
where a query is answered by dereferencing online web resources (URIs) at real
time. While several approaches for such a lookup-based query evaluation method
have been proposed, there exists no analysis of the types (patterns) of queries
that can be directly answered on the live Web, without accessing local or
remote endpoints and without a-priori knowledge of available data sources. In
this paper, we first provide a method for checking if a SPARQL query (to be
evaluated on a SPARQL endpoint) can be answered through zero-knowledge link
traversal (without accessing the endpoint), and analyse a large corpus of real
SPARQL query logs for finding the frequency and distribution of answerable and
non-answerable query patterns. Subsequently, we provide an algorithm for
transforming answerable queries to SPARQL-LD queries that bypass the endpoints.
We report experimental results about the efficiency of the transformed queries
and discuss the benefits and the limitations of this query evaluation method.Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium On Applied Computing (SAC 2019
Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing
Approximate Semantic Matching Over Linked Data Streams
In the Internet of Things (IoT),data can be generated by all kinds of smart things. In such context, enabling machines to process and understand such data is critical. Semantic Web technologies, such as Linked Data, provide an effective and machine-understandable way to represent IoT data for further processing. It is a challenging issue to match Linked Data streams semantically based on text similarity as text similarity computation is time consuming. In this paper, we present a hashing-based approximate approach to efficiently match Linked Data streams with users’ needs. We use the Resource Description Framework (RDF) to represent IoT data and adopt triple patterns as user queries to describe users’ data needs. We then apply locality-sensitive hashing techniques to transform semantic data into numerical values to support efficient matching between data and user queries. We design a modified k nearest neighbors (kNN) algorithm to speedup the matching process. The experimental results show that our approach is up to five times faster than the traditional methods and can achieve high precisions and recalls