7,151 research outputs found
Processing SPARQL queries with regular expressions in RDF databases
Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph.
Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique.
Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.X113sciescopu
Optimizing Federated Queries Based on the Physical Design of a Data Lake
The optimization of query execution plans is known to be crucial
for reducing the query execution time. In particular, query optimization
has been studied thoroughly for relational databases
over the past decades. Recently, the Resource Description Framework
(RDF) became popular for publishing data on the Web. As
a consequence, federations composed of different data models
like RDF and relational databases evolved. One type of these
federations are Semantic Data Lakes where every data source is
kept in its original data model and semantically annotated with
ontologies or controlled vocabularies. However, state-of-the-art
query engines for federated query processing over Semantic Data
Lakes often rely on optimization techniques tailored for RDF. In
this paper, we present query optimization techniques guided
by heuristics that take the physical design of a Data Lake into
account. The heuristics are implemented on top of Ontario, a
SPARQL query engine for Semantic Data Lakes. Using sourcespecific
heuristics, the query engine is able to generate more efficient
query execution plans by exploiting the knowledge about
indexes and normalization in relational databases. We show that
heuristics which take the physical design of the Data Lake into
account are able to speed up query processing
Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
On Defining SPARQL with Boolean Tensor Algebra
The Resource Description Framework (RDF) represents information as
subject-predicate-object triples. These triples are commonly interpreted as a
directed labelled graph. We propose an alternative approach, interpreting the
data as a 3-way Boolean tensor. We show how SPARQL queries - the standard
queries for RDF - can be expressed as elementary operations in Boolean algebra,
giving us a complete re-interpretation of RDF and SPARQL. We show how the
Boolean tensor interpretation allows for new optimizations and analyses of the
complexity of SPARQL queries. For example, estimating the size of the results
for different join queries becomes much simpler
View Selection in Semantic Web Databases
We consider the setting of a Semantic Web database, containing both explicit
data encoded in RDF triples, and implicit data, implied by the RDF semantics.
Based on a query workload, we address the problem of selecting a set of views
to be materialized in the database, minimizing a combination of query
processing, view storage, and view maintenance costs. Starting from an existing
relational view selection method, we devise new algorithms for recommending
view sets, and show that they scale significantly beyond the existing
relational ones when adapted to the RDF context. To account for implicit
triples in query answers, we propose a novel RDF query reformulation algorithm
and an innovative way of incorporating it into view selection in order to avoid
a combinatorial explosion in the complexity of the selection process. The
interest of our techniques is demonstrated through a set of experiments.Comment: VLDB201
- …