Query Answering and Containment for Regular Path Queries under Distortions
Abstract. We give a general framework for approximate query processing in semistructured databases. We focus on regular path queries, which are an integral part of most query languages for semistructured databases. To enable approximations, we allow the regular path queries to be distorted. The distortions are expressed in the system by using weighted regular expressions, which correspond to weighted regular transducers. After defining the notion of weighted approximate answers, we show how to compute them in order of their proximity to the query. In the new approximate setting, query containment has to be redefined to take into account the quantitative proximity information in the query answers. For this, we define approximate containment and its variants, k-containment and reliable containment. We then give an optimal algorithm for deciding k-containment. Regarding reliable approximate containment, we show that it is polynomial-time equivalent to the notorious limitedness problem in distance automata.
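As an illustration of computing approximate answers in order of proximity, the following is a minimal sketch and not the paper's algorithm. It assumes the query is given as a small NFA over edge labels and that distortions are limited to three unit-weight edits (substituting a label, skipping a query symbol, following an extra data edge). The function name `approx_answers` and the `costs` keys are hypothetical. Running Dijkstra's algorithm over (node, state) pairs yields answers with nondecreasing cost.

```python
import heapq


def approx_answers(edges, nfa, start_state, final_states, source, costs):
    """Yield (node, cost) pairs in nondecreasing order of distortion cost.

    edges: dict mapping node -> list of (label, target) data edges
    nfa:   dict mapping state -> list of (label, next_state) query transitions
    costs: dict with hypothetical weights for 'sub', 'del' and 'ins' edits
    """
    heap = [(0, source, start_state)]
    settled = set()   # (node, state) pairs whose cheapest cost is final
    reported = set()  # nodes already returned as answers
    while heap:
        cost, node, state = heapq.heappop(heap)
        if (node, state) in settled:
            continue
        settled.add((node, state))
        if state in final_states and node not in reported:
            reported.add(node)
            yield node, cost
        for q_label, next_state in nfa.get(state, []):
            # Deletion: skip a query symbol without moving in the graph.
            heapq.heappush(heap, (cost + costs["del"], node, next_state))
            for d_label, target in edges.get(node, []):
                # Exact match is free; a mismatching label is a substitution.
                step = 0 if d_label == q_label else costs["sub"]
                heapq.heappush(heap, (cost + step, target, next_state))
        for _d_label, target in edges.get(node, []):
            # Insertion: follow a data edge that the query does not mention.
            heapq.heappush(heap, (cost + costs["ins"], target, state))


# Toy graph and the query "knows . livesIn" starting from node "a".
edges = {"a": [("knows", "b")], "b": [("worksAt", "c"), ("livesIn", "d")]}
nfa = {0: [("knows", 1)], 1: [("livesIn", 2)]}
unit = {"sub": 1, "del": 1, "ins": 1}
answers = list(approx_answers(edges, nfa, 0, {2}, "a", unit))
```

In this toy run the exact answer `d` comes first at cost 0, and the distorted answers follow at higher cost. Weighted transducers as described in the abstract are more general than the three fixed edits assumed here.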
Flexible query processing of SPARQL queries
SPARQL is the predominant language for querying RDF data, which is the standard
model for representing web data and more specifically Linked Open Data (a
collection of heterogeneous connected data). Datasets in RDF form can be hard for a
user to query if she does not have full knowledge of the structure of the dataset.
Moreover, many datasets in Linked Data are extracted from actual web page
content, which might lead to incomplete or inaccurate data.
We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously
introduced in the context of regular path queries. Using these operators we are able
to support flexible querying over the property path queries of SPARQL 1.1. We call
this new language SPARQLAR.
Using SPARQLAR users are able to query RDF data without fully knowing the
structure of a dataset. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. This
means that users can access complex and heterogeneous datasets without the need
to know precisely how the data is structured.
One of the open problems we address is how to combine the APPROX and
RELAX operators with a pragmatic language such as SPARQL. We also devise an
implementation of a system that evaluates SPARQLAR queries in order to study the
performance of the new language.
We begin by defining the semantics of SPARQLAR and the complexity of query
evaluation. We then present a query processing technique for evaluating SPARQLAR
queries based on a rewriting algorithm and prove its soundness and completeness.
During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1
queries that are evaluated against the dataset. Each such query will generate answers
with a cost that indicates their distance with respect to the exact form of the original
SPARQLAR query.
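The idea of generating several SPARQL 1.1 queries, each tagged with a cost, can be sketched as follows. This is a simplified illustration under stated assumptions, not the rewriting algorithm of the thesis: the query is reduced to a sequence property path, RELAX is modelled as replacing a property by a superproperty, APPROX is modelled as deleting one property, and the function name, the cost values and the `ex:` properties are all hypothetical.

```python
import heapq


def rewrite_paths(path, super_property, relax_cost=1, delete_cost=2, max_cost=3):
    """Return (cost, property_path) pairs in nondecreasing cost order.

    path:           list of property names forming a sequence path p1/p2/...
    super_property: dict mapping a property to its (assumed) superproperty
    """
    start = tuple(path)
    heap = [(0, start)]
    best = {start: 0}  # cheapest known cost for each rewritten path
    rewritten = []
    while heap:
        cost, current = heapq.heappop(heap)
        if cost > best.get(current, float("inf")):
            continue  # stale heap entry; a cheaper derivation was found
        if current:
            rewritten.append((cost, "/".join(current)))
        for i, prop in enumerate(current):
            variants = []
            if prop in super_property:
                # RELAX-style step: generalise the property.
                relaxed = current[:i] + (super_property[prop],) + current[i + 1:]
                variants.append((cost + relax_cost, relaxed))
            # APPROX-style step: drop the property from the path.
            variants.append((cost + delete_cost, current[:i] + current[i + 1:]))
            for new_cost, variant in variants:
                if new_cost <= max_cost and new_cost < best.get(variant, float("inf")):
                    best[variant] = new_cost
                    heapq.heappush(heap, (new_cost, variant))
    return rewritten


queries = rewrite_paths(
    ["ex:headOf", "ex:locatedIn"],
    {"ex:headOf": "ex:worksFor"},
    max_cost=2,
)
```

Each returned path would be placed in an ordinary SPARQL 1.1 triple pattern and evaluated against the dataset, with the attached cost reported alongside its answers.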
Our prototype implementation incorporates three optimisation techniques that
aim to enhance query execution performance: the first optimisation is a pre-computation
technique that caches the answers of parts of the queries generated by the rewriting
algorithm. These answers are then reused to avoid re-executing those sub-queries. The second optimisation utilises a summary of the dataset to discard
queries that are known to return no answers. The third optimisation technique
uses the query containment concept to discard queries whose answers would
be returned by another query at the same or lower cost.
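To make the second optimisation more concrete, here is a minimal sketch of summary-based pruning. The summary structure is an assumption chosen for illustration (the set of predicates plus the pairs of predicates that occur consecutively in the data); the summary used by the prototype may differ. The function names are hypothetical.

```python
def build_summary(triples):
    """Summarise a dataset as (predicates, consecutive predicate pairs).

    A pair (p, q) is recorded when some triple with predicate p ends at a
    node from which a triple with predicate q starts.
    """
    outgoing = {}
    for s, p, _o in triples:
        outgoing.setdefault(s, set()).add(p)
    predicates = {p for _s, p, _o in triples}
    pairs = set()
    for _s, p, o in triples:
        for q in outgoing.get(o, ()):
            pairs.add((p, q))
    return predicates, pairs


def may_have_answers(path, summary):
    """Return False only if the sequence path certainly has no answer."""
    predicates, pairs = summary
    if any(p not in predicates for p in path):
        return False
    return all((p, q) in pairs for p, q in zip(path, path[1:]))


triples = [("a", "knows", "b"), ("b", "worksAt", "c")]
summary = build_summary(triples)
keep = [p for p in (["knows", "worksAt"], ["worksAt", "knows"], ["livesIn"])
        if may_have_answers(p, summary)]
```

The check is conservative: a path that passes may still return nothing, but a path that fails cannot match any data, so it can be discarded without being executed.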
We conclude by conducting a performance study of the system on three different
RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia.