6 research outputs found

    Algebraic rewritings for optimizing regular path queries

    Rewriting queries using views is a powerful technique that has applications in query optimization, data integration, data warehousing, etc. Query rewriting in relational databases is by now rather well investigated. However, in the framework of semistructured data the problem of rewriting has received much less attention. In this paper we focus on extracting as much information as possible from algebraic rewritings for the purpose of optimizing regular path queries. The cases in which we can find a complete exact rewriting of a query using a set of views are very “ideal”. However, there is always information available in the views, even if this information is only partial. We introduce “lower” and “possibility” partial rewritings and provide algorithms for computing them. These rewritings are algebraic in nature, i.e. we use only the algebraic view definitions, and no pairs (tuples) of objects, to compute them. This makes them a main memory product, which can be used for reducing secondary memory and remote access. After the main memory algebraic computation of the rewritings there is a second phase, with secondary memory access, for deriving the pairs of objects in the query answer. We give two algorithms for utilizing the partial lower and partial possibility rewritings to decrease the number of secondary memory accesses.

    Query containment and rewriting using views for regular path queries under constraints

    In this paper we consider general path constraints for semistructured databases. Our general constraints do not suffer from the limitations of the path constraints previously studied in the literature. We investigate the containment of regular path queries under general path constraints. We show that when the path constraints and queries are expressed by words, as opposed to languages, the containment problem becomes equivalent to the word rewrite problem for a corresponding semi-Thue system. Consequently, if the corresponding semi-Thue system has an undecidable word problem, the word query containment problem will be undecidable too. Also, we show that there are word constraints for which the corresponding semi-Thue system has a decidable word rewrite problem, but general query containment under these word constraints is undecidable. In order to overcome this, we exhibit a large, practical class of word constraints with a decidable general query containment problem. Based on query containment under constraints, we reason about constrained rewritings (using views) of regular path queries. We give a constructive characterization for computing optimal constrained rewritings using views.
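As a rough illustration of the word rewrite problem mentioned in this abstract, the following is a minimal sketch of a bounded search in a semi-Thue (string rewriting) system. It is not the paper's method: the function name `rewrites_to`, the bounds `max_steps` and `max_len`, and the sample rules are all hypothetical, and the abstract does not say how path constraints are mapped to rules. Because the problem is undecidable in general, a negative result here only means that no derivation was found within the bounds.

```python
from collections import deque

def rewrites_to(source, target, rules, max_steps=10, max_len=20):
    """Bounded breadth-first search in a semi-Thue system.

    rules is a list of (lhs, rhs) pairs; a rule may be applied to any
    occurrence of lhs inside the current word. Returns True if target is
    reached within max_steps rule applications (ignoring words longer than
    max_len), and False if no derivation was found within those bounds.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        word, steps = frontier.popleft()
        if word == target:
            return True
        if steps == max_steps:
            continue
        for lhs, rhs in rules:
            i = word.find(lhs)
            while i != -1:
                # Replace the occurrence of lhs starting at position i.
                new_word = word[:i] + rhs + word[i + len(lhs):]
                if len(new_word) <= max_len and new_word not in seen:
                    seen.add(new_word)
                    frontier.append((new_word, steps + 1))
                i = word.find(lhs, i + 1)
    return False

# Hypothetical rules, chosen only for illustration.
rules = [("ab", "c"), ("c", "ba")]
found = rewrites_to("aab", "baa", rules)      # aab -> ac -> aba -> ca -> baa
not_found = rewrites_to("aab", "bb", rules)   # no derivation exists
```

The breadth-first order means the first derivation found uses the fewest rule applications; the length bound keeps the search finite even when rules can grow words without limit.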

    Integrating and querying linked datasets through ontological rules

    The Web of Linked Open Data has developed from a few datasets in 2007 into a large data space containing billions of RDF triples published and stored in hundreds of independent datasets, forming the so-called Linked Open Data Cloud. This information cloud, ranging over a wide set of data domains, poses a challenge when it comes to reconciling the heterogeneous schemas or vocabularies adopted by data publishers. Motivated by this challenge, in this thesis we address the problem of integrating and querying multiple heterogeneous Linked Data sets through ontological rules. Firstly, we propose a formalisation of the notion of a peer-to-peer Linked Data integration system, where the mappings between peers comprise schema-level mappings and equality constraints between different IRIs; we call this formalism an RDF Peer System (RPS). We show that the semantics of the mappings preserve tractability of answering Basic Graph Pattern (BGP) SPARQL queries against the data stored in the RDF sources and the set of constraints given by the RPS mappings. Then, we address the problem of SPARQL query rewriting under RPSs and we show that it is not possible to rewrite an input BGP SPARQL query into a SPARQL 1.0 query under general RPSs, as the RPS peer mappings are not first-order-rewritable rules; this is a major drawback of general RPSs since data materialisation is required to exploit their full semantics. With the adoption of the more recent standard SPARQL 1.1 and its property paths we are able to extend the expressivity of the target language beyond first-order by including regular expressions in the body of the target SPARQL queries, that is, by expressing conjunctive two-way regular path queries (C2RPQs). Following this idea, in the second part of the thesis we step away from the language of RPSs to conduct a study on C2RPQ-rewritability under a broader ontology language.
We define ELHI_h^{ℓin} (harmless linear ELHI), an ontology language that generalises both the DL-Lite_R and linear ELH description logics. We prove the rewritability of instance queries (queries with a single atom in their body) under ELHI_h^{ℓin} knowledge bases with C2RPQs as the target language, presenting a query rewriting algorithm that makes use of non-deterministic finite-state automata. Following from that, we propose a query rewriting algorithm for answering conjunctive queries under ELHI_h^{ℓin} knowledge bases, with C2RPQs as the target language. Since C2RPQs can be straightforwardly expressed in SPARQL 1.1 by means of property paths, we believe that our approach is directly applicable to real-world querying settings. Lastly, we undertake a complexity analysis for query answering under ELHI_h^{ℓin}. We analyse the computational cost of query answering in terms of both data complexity (where the ontology and the query are fixed and the data alone is a variable input) and combined complexity (where query, ontology and data all constitute the variable input). We show that answering instance queries under ELHI_h^{ℓin} is NLogSpace-complete for data complexity and in PTime for combined complexity; we also show that answering CQs under ELHI_h^{ℓin} is NLogSpace-complete for data complexity and NP-complete for combined complexity.

    Rewriting of Regular Expressions and Regular Path Queries

    In this paper we address the problem of view-based query rewriting in the context of semi-structured data. We present a method for computing the rewriting of a regular expression E in terms of other regular expressions. The method computes the exact rewriting (the one that defines the same regular language as E) if it exists, or the rewriting that defines the maximal language contained in the one defined by E, otherwise. We present a complexity analysis of both the problem and the method, showing that the latter is essentially optimal. Finally, we illustrate how to exploit the method for view-based rewriting of regular path queries in semi-structured data. The complexity results established for the rewriting of regular expressions apply also to the case of regular path queries.

    Rewriting of Regular Expressions and Regular Path Queries

    Recent work on semi-structured data has revitalized the interest in path queries, i.e., queries that ask for all pairs of objects in the database that are connected by a path conforming to a certain specification, in particular to a regular expression. Also, in semi-structured data, as well as in data integration, data warehousing, and query optimization, the problem of view-based query rewriting is receiving much attention: given a query and a collection of views, generate a new query that uses the views and provides the answer to the original one. In this paper we address the problem of view-based query rewriting in the context of semi-structured data. We present a method for computing the rewriting of a regular expression E in terms of other regular expressions. The method computes the exact rewriting (the one that defines the same regular language as E) if it exists, or the rewriting that defines the maximal language contained in the one defined by E, otherwise. We present a complexity analysis of both the problem and the method, showing that the latter is essentially optimal. Finally, we illustrate how to exploit the method for view-based rewriting of regular path queries in semi-structured data. The complexity results established for the rewriting of regular expressions apply also to the case of regular path queries.
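To make the notion of a path query in this abstract concrete, here is a minimal sketch of evaluating a regular path query over an edge-labelled graph. It is an illustration only, not the rewriting method of the paper: the function name `eval_rpq`, the hand-written automaton, and the sample graph are assumptions. The regular expression is assumed to be already compiled into a non-deterministic finite automaton, and the answer is computed by a breadth-first search over pairs (graph node, automaton state).

```python
from collections import deque

def eval_rpq(edges, delta, start, accepting):
    """Return all pairs (x, y) joined by a path whose labels the NFA accepts.

    edges: iterable of (source, label, target) triples.
    delta: dict mapping (state, label) to a set of successor states.
    start: initial NFA state; accepting: set of accepting NFA states.
    """
    adjacency = {}
    nodes = set()
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
        nodes.update((source, target))

    answers = set()
    for x in nodes:
        # Search the product of the graph and the automaton from (x, start).
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((x, node))
            for label, next_node in adjacency.get(node, []):
                for next_state in delta.get((state, label), ()):
                    if (next_node, next_state) not in seen:
                        seen.add((next_node, next_state))
                        queue.append((next_node, next_state))
    return answers

# Hypothetical database: a -r-> b -r-> c -s-> d
edges = [("a", "r", "b"), ("b", "r", "c"), ("c", "s", "d")]
# Hand-built NFA for the expression r* s: state 0 is initial, state 1 accepts.
delta = {(0, "r"): {0}, (0, "s"): {1}}
pairs = eval_rpq(edges, delta, 0, {1})
```

Each source node explores at most (number of nodes) × (number of automaton states) product states, so the search terminates even on cyclic graphs.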