12,267 research outputs found

    Secure Querying of Recursive XML Views: A Standard XPath-based Technique

    Most state-of-the-art approaches for securing XML documents allow users to access data only through authorized views defined by annotating an XML grammar (e.g. DTD) with a collection of XPath expressions. To prevent improper disclosure of confidential information, user queries posed on these views need to be rewritten into equivalent queries on the underlying documents. This rewriting avoids the overhead of view materialization and maintenance. A major concern here is that query rewriting for recursive XML views is still an open problem. To overcome this problem, some works propose translating XPath queries into non-standard ones, called Regular XPath queries. However, query rewriting under Regular XPath can be of exponential size as it relies on an automaton model. Most importantly, Regular XPath remains a theoretical achievement: it is not commonly used in practice, as translation and evaluation tools are not available. In this paper, we show that query rewriting is always possible for recursive XML views using only the expressive power of the standard XPath. We investigate the extension of the downward class of XPath, composed only of child and descendant axes, with some axes and operators, and we propose a general approach to rewrite queries under recursive XML views. Unlike Regular XPath-based works, we provide a rewriting algorithm which processes the query only over the annotated DTD grammar and which can run in linear time in the size of the query. An experimental evaluation demonstrates that our algorithm is efficient and scales well. Comment: (2011

    Optimizing Queries Using a Materialized View in a Data Warehouse

    A data warehouse is a user-centered environment for data analysis and decision support. To support decision makers in making decisions quickly and accurately, materialized views can provide significant improvements in query processing time. The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations. Known algorithms, such as the bucket algorithm and the inverse-rules algorithm, have been used to rewrite queries using views before executing the queries. The bucket algorithm, predominantly used to rewrite queries, generates a candidate rewriting of a query using views, then checks that the rewriting is contained in the original query. However, we show some deficiencies in the bucket algorithm, then describe the containment bucket algorithm and give an optimal method to solve this problem. We present an experiment comparing the performance of both algorithms. Computer Science Departmen
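The candidate-generation step mentioned in the abstract above can be illustrated with a deliberately simplified sketch. It assumes conjunctive queries encoded as lists of `(predicate, args)` pairs, and it places a view in a bucket whenever the view mentions the subgoal's predicate. The full bucket algorithm also checks unification and which variables the view exposes, and then tests each candidate for containment in the query; none of that is shown here. The names `build_buckets`, `candidate_rewritings`, and the example relations are hypothetical.

```python
from itertools import product

def build_buckets(query_body, views):
    """One bucket per query subgoal: names of the views that mention its predicate.

    Simplification (assumption): a real bucket test also unifies arguments
    and checks that needed variables appear in the view head.
    """
    buckets = []
    for predicate, _args in query_body:
        bucket = [name for name, body in views.items()
                  if any(p == predicate for p, _ in body)]
        buckets.append(bucket)
    return buckets

def candidate_rewritings(buckets):
    """Every way of picking one view per bucket; each pick is only a candidate."""
    return list(product(*buckets))

# Hypothetical query body: cites(X, Y), same_topic(X, Y)
query_body = [("cites", ("X", "Y")), ("same_topic", ("X", "Y"))]
views = {
    "V1": [("cites", ("A", "B"))],
    "V2": [("cites", ("A", "B")), ("same_topic", ("A", "B"))],
    "V3": [("same_topic", ("A", "B"))],
}

buckets = build_buckets(query_body, views)
candidates = candidate_rewritings(buckets)
```

The number of candidates is the product of the bucket sizes, which is why the containment check on each candidate dominates the cost in this style of algorithm.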

    Algebraic rewritings for optimizing regular path queries

    Rewriting queries using views is a powerful technique that has applications in query optimization, data integration, data warehousing, etc. Query rewriting in relational databases is by now rather well investigated. However, in the framework of semistructured data the problem of rewriting has received much less attention. In this paper we focus on extracting as much information as possible from algebraic rewritings for the purpose of optimizing regular path queries. The cases when we can find a complete exact rewriting of a query using a set of views are very “ideal”. However, there is always information available in the views, even if this information is only partial. We introduce “lower” and “possibility” partial rewritings and provide algorithms for computing them. These rewritings are algebraic in their nature, i.e. we use only the algebraic view definitions for computing the rewritings. We do not use any pairs (tuples) of objects for computing the rewritings. This fact makes them a main memory product, which can be used for reducing secondary memory and remote access. After the main memory algebraic computation of the rewritings there is a second phase, with secondary memory access, for deriving the pairs of objects in the query answer. We give two algorithms for utilizing the partial lower and partial possibility rewritings to decrease the number of secondary memory accesses.

    Efficient and scalable techniques for minimization and rewriting of conjunctive queries

    Query rewriting as an approach to query answering has been a challenging issue in database and information integration systems. In general, rewriting of a conjunctive query Q using a set of views in conjunctive form consists of two phases: (1) generating proper building blocks using the views, and (2) combining them to generate a union of conjunctive queries which is maximally contained in Q. While the problem of query rewriting is known to be exponential in the number of subgoals of Q, there is a demand for increased efficiency for practical queries. We revisit this problem for conjunctive queries, and show that Stirling numbers can be used to determine the optimal number of combinations in the second phase, and hence the number of rules in the generated union of conjunctive queries. Based on these numbers, we introduce the notion of combination patterns and develop a rewriting algorithm that uses these numeral patterns to break down the large combinatorial problem in the second phase into several smaller ones. The results of our numerous experiments indicate that the proposed rewriting technique outperforms existing techniques, including the MiniCon algorithm, in terms of computation time, memory requirements, and scalability. In a related context, we studied query minimization, motivated by the fact that queries with fewer or no redundant subgoals can, in general, be evaluated faster. However, such redundancies are often present in automatically generated queries. We propose an algorithm that, given a conjunctive query, repeatedly identifies and eliminates all the redundant subgoals. We also illustrate its performance superiority over existing minimization algorithms. It has been shown that query rewriting naturally generates queries with redundant subgoals. We also show that redundant subgoals in the input of query rewriting result in redundant rules in its output. Based on this, we investigate the impact of minimization as pre-processing and post-processing phases to the query rewriting technique. Our experimental results using different synthetic data show that our query rewriting technique coupled with pre/post minimization phases produces the best quality of rewriting in a more efficient way compared to existing rewriting techniques, including the TreeWise algorithm. It has been shown that extending conjunctive queries with constraints adds to the complexity of query rewriting. Previous studies identified classes of conjunctive queries with constraints in the form of arithmetic comparisons for which the complexity of rewriting does not change. Such classes are said to satisfy the homomorphism property. We identify new classes of conjunctive queries with linear arithmetic constraints that enjoy this property, and extend our query rewriting algorithm accordingly to support such queries
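To make the idea of eliminating redundant subgoals concrete, here is a minimal sketch of the textbook approach based on containment mappings (homomorphisms). It is not the algorithm proposed in the abstract above. It assumes atoms encoded as `(predicate, args)` pairs and the convention that variables start with an uppercase letter; `homomorphism_exists` and `minimize` are hypothetical names, and the search is brute force, so it only suits small queries.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def homomorphism_exists(src_atoms, dst_atoms, fixed):
    """Check for a mapping of src variables to dst terms that is the identity
    on `fixed` and sends every src atom onto some dst atom."""
    variables = sorted({t for _, args in src_atoms for t in args
                        if is_var(t) and t not in fixed})
    targets = sorted({t for _, args in dst_atoms for t in args})
    dst = set(dst_atoms)
    for image in product(targets, repeat=len(variables)):
        h = dict(zip(variables, image))
        if all((p, tuple(h.get(t, t) for t in args)) in dst
               for p, args in src_atoms):
            return True
    return False

def minimize(head_vars, body):
    """Drop subgoals one at a time while the query stays equivalent."""
    body = list(dict.fromkeys(body))  # remove exact duplicates, keep order
    changed = True
    while changed:
        changed = False
        for atom in body:
            reduced = [a for a in body if a != atom]
            # The atom is redundant if the full body maps into the reduced one
            # without moving the head variables.
            if homomorphism_exists(body, reduced, set(head_vars)):
                body = reduced
                changed = True
                break
    return body

# Q(X) :- r(X, Y), r(X, Z)   has one redundant subgoal.
minimal = minimize(["X"], [("r", ("X", "Y")), ("r", ("X", "Z"))])
```

Keeping the head variables fixed is what guarantees that the reduced query returns the same answers rather than merely a superset.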

    Answering Queries using Views over Probabilistic XML: Complexity and Tractability

    We study the complexity of query answering using views in a probabilistic XML setting, identifying large classes of XPath queries -- with child and descendant navigation and predicates -- for which there are efficient (PTime) algorithms. We consider this problem under the two possible semantics for XML query results: with persistent node identifiers and in their absence. Accordingly, we consider rewritings that can exploit a single view, by means of compensation, and rewritings that can use multiple views, by means of intersection. Since in a probabilistic setting queries return answers with probabilities, the problem of rewriting goes beyond the classic one of retrieving XML answers from views. For both semantics of XML queries, we show that, even when XML answers can be retrieved from views, their probabilities may not be computable. For rewritings that use only compensation, we describe a PTime decision procedure, based on easily verifiable criteria that distinguish between the feasible cases -- when probabilistic XML results are computable -- and the unfeasible ones. For rewritings that can use multiple views, with compensation and intersection, we identify the most permissive conditions that make probabilistic rewriting feasible, and we describe an algorithm that is sound in general, and becomes complete under fairly permissive restrictions, running in PTime modulo worst-case exponential time equivalence tests. This is the best we can hope for since intersection makes query equivalence intractable already over deterministic data. Our algorithm runs in PTime whenever deterministic rewritings can be found in PTime. Comment: VLDB201

    A top-down approach to answering queries using views

    The problem of answering queries using views is concerned with finding answers to a query using only answers to a set of views. In the context of data integration with the LAV approach, this problem translates to finding a maximally contained rewriting for a query using a set of views. When both query and views are in conjunctive form, rewritings generated by existing bottom-up algorithms in this context are generally expensive to evaluate. As a result, they often require costly post-processing to improve the efficiency of computing the answer tuples. In this dissertation, we propose a top-down approach to the rewriting problem of conjunctive queries. We first present a graph-based analysis of the problem and identify conditions that must be satisfied to ensure maximal containment of the rewriting. We then present TreeWise, a novel algorithm that uses our top-down approach to efficiently generate maximally contained rewritings that are generally less expensive to evaluate. Our experiments confirm that TreeWise generally produces better quality rewritings, with a performance comparable to the most efficient of the previously proposed algorithms.

    Datalog Rewritings of Regular Path Queries using Views

    We consider query answering using views on graph databases, i.e. databases structured as edge-labeled graphs. We mainly consider views and queries specified by Regular Path Queries (RPQ). These are queries selecting pairs of nodes in a graph database that are connected via a path whose sequence of edge labels belongs to some regular language. We say that a view V determines a query Q if for all graph databases D, the view image V(D) always contains enough information to answer Q on D. In other words, there is a well defined function from V(D) to Q(D). Our main result shows that when this function is monotone, there exists a rewriting of Q as a Datalog query over the view instance V(D). In particular the rewriting query can be evaluated in time polynomial in the size of V(D). Moreover this implies that it is decidable whether an RPQ query can be rewritten in Datalog using RPQ views
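As an illustration of what a Regular Path Query selects, the following sketch evaluates an RPQ directly on an edge-labeled graph by exploring the product of the graph with a finite automaton. It does not implement the Datalog rewriting over views described in the abstract above. The encoding is an assumption: edges are `(source, label, target)` triples and the regular language is given as a deterministic automaton through `start`, `accepting`, and a transition dictionary `delta`; the function name `eval_rpq` is hypothetical.

```python
from collections import deque

def eval_rpq(edges, start, accepting, delta):
    """Return all pairs (x, y) joined by a path whose label sequence the
    automaton accepts. `delta` maps (state, label) to the next state."""
    adjacency = {}
    nodes = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for origin in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(origin, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((origin, node))
            for label, nxt in adjacency.get(node, []):
                nxt_state = delta.get((state, label))
                if nxt_state is not None and (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return answers

# Hypothetical graph and the language a b* (one 'a' edge, then any number of 'b' edges).
edges = [("n1", "a", "n2"), ("n2", "b", "n3"), ("n3", "b", "n4"), ("n2", "c", "n5")]
delta = {(0, "a"): 1, (1, "b"): 1}
pairs = eval_rpq(edges, start=0, accepting={1}, delta=delta)
```

Each search visits at most one entry per (node, state) pair, so the evaluation is polynomial in the size of the graph for a fixed automaton.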