95 research outputs found

    Independence of Containing Patterns Property and its Application in Tree Pattern Query Rewriting Using Views

    We show that several classes of tree patterns observe the independence of containing patterns property, that is, if a pattern is contained in the union of several patterns, then it is contained in one of them. We apply this property to two related problems on tree pattern rewriting using views. First, given view V and query Q, is it possible for Q to have an equivalent rewriting using V which is the union of two or more tree patterns, but not an equivalent rewriting which is a single pattern? This problem is of both theoretical and practical importance because, if the answer is no, then, to find an equivalent rewriting of a tree pattern using a view, we should use more efficient methods, such as the polynomial time algorithm o
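    The abstract does not spell out how containment itself is tested. A standard sufficient test for tree patterns built from child (/), descendant (//) and branching, which is known to be complete when wildcards are absent, is a root-preserving homomorphism from the containing pattern into the contained one. The sketch below assumes Boolean patterns (no output node) and uses hypothetical names (`Node`, `maps_to`, `contained_in`, `contained_in_union`). It also shows how the independence property lets containment in a union reduce to one pattern-by-pattern check, valid only for pattern classes where that property holds.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    """A tree-pattern node (hypothetical representation); children are
    (axis, Node) pairs with axis "/" (child) or "//" (descendant)."""
    label: str
    children: list = field(default_factory=list)

def proper_descendants(n):
    """All nodes strictly below n, reached through any mix of / and // edges."""
    for _, c in n.children:
        yield c
        yield from proper_descendants(c)

def maps_to(u, v):
    """True if the subpattern rooted at u maps homomorphically onto the one at v."""
    if u.label != "*" and u.label != v.label:
        return False
    for axis, c in u.children:
        if axis == "/":
            # a child edge must land on a child edge
            targets = [w for a, w in v.children if a == "/"]
        else:
            # a descendant edge may land on any downward path of length >= 1
            targets = proper_descendants(v)
        if not any(maps_to(c, w) for w in targets):
            return False
    return True

def contained_in(p, q):
    """Sufficient test for p ⊆ q: a root-preserving homomorphism from q into p."""
    return maps_to(q, p)

def contained_in_union(p, qs):
    """Assuming the independence of containing patterns property holds,
    p ⊆ q1 ∪ ... ∪ qn reduces to p ⊆ qi for some single i."""
    return any(contained_in(p, q) for q in qs)

# p = a/b//c and q = a//c
p = Node("a", [("/", Node("b", [("//", Node("c"))]))])
q = Node("a", [("//", Node("c"))])
r = Node("a", [("/", Node("d"))])  # a/d, unrelated to p
print(contained_in(p, q), contained_in(q, p))  # True False
```

    Because `maps_to` recomputes subproblems, this version can be exponential on adversarial inputs; memoizing on node pairs gives the usual polynomial-time check.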

    From Relations to XML: Cleaning, Integrating and Securing Data

    While relational databases are still the preferred approach for storing data, XML is emerging as the primary standard for representing and exchanging data. Consequently, it has become increasingly important to provide a uniform XML interface to various data sources (integration), and critical to protect sensitive and confidential information in XML data (access control). Moreover, it is preferable to first detect and repair inconsistencies in the data to avoid propagating errors to other data processing steps. In response to these challenges, this thesis presents an integrated framework for cleaning, integrating and securing data. The framework contains three parts. First, the data cleaning sub-framework makes use of a new class of constraints specially designed for improving data quality, referred to as conditional functional dependencies (CFDs), to detect and remove inconsistencies in relational data. Both batch and incremental techniques are developed for efficiently detecting CFD violations using SQL and for repairing them based on a cost model. The cleaned relational data, together with other non-XML data, is then converted to XML format by using widely deployed XML publishing facilities. Second, the data integration sub-framework uses a novel formalism, XML integration grammars (XIGs), to integrate multi-source XML data which is either native or published from traditional databases. XIGs automatically support conformance to a target DTD, and allow one to build a large, complex integration via composition of component XIGs. To efficiently materialize the integrated data, algorithms are developed for merging XML queries in XIGs and for scheduling them. Third, to protect sensitive information in the integrated XML data, the data security sub-framework allows users to access the data only through authorized views.
User queries posed on these views need to be rewritten into equivalent queries on the underlying document to avoid the prohibitive cost of materializing and maintaining a large number of views. Two algorithms are proposed to support virtual XML views: a rewriting algorithm that characterizes the rewritten queries as a new form of automata, and an evaluation algorithm that executes the automata-represented queries. Together they allow the security sub-framework to answer queries on views in linear time. Using both relational and XML technologies, this framework provides a uniform approach to clean, integrate and secure data. The algorithms and techniques in the framework have been implemented, and the experimental study verifies their effectiveness and efficiency.
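    The abstract says CFD violations are detected with SQL but gives no queries. The sketch below is an illustration under assumptions, not the thesis's algorithm: it uses a hypothetical `cust` table in an in-memory SQLite database with made-up rows. A constant CFD, written ([cc, ac] → [city], (01, 908 ‖ MH)), is violated by a single tuple, so a selection finds it. A variable CFD, written ([cc, zip] → [street], (44, _ ‖ _)), is violated by pairs of tuples, so a GROUP BY over the tuples matching the pattern constants finds it.

```python
import sqlite3

# Hypothetical customer relation; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (cc TEXT, ac TEXT, zip TEXT, street TEXT, city TEXT)")
rows = [
    ("01", "908", "07974", "Tree Ave.", "MH"),
    ("01", "908", "07974", "Tree Ave.", "NYC"),   # violates the constant CFD
    ("44", "131", "EH4 1DT", "High St.", "EDI"),
    ("44", "131", "EH4 1DT", "Crichton", "EDI"),  # violates the variable CFD
]
conn.executemany("INSERT INTO cust VALUES (?, ?, ?, ?, ?)", rows)

# Constant CFD ([cc, ac] -> [city], (01, 908 || MH)):
# every tuple matching the LHS constants must have city = 'MH'.
single = conn.execute(
    "SELECT rowid FROM cust WHERE cc = '01' AND ac = '908' AND city <> 'MH'"
).fetchall()

# Variable CFD ([cc, zip] -> [street], (44, _ || _)):
# among tuples with cc = '44', equal zip values must imply equal streets.
multi = conn.execute(
    "SELECT zip FROM cust WHERE cc = '44' "
    "GROUP BY zip HAVING COUNT(DISTINCT street) > 1"
).fetchall()

print(single)  # [(2,)]
print(multi)   # [('EH4 1DT',)]
```

    With a larger pattern tableau, one would typically store the tableau as its own table and join against it rather than hard-coding constants, but that encoding is not described in the abstract.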

    Logics for Unranked Trees: An Overview

    Labeled unranked trees are used as a model of XML documents, and logical languages for them have been studied actively over the past several years. Such logics have different purposes: some are better suited for extracting data, some for expressing navigational properties, and some make it easy to relate complex properties of trees to the existence of tree automata for those properties. Furthermore, logics differ significantly in their model-checking properties, their automata models, and their behavior on ordered and unordered trees. In this paper we present a survey of logics for unranked trees.

    XML Security Views Revisited

    In this paper, we revisit the view-based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to define the view. First, we show that the full class of Regular XPath queries is closed under query rewriting. Next, we address the problem of constructing a DTD that describes the view schema, which in general need not be regular. We propose three different methods of approximating the view schema and show that the produced DTDs are indistinguishable from the exact schema (with respect to queries from a class specific to each method). Finally, we investigate problems of static analysis of security access specifications.

    XML access control using static analysis

    DescribeX: A Framework for Exploring and Querying XML Web Collections

    This thesis introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries, where different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, called axis path regular expressions (AxPREs). DescribeX can significantly help in understanding both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX's lightweight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage. Comment: PhD thesis, University of Toronto, 2008, 163 pages.
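    DescribeX is said to generalize existing XML summaries. As a point of reference only, the sketch below builds one of the simplest such summaries: it partitions elements by their incoming root-to-node label path and counts each partition, in the spirit of a DataGuide-style path summary. It does not implement AxPREs or DescribeX refinement; the function name `label_path_summary` and the sample document are hypothetical, and only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def label_path_summary(xml_text):
    """Map each root-to-node label path (e.g. "/lib/book") to the number of
    elements reached by it; each key is one partition (extent) of the summary."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        counts[path] += 1
        for child in elem:  # iterating an Element yields its child elements
            walk(child, path)

    walk(root, "")
    return counts

doc = "<lib><book><title/><author/><author/></book><book><title/></book></lib>"
summary = label_path_summary(doc)
for path, n in sorted(summary.items()):
    print(path, n)
```

    A summary like this answers simple child-axis paths such as /lib/book/author directly from its counts; richer summaries refine these partitions using other axes, which is the kind of generalization the abstract attributes to DescribeX.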

    Information preserving XML schema embedding

    A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information preserving XML mappings.
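    The key structural idea above is that an edge in the source DTD graph may be mapped to a path in the target DTD graph. The sketch below checks only that part: every source edge (A, B) must be assigned a non-empty path in the target graph that starts at λ(A) and ends at λ(B). It deliberately ignores DTD edge types (sequence, choice, star) and the additional conditions the paper imposes to guarantee information preservation, so it is a necessary-condition check under those assumptions. The names `check_path_mapping`, `lam`, `path_of` and the sample DTDs are hypothetical.

```python
def is_path(graph, path):
    """True if each consecutive pair in `path` is an edge of `graph`
    (graph: dict mapping an element type to the set of its child types)."""
    return all(b in graph.get(a, ()) for a, b in zip(path, path[1:]))

def check_path_mapping(src_edges, tgt_graph, lam, path_of):
    """Check that each source edge (a, b) maps to a target path of length >= 1
    running from lam[a] to lam[b]; information-preservation conditions are not checked."""
    for a, b in src_edges:
        p = path_of.get((a, b))
        if not p or len(p) < 2:
            return False
        if p[0] != lam[a] or p[-1] != lam[b] or not is_path(tgt_graph, p):
            return False
    return True

# Hypothetical source DTD graph: db -> book -> title
src_edges = [("db", "book"), ("book", "title")]
# Hypothetical target DTD graph: library -> shelf -> book -> {title, isbn}
tgt_graph = {"library": {"shelf"}, "shelf": {"book"}, "book": {"title", "isbn"}}
lam = {"db": "library", "book": "book", "title": "title"}
path_of = {
    ("db", "book"): ["library", "shelf", "book"],  # one source edge -> two target edges
    ("book", "title"): ["book", "title"],
}

ok = check_path_mapping(src_edges, tgt_graph, lam, path_of)
bad = check_path_mapping(src_edges, tgt_graph, lam,
                         {**path_of, ("db", "book"): ["library", "book"]})
print(ok, bad)  # True False
```

    Searching for such a mapping, rather than checking a given one, is where the NP-completeness result in the abstract applies, which is why the paper turns to heuristics.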

    Reasoning about XML with temporal logics and automata

    We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-Büchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views.

    Static analysis of XML security views and query rewriting

    In this paper, we revisit the view-based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to define the view. First, we study query rewriting with views when the classes used to define queries and views are Regular XPath and MSO. Next, we investigate problems of static analysis of security access specifications (SAS): we introduce the novel class of interval-bounded SAS and define three different ways to compare views (i.e., queries) from a security point of view. We provide a systematic study of the complexity of deciding these three comparisons in three settings: when the depth of the XML documents is bounded; when documents may have arbitrary depth but the queries defining the views are restricted to guarantee the interval-bounded property; and in the general setting, without restrictions on queries or documents.

    XQuery containment in presence of variable binding dependencies
