95 research outputs found
Independence of Containing Patterns Property and its Application in Tree Pattern Query Rewriting Using Views
Abstract We show that several classes of tree patterns observe the independence of containing patterns property, that is, if a pattern is contained in the union of several patterns, then it is contained in one of them. We apply this property to two related problems on tree pattern rewriting using views. First, given view V and query Q, is it possible for Q to have an equivalent rewriting using V which is the union of two or more tree patterns, but not an equivalent rewriting which is a single pattern? This problem is of both theoretical and practical importance because, if the answer is no, then, to find an equivalent rewriting of a tree pattern using a view, we should use more efficient methods, such as the polynomial time algorithm o
From Relations to XML: Cleaning, Integrating and Securing Data
While relational databases are still the preferred approach for storing data, XML is emerging
as the primary standard for representing and exchanging data. Consequently, it has
been increasingly important to provide a uniform XML interface to various data sourcesā
integration; and critical to protect sensitive and confidential information in XML data ā
access control. Moreover, it is preferable to first detect and repair the inconsistencies in
the data to avoid the propagation of errors to other data processing steps. In response to
these challenges, this thesis presents an integrated framework for cleaning, integrating and
securing data.
The framework contains three parts. First, the data cleaning sub-framework makes
use of a new class of constraints specially designed for improving data quality, referred
to as conditional functional dependencies (CFDs), to detect and remove inconsistencies in
relational data. Both batch and incremental techniques are developed for detecting CFD
violations by SQL efficiently and repairing them based on a cost model. The cleaned relational
data, together with other non-XML data, is then converted to XML format by using
widely deployed XML publishing facilities. Second, the data integration sub-framework
uses a novel formalism, XML integration grammars (XIGs), to integrate multi-source XML
data which is either native or published from traditional databases. XIGs automatically
support conformance to a target DTD, and allow one to build a large, complex integration
via composition of component XIGs. To efficiently materialize the integrated data, algorithms
are developed for merging XML queries in XIGs and for scheduling them. Third, to
protect sensitive information in the integrated XML data, the data security sub-framework
allows users to access the data only through authorized views. User queries posed on these
views need to be rewritten into equivalent queries on the underlying document to avoid the
prohibitive cost of materializing and maintaining large number of views. Two algorithms
are proposed to support virtual XML views: a rewriting algorithm that characterizes the
rewritten queries as a new form of automata and an evaluation algorithm to execute the
automata-represented queries. They allow the security sub-framework to answer queries
on views in linear time.
Using both relational and XML technologies, this framework provides a uniform approach
to clean, integrate and secure data. The algorithms and techniques in the framework
have been implemented and the experimental study verifies their effectiveness and efficiency
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees
XML Security Views Revisited
International audienceIn this paper, we revisit the view based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to dene the view. First, we show that the full class of Regular XPath queries is closed under query rewriting. Next, we address the problem of constructing a DTD that describes the view schema, which in general needs not be regular. We propose three dierent methods of ap- proximating the view schema and we show that the produced DTDs are indistinguishable from the exact schema (with queries from a class speci c for each method). Finally, we investigate problems of static analysis of security access specications
DescribeX: A Framework for Exploring and Querying XML Web Collections
This thesis introduces DescribeX, a powerful framework that is capable of
describing arbitrarily complex XML summaries of web collections, providing
support for more efficient evaluation of XPath workloads. DescribeX permits the
declarative description of document structure using all axes and language
constructs in XPath, and generalizes many of the XML indexing and summarization
approaches in the literature. DescribeX supports the construction of
heterogeneous summaries where different document elements sharing a common
structure can be declaratively defined and refined by means of path regular
expressions on axes, or axis path regular expression (AxPREs). DescribeX can
significantly help in the understanding of both the structure of complex,
heterogeneous XML collections and the behaviour of XPath queries evaluated on
them.
Experimental results demonstrate the scalability of DescribeX summary
refinements and stabilizations (the key enablers for tailoring summaries) with
multi-gigabyte web collections. A comparative study suggests that using a
DescribeX summary created from a given workload can produce query evaluation
times orders of magnitude better than using existing summaries. DescribeX's
light-weight approach of combining summaries with a file-at-a-time XPath
processor can be a very competitive alternative, in terms of performance, to
conventional fully-fledged XML query engines that provide DB-like functionality
such as security, transaction processing, and native storage.Comment: PhD thesis, University of Toronto, 2008, 163 page
Information preserving XML schema embedding
A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information preserving XML mappings.
Reasoning about XML with temporal logics and automata
We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-BĆ¼chi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views
Static analysis of XML security views and query rewriting
International audienceIn this paper, we revisit the view based security framework for XML without imposing any of the previously considered restrictions on the class of queries, the class of DTDs, and the type of annotations used to define the view. First, we study {\em query rewriting} with views when the classes used to define queries and views are Regular XPath and MSO. Next, we investigate problems of {\em static analysis} of security access specifications (SAS): we introduce the novel class of \emph{interval-bounded} SAS and we define three different manners to compare views (i.e. queries), with a security point of view. We provide a systematic study of the complexity for deciding these three comparisons, when the depth of the XML documents is bounded, when the document may have an arbitrary depth but the queries defining the views are restricted to guarantee the interval-bounded property, and in the general setting without restriction on queries and document
- ā¦