98 research outputs found

    XML data exchange under expressive mappings

    Get PDF
    Data Exchange is the problem of transforming data in one format (the source schema) into data in another format (the target schema). Its core component is a schema mapping, which is a high level specification of how such transformation should be done. Relational data exchange has been extensively studied, but exchanging XML data have been paid much less attention. The goal of this thesis is to develop a theory of XML data exchange with expressive schema mappings, extending a previous work using restricted mappings. Our mapping language is based on tree patterns that can use horizontal navigation and data comparison in addition to downward navigation. First we look at static analysis problems concerning a single mapping. More specif- ically, we consider consistency problems with different flavours. One such problem, for instance, asks if any tree has a solution under the given mapping. Then we turn to analyse the complexity of mapping themselves, i.e., recognising pairs of trees such that the one is mapped to the other. For both problems, we provide classifications based on sets of features used in the mappings. Second we investigate the composition of XML schema mappings. Generally it is hard, or rather simply impossible, to achieve closure under composition in XML settings unlike in relational settings. Nevertheless we identify a class of XML schema mappings that is closed under composition. Lastly we consider the problem of query answering. It is important to exchange data so that we can feasibly answer queries while it often leads to intractability. We identify the dividing line between tractable and intractable cases: answering queries with extended features is always intractable while tractability of answering simple queries can be retained in extended mappings

    Ensuring Query Compatibility with Evolving XML Schemas

    Get PDF
    During the life cycle of an XML application, both schemas and queries may change from one version to another. Schema evolutions may affect query results and potentially the validity of produced data. Nowadays, a challenge is to assess and accommodate the impact of theses changes in rapidly evolving XML applications. This article proposes a logical framework and tool for verifying forward/backward compatibility issues involving schemas and queries. First, it allows analyzing relations between schemas. Second, it allows XML designers to identify queries that must be reformulated in order to produce the expected results across successive schema versions. Third, it allows examining more precisely the impact of schema changes over queries, therefore facilitating their reformulation

    Reasoning About Integrity Constraints for Tree-Structured Data

    Get PDF
    We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential, when conjunctive queries are replaced by tree patterns, and the number of data comparisons is bounded

    Xcerpt: A Rule-Based Query and Transformation Language for the Web

    Get PDF
    This thesis investigates querying the Web and the Semantic Web. It proposes a new rulebased query language called Xcerpt. Xcerpt differs from other query languages in that it uses patterns instead of paths for the selection of data, and in that it supports both rule chaining and recursion. Rule chaining serves for structuring large queries, as well as for designing complex query programs (e.g. involving queries to the Semantic Web), and for modelling inference rules. Query patterns may contain special constructs like partial subqueries, optional subqueries, or negated subqueries that account for the particularly flexible structure of data on the Web. Furthermore, this thesis introduces the syntax of the language Xcerpt, which is illustrated on a large collection of use cases both from the conventional Web and the Semantic Web. In addition, a declarative semantics in form of a Tarski-style model theory is described, and an algorithm is proposed that performs a backward chaining evaluation of Xcerpt programs. This algorithm has also been implemented (partly) in a prototypical runtime system. A salient aspect of this algorithm is the specification of a non-standard unification algorithm called simulation unification that supports the new query constructs described above. This unification is symmetric in the sense that variables in both terms can be bound. On the other hand it is in contrast to standard unification assymmetric in the sense that the unification determines that the one term is a subterm of the other term.Diese Arbeit untersucht das Anfragen des Webs und des Semantischen Webs. Sie stellt eine neue regel-basierte Anfragesprache namens Xcerpt vor. Xcerpt unterscheidet sich von anderen Anfragesprachen insofern, als dass es zur Selektion von Daten sog. Pattern (,,Muster'') verwendet und sowohl Regelschliessen als auch Rekursion unterstützt, was sowohl zur Strukturierung größerer Anfragen als auch zur Erstellung komplexer Anfrageprogramme, und zur Modellierung von Inferenzregeln dient. Anfrage-Pattern können spezielle Konstrukte, wie partielle Teilanfragen, optionale Teilanfragen, oder negierte Teilanfragen, enthalten, die der besonders flexiblen Struktur von Daten im Web genügen. In dieser Arbeit wird weiterhin die Syntax von Xcerpt eingeführt, und mit Hilfe mehrerer Anwendungsszenarien sowohl aus dem konventionellen als auch aus dem semantischen Web erläutert. Ausserdem wird eine deklarative Semantik im Stil von Tarski's Modelltheorie beschrieben und ein Algorithmus vorgeschlagen, der eine rückwärtsschliessende Auswertung von Xcerpt durchführt und in einem prototypischen Laufzeitsystem implementiert wurde. Wesentlicher Bestandteil des Rückwärtsschliessens ist die Spezifikation eines nicht-standard Unifikations-Algorithmus, der die oben genannten speziellen Xcerpt-Konstrukte berücksichtigt. Diese Unifikation ist symmetrisch in dem Sinne, dass sie Variablen in beiden angeglichenen (,,unifizierten'') Termen binden kann. Andererseits ist sie im Gegensatz zur Standardunifikation assymmetrisch in dem Sinne, dass der dadurch geleistete Angleich den einen Term als ,,Teilterm'' des anderen erkennt

    Dependencies revisited for improving data quality

    Get PDF

    Using XML views to improve data-independence of distributed applications that share data

    Get PDF
    The development and maintenance of distributed software applications that support and make efficient use of heterogeneous networked systems is very challenging. One aspect of the complexity is that these distributed applications often need to access shared data, and different applications sharing the data may have different needs and may access different parts of the data. Maintenance and modification are especially difficult when the underlying structure of the data is changed for new requirements. The eXtensible Markup Language, or XML, has emerged as the universal standard for exchanging and externalizing data. It is also widely used for information modeling in an environment consisting of heterogeneous information sources. CORBA is a distributed object technology allowing applications on heterogeneous platforms to communicate through commonly defined services providing a scalable infrastructure for today\u27s distributed systems. To improve data independence, we propose an approach based on XML standards and the notion of views to develop and modify distributed applications which access shared data. In our approach, we model the shared data using XML, and generate different XML views of the data for different applications according to the DTDs of the XML views and the application logic. When the underlying data structure changes, new views are generated systematically. We adopt CORBA as the distributed architecture in our approach. Our thesis is that: views to support data-independence of distributed computing applications can be generated systematically from application logic, CORBA IDL and XML DTD.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .L86. Source: Masters Abstracts International, Volume: 41-04, page: 1113. Adviser: Richard Frost. Thesis (M.Sc.)--University of Windsor (Canada), 2002

    Bounded repairability for regular tree languages

    Get PDF
    We study the problem of bounded repairability of a given restriction tree language R into a target tree language T. More precisely, we say that R is bounded repairable w.r.t. T if there exists a bound on the number of standard tree editing operations necessary to apply to any tree in R in order to obtain a tree in T. We consider a number of possible specifications for tree languages: bottom-up tree automata (on curry encoding of unranked trees) that capture the class of XML Schemas and DTDs. We also consider a special case when the restriction language R is universal, i.e., contains all trees over a given alphabet. We give an effective characterization of bounded repairability between pairs of tree languages represented with automata. This characterization introduces two tools, synopsis trees and a coverage relation between them, allowing one to reason about tree languages that undergo a bounded number of editing operations. We then employ this characterization to provide upper bounds to the complexity of deciding bounded repairability and we show that these bounds are tight. In particular, when the input tree languages are specified with arbitrary bottom-up automata, the problem is coNEXPTIME-complete. The problem remains coNEXPTIME-complete even if we use deterministic non-recursive DTDs to specify the input languages. The complexity of the problem can be reduced if we assume that the alphabet, the set of node labels, is fixed: the problem becomes PSPACE-complete for non-recursive DTDs and coNP-complete for deterministic non-recursive DTDs. Finally, when the restriction tree language R is universal, we show that the bounded repairability problem becomes EXPTIME-complete if the target language is specified by an arbitrary bottom-up tree automaton and becomes tractable (PTIME-complete, in fact) when a deterministic bottom-up automaton is used

    Regular Rooted Graph Grammars

    Get PDF
    In dieser Arbeit wir ein pragmatischer Ansatz zur Typisierung, statischen Analyse und Optimierung von Web-Anfragespachen, speziell Xcerpt, untersucht. Pragmatisch ist der Ansatz in dem Sinne, dass dem Benutzer keinerlei Einschränkungen aus Entscheidbarkeits- oder Effizienzgründen auf modellierbare Typen gestellt werden. Effizienz und Entscheidbarkeit werden stattdessen, falls nötig, durch Vergröberungen bei der Typprüfung erkauft. Eine Typsprache zur Typisierung von Graph-strukturierten Daten im Web wird eingeführt. Modellierbare Graphen sind so genannte gewurzelte Graphen, welche aus einem Spannbaum und Querreferenzen aufgebaut sind. Die Typsprache basiert auf reguläre Baum Grammatiken, welche um typisierte Referenzen erweitert wurde. Neben wie im Web mit XML üblichen geordneten strukturierten Daten, sind auch ungeordnete Daten, wie etwa in Xcerpt oder RDF üblich, modellierbar. Der dazu verwendete Ansatz---ungeordnete Interpretation Regulärer Ausdrücke---ist neu. Eine operationale Semantik für geordnete wie ungeordnete Typen wird auf Basis spezialisierter Baumautomaten und sog. Counting Constraints (welche wiederum auf presburgerarithmetische Ausdrücke) basieren. Es wird ferner statische Typ-Prüfung und -Inferenz von Xcerpt Anfrage- und Konstrukttermen, wie auch Optimierung von Xcerpt Anfragen auf Basis von Typinformation eingeführt.This thesis investigates a pragmatic approach to typing, static analysis and static optimization of Web query languages, in special the Web query language Xcerpt. The approach is pragmatic in the sense, that no restriction on the types are made for decidability or efficiency reasons, instead precision is given up if necessary. Pragmatics on the dynamic side means to use types not only to ensure validity of objects operating on, but also influencing query selection based on types. A typing language for typing of graph structured data on the Web is introduced. The Graphs in mind are based on spanning trees with references, the typing languages is based on regular tree grammars with typed reference extensions. Beside ordered data in the spirit of XML, unordered data (i.e. in the spirit of the Xcerpt data model or RDF) can be modelled using regular expressions under unordered interpretation – this approach is new. An operational semantics for ordered and unordered types is given based on specialized regular tree automata and counting constraints (them again based on Presburger arithmetic formulae). Static type checking of Xcerpt query and construct terms is introduced, as well as optimization of Xcerpt query terms based on schema information

    Processing Structured Hypermedia : A Matter of Style

    Get PDF
    With the introduction of the World Wide Web in the early nineties, hypermedia has become the uniform interface to the wide variety of information sources available over the Internet. The full potential of the Web, however, can only be realized by building on the strengths of its underlying research fields. This book describes the areas of hypertext, multimedia, electronic publishing and the World Wide Web and points out fundamental similarities and differences in approaches towards the processing of information. It gives an overview of the dominant models and tools developed in these fields and describes the key interrelationships and mutual incompatibilities. In addition to a formal specification of a selection of these models, the book discusses the impact of the models described on the software architectures that have been developed for processing hypermedia documents. Two example hypermedia architectures are described in more detail: the DejaVu object-oriented hypermedia framework, developed at the VU, and CWI's Berlage environment for time-based hypermedia document transformations
    • …
    corecore