    Constraints for Semistructured Data and XML

    Integrity constraints play a fundamental role in database design. We review initial work on the expression of integrity constraints for semistructured data and XML.

    On Incomplete XML Documents with Integrity Constraints

    Abstract. We consider incomplete specifications of XML documents in the presence of schema information and integrity constraints. We show that integrity constraints such as keys and foreign keys affect consistency of such specifications. We prove that the consistency problem for incomplete specifications with keys and foreign keys can always be solved in NP. We then show a dichotomy result, classifying the complexity of the problem as NP-complete or PTIME, depending on the precise set of features used in incomplete descriptions.

    A hybrid logic for XML reference constraints

    XML emerged as the (meta) mark-up language for representing, exchanging, and storing semistructured data. The structure of an XML document may be specified either through the DTD (Document Type Definition) language or through the specific language XML Schema. While the expressiveness of XML Schema allows one to specify both the structure and constraints for XML documents, DTD does not allow the specification of integrity constraints for XML documents. On the other hand, DTD has a very compact notation, as opposed to the complex notation and syntax of XML Schema. Thus, it becomes important to consider how to express further constraints on DTD-based XML documents while still retaining the simplicity and succinctness of DTDs. In this scenario, we focus in this paper on a logic that is as simple as possible, named XHyb, yet expressive enough to allow the specification of the most common integrity and reference constraints in XML documents. In particular, we focus on constraints on ID and IDREF(S) attributes, which are the common way of logically connecting parts of XML documents, besides the usual parent-child relationship of XML elements. Unlike previously proposed hybrid logics, in XHyb IDREF(S) attributes are explicitly expressible by means of suitable syntactical constructors. Moreover, we propose a refinement of the usual graph representation of XML documents in order to represent XML documents in a formal and intuitive way, without flattening accessibility through IDREF(S) into the usual parent-child relationship. Model checking algorithms are then proposed to verify that a given XML document satisfies the considered constraints.
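    As an illustration of the kind of reference constraint discussed above, the following minimal sketch checks two simple properties of a document: identifier values are unique, and every reference points to an existing identifier. It is not the XHyb logic or its model checking algorithm. The attribute names `id` and `ref`, the function `check_references`, and the sample document are assumptions made for the example; without a DTD, the parser cannot know which attributes are declared as ID or IDREF(S), so the names are passed in explicitly.

```python
import xml.etree.ElementTree as ET


def check_references(xml_text, id_attr="id", idref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for one XML document.

    id_attr and idref_attr are hypothetical attribute names standing in
    for attributes that a DTD would declare as ID and IDREF(S).
    """
    root = ET.fromstring(xml_text)

    # First pass: collect identifier values and record any duplicates.
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)

    # Second pass: an IDREFS value is a whitespace-separated list of IDs.
    dangling = []
    for elem in root.iter():
        refs = elem.get(idref_attr)
        if refs is None:
            continue
        for token in refs.split():
            if token not in seen:
                dangling.append((elem.tag, token))
    return duplicates, dangling


# Hypothetical sample document: the second book refers to a missing author.
DOC = """
<library>
  <author id="a1"/>
  <author id="a2"/>
  <book id="b1" ref="a1 a2"/>
  <book id="b2" ref="a3"/>
</library>
"""

duplicates, dangling = check_references(DOC)
```

    Two passes are used so that a reference may legally point forward to an element that appears later in the document.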

    A Conceptual Schema Based XML Schema with Integrity Constraints Checking

    The more popular XML becomes for exchanging and representing information on the Web, the more important Flat XML (XML) and intelligent editors become. For data exchange, XML Data accompanied by an XML Schema and integrity constraints is preferred. We employ Object-Role Modeling (ORM) to enrich the XML Schema constraints and to provide better validation of the XML Data. An XML conceptual schema is presented using the ORM conceptual model. Editor Meta Tables are generated from the conceptual schema diagram and are populated. A User XML Schema based on the information in the Editor Meta Tables is then generated. However, the W3C XML Schema language does not support all of the ORM constraints. Therefore, we propose an Editor XML Schema and an Editor XML Data to cover the unsupported ORM constraints. We propose algorithms for defining constraints in the User XML Schema and for extending validity constraint checking. Finally, XQuery is used for the extended validity checking.

    Global Semantic Integrity Constraint Checking for a System of Databases

    In today’s emerging information systems, it is natural to have data distributed across multiple sites. We define a System of Databases (SyDb) as a collection of autonomous and heterogeneous databases. R-SyDb (System of Relational Databases) is a restricted form of SyDb, referring to a collection of independent relational databases. Similarly, X-SyDb (System of XML Databases) refers to a collection of XML databases. Global integrity constraints ensure the integrity and consistency of data spanning multiple databases. In this dissertation, we present (i) Constraint Checker, a general framework of a mobile agent based approach for checking global constraints on R-SyDb, and (ii) XConstraint Checker, a general framework for checking global XML constraints on X-SyDb. Furthermore, we formalize multiple efficient algorithms for varying semantic integrity constraints involving both arithmetic and aggregate predicates. The algorithms take as input an update statement and a list of all global semantic integrity constraints with arithmetic or aggregate predicates, and output sub-constraints to be executed on remote sites. The algorithms are efficient since (i) the constraint check is carried out at compile time, i.e. before executing the update statement, so time and resources are saved by avoiding rollbacks, and (ii) the implementation exploits parallelism. We have also implemented a prototype of the systems and algorithms for both R-SyDb and X-SyDb, and we present performance evaluations of the system.
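    The idea of checking a global aggregate constraint before an update is executed can be sketched as follows. This is a toy illustration only, not the Constraint Checker framework or its algorithms: the in-memory `SITES` dictionary stands in for remote databases, and `BUDGET`, `local_sum`, `update_allowed`, and `insert_amount` are hypothetical names. The global constraint assumed here is that the sum of all amounts across sites must not exceed `BUDGET`; it is split into one local sum per site, and the local sums are evaluated in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three autonomous databases.
SITES = {
    "site_a": [1200, 800],
    "site_b": [1500],
    "site_c": [700, 300],
}
BUDGET = 5000  # assumed global constraint: SUM(amount) over all sites <= BUDGET


def local_sum(site):
    """Sub-constraint evaluated at one site: its local aggregate."""
    return sum(SITES[site])


def update_allowed(new_amount):
    """Check the global constraint before the insert is applied."""
    with ThreadPoolExecutor() as pool:
        # One sub-query per site, evaluated in parallel.
        partial_sums = list(pool.map(local_sum, SITES))
    return sum(partial_sums) + new_amount <= BUDGET


def insert_amount(target_site, new_amount):
    """Apply the insert only if the check passes, so no rollback is needed."""
    if not update_allowed(new_amount):
        return False
    SITES[target_site].append(new_amount)
    return True
```

    With the sample data the current total is 4500, so `insert_amount("site_b", 400)` is accepted, while a subsequent `insert_amount("site_a", 200)` is rejected without touching any site.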

    Functional dependencies for XML : axiomatisation and normal form in the presence of frequencies and identifiers : a thesis presented in partial fulfilment of the requirements for the degree of Master of Sciences in Information Sciences at Massey University, Palmerston North, New Zealand

    XML has gained popularity as a markup language for publishing and exchanging data on the web. Nowadays, there is also ongoing interest in using XML for representing and actually storing data. In particular, much effort has been directed towards turning XML into a real data model by improving the semantics that can be expressed about XML documents. Various works have addressed how to define different classes of integrity constraints and the development of a normalisation theory for XML. One area which received little to no attention from the research community up to five years ago is the study of functional dependencies in the context of XML [37]. Since then, there has been increasingly more research investigating functional dependencies in XML. Nevertheless, a comprehensive dependency theory and normalisation theory for XML have yet to emerge. Functional dependencies are an integral part of database theory in the relational data model (RDM). In particular, functional dependencies have been vital in the investigation of how to design "good" relational database schemas which avoid or minimise problems relating to data redundancy and data inconsistency. Since the same problems can be shown to exist in poorly designed XML schemas, there is a need to investigate how these problems can be eliminated in the context of XML. We believe that the study of an analogue of relational functional dependencies in the context of XML is equally significant towards designing "good" XML schemas. [FROM INTRODUCTION]

    Reasoning About Integrity Constraints for Tree-Structured Data

    We study a class of integrity constraints for tree-structured data modelled as data trees, whose nodes have a label from a finite alphabet and store a data value from an infinite data domain. The constraints require each tuple of nodes selected by a conjunctive query (using navigational axes and labels) to satisfy a positive combination of equalities and a positive combination of inequalities over the stored data values. Such constraints are instances of the general framework of XML-to-relational constraints proposed recently by Niewerth and Schwentick. They cover some common classes of constraints, including W3C XML Schema key and unique constraints, as well as domain restrictions and denial constraints, but cannot express inclusion constraints, such as reference keys. Our main result is that consistency of such integrity constraints with respect to a given schema (modelled as a tree automaton) is decidable. An easy extension gives decidability for the entailment problem. Equivalently, we show that validity and containment of unions of conjunctive queries using navigational axes, labels, data equalities and inequalities is decidable, as long as none of the conjunctive queries uses both equalities and inequalities; without this restriction, both problems are known to be undecidable. In the context of XML data exchange, our result can be used to establish decidability for a consistency problem for XML schema mappings. All the decision procedures are doubly exponential, with matching lower bounds. The complexity may be lowered to singly exponential when conjunctive queries are replaced by tree patterns and the number of data comparisons is bounded.
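    To make the data tree model concrete, the sketch below evaluates one simple constraint of the kind described above on a single tree: every pair of distinct nodes with a given label must store different data values, which is a key-style constraint expressed through an inequality. It checks one document against one constraint and is not the consistency or entailment procedure studied in the paper. The `Node` class, the helper names, and the sample tree are assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A data tree node: a label from a finite alphabet plus a data value."""
    label: str
    value: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield the node itself and every node below it, in document order."""
    yield node
    for child in node.children:
        yield from descendants(child)


def key_violations(root, label):
    """Return pairs of distinct nodes with this label that share a data value.

    The selecting query is simply "two nodes labelled `label`"; the
    constraint requires their stored values to differ.
    """
    selected = [n for n in descendants(root) if n.label == label]
    return [(a, b) for a, b in combinations(selected, 2) if a.value == b.value]


# Hypothetical sample tree: two person nodes store the same value.
tree = Node("db", "", [
    Node("person", "alice"),
    Node("person", "bob"),
    Node("dept", "alice"),
    Node("person", "alice"),
])

violations = key_violations(tree, "person")
```

    The `dept` node shares the value `alice` but is not selected by the query, so it does not contribute to a violation.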

    A Logical Framework for XML Reference Specification

    XML emerged as the (meta) mark-up language for representing, exchanging, or storing semistructured data. The structure of an XML document may be specified through the DTD (Document Type Definition) language or through the specific XML language XSchema. While the expressiveness of XML Schema allows one to specify both the structure and constraints for XML documents, DTD does not allow the specification of integrity constraints for XML documents. On the other hand, DTD has a very compact notation, as opposed to the complex notation and syntax of XML Schema. In this scenario, we focus in this paper on a logic that is as simple as possible, called XHyb, yet expressive enough to allow the specification of the most common integrity constraints in XML documents. In particular, we deal with constraints on ID and IDREF(S) attributes, which are the common way of logically connecting parts of XML documents, besides the usual containment relation of XML elements.

    Efficient Detection of XML Integrity Constraints

    Title: Efficient Detection of XML Integrity Constraints. Author: Michal Švirec. Department: Department of Software Engineering, Faculty of Mathematics and Physics. Supervisor: RNDr. Irena Mlýnková, Ph.D. Abstract: Knowledge of the integrity constraints that hold in XML data is an important aspect of efficient data processing. However, even when the integrity constraints for the given data are known, it is common for the data to violate them. This has motivated efforts to detect such inconsistencies and subsequently repair them. This work extends and refines recent approaches to repairing XML documents that violate a defined set of integrity constraints, specifically so-called functional dependencies. The work proposes a repair algorithm that incorporates a weight model and also involves the user in the process of finding and subsequently applying a suitable repair of inconsistent XML documents. Experimental results are part of the work. Keywords: XML, functional dependency, functional dependency violations, violation repair.
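    A minimal sketch of detecting functional dependency violations in an XML document is given below. It covers detection only, under a deliberately simplified notion of functional dependency over flat records, and it is not the thesis's repair algorithm or its weight model. The function `fd_violations`, the element names, and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def fd_violations(xml_text, record_tag, lhs, rhs):
    """Find violations of the dependency lhs -> rhs among record elements.

    Each `record_tag` element is treated as one tuple whose fields are its
    child elements. Returns a dict mapping each left-hand-side value tuple
    to the sorted list of conflicting right-hand-side values.
    """
    root = ET.fromstring(xml_text)
    groups = defaultdict(set)
    for record in root.iter(record_tag):
        # A missing child is read as the empty string.
        key = tuple(record.findtext(name, default="") for name in lhs)
        groups[key].add(record.findtext(rhs, default=""))
    # The dependency is violated wherever one key maps to several values.
    return {key: sorted(values) for key, values in groups.items()
            if len(values) > 1}


# Hypothetical sample data: course code should determine the teacher.
DOC = """
<courses>
  <course><code>DB1</code><teacher>Novak</teacher></course>
  <course><code>DB1</code><teacher>Novak</teacher></course>
  <course><code>XML</code><teacher>Novak</teacher></course>
  <course><code>XML</code><teacher>Svoboda</teacher></course>
</courses>
"""

conflicts = fd_violations(DOC, "course", ["code"], "teacher")
```

    Grouping by the left-hand side requires a single pass over the records; a repair step would then have to choose which of the conflicting values to keep.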