24 research outputs found

    From XML to relational database.

    by Yan, Men-Hin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 114-119). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgments
    Chapter 1: Introduction
        1.1 Storing XML in Database Systems
        1.2 Outline of the Thesis
    Chapter 2: Related Work
        2.1 Overview of XML
            2.1.1 Extensible Markup Language (XML)
            2.1.2 Data Type Definition (DTD)
            2.1.3 "ID, IDREF and IDREFS"
        2.2 Using Special-Purpose Database to Store XML Data
        2.3 Using Relational Databases to Store XML Data
            2.3.1 Extracting Schemas with STORED
            2.3.2 Using Simple Schemes Based on Labeled Graph
            2.3.3 Generating Schemas from DTDs
            2.3.4 Commercial Approaches
        2.4 Discovering Functional Dependencies
            2.4.1 Functional Dependency
            2.4.2 Finding Functional Dependencies
            2.4.3 TANE and Partition Refinement
        2.5 Multivalued Dependencies
            2.5.1 Example of Multivalued Dependency
    Chapter 3: Using RDBMS to Store XML Data
        3.1 Global Schema Extraction Algorithm
            3.1.1 Step 1: Simplify DTD
            3.1.2 Step 2: Construct Schema Prototype Trees
            3.1.3 Step 3: Generate Relational Schema Prototype
            3.1.4 Step 4: Discover Functional Dependencies and Candidate Keys
            3.1.5 Step 5: Normalize the Relational Schema Prototypes
            3.1.6 Discussion
        3.2 DTD-splitting Schema Extraction Algorithm
            3.2.1 Step 1: Simplify DTD
            3.2.2 Step 2: Construct Schema Prototype Trees
            3.2.3 Step 3: Generate Relational Schema Prototype
            3.2.4 Step 4: Discover Functional Dependencies and Candidate Keys
            3.2.5 Step 5: Normalize the Relational Schema Prototypes
            3.2.6 Discussion
        3.3 Experimental Results
            3.3.1 Real Life XML Data: SIGMOD Record XML
            3.3.2 Synthetic XML Data
            3.3.3 Discussion
    Chapter 4: Finding Multivalued Dependencies
        4.1 Validation of Multivalued Dependencies
        4.2 Search Strategy and Pruning
            4.2.1 Search Strategy for Left-hand Sides Candidates
            4.2.2 Search Strategy for Right-hand Sides Candidates
            4.2.3 Other Pruning
        4.3 Computing with Partitions
            4.3.1 Computing Partitions
        4.4 Algorithm
            4.4.1 Generating Next Level Candidates
            4.4.2 Computing Partitions
        4.5 Experimental Results
            4.5.1 Results of the Algorithm
            4.5.2 Evaluation on the Results
            4.5.3 Scalability of the Algorithm
            4.5.4 Using Multivalued Dependencies in Schema Extraction Algorithms
    Chapter 5: Conclusion
        5.1 Discussion
        5.2 Future Work
            5.2.1 Translate Semistructured Queries to SQL
            5.2.2 Improve the Multivalued Dependency Discovery Algorithm
            5.2.3 Incremental Update of Resulting Schema
    Bibliography
    Appendix
        A. Simple Proof for Minimality in Multivalued Dependencies
        B. Third and Fourth Normal Form Decompositions
            B.1 3NF Decomposition Algorithm
            B.2 4NF Decomposition Algorithm
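The outline above mentions discovering functional dependencies with partitions (TANE and partition refinement). As a rough illustration of that general idea, and not of the thesis's own algorithm, the sketch below checks a single dependency X -> A by comparing the number of equivalence classes in the partition induced by X with the number induced by X together with A. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on attrs (one equivalence class per value combination)."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(index)
    return list(groups.values())

def fd_holds(rows, lhs, rhs):
    """X -> A holds exactly when adding A to X does not split any equivalence class."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))

# Hypothetical sample data, invented for this illustration only.
sample_rows = [
    {"author": "A", "title": "T1", "year": 2000},
    {"author": "A", "title": "T2", "year": 2000},
    {"author": "B", "title": "T3", "year": 2001},
]

author_determines_year = fd_holds(sample_rows, ["author"], "year")
author_determines_title = fd_holds(sample_rows, ["author"], "title")
```

On this sample, `author -> year` holds while `author -> title` does not, because the two rows for author "A" carry different titles. A full discovery algorithm would search the lattice of attribute sets and prune candidates; this sketch only validates one candidate at a time.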

    XML document design via GN-DTD

    Designing a well-structured XML document is important for the sake of readability and maintainability. More importantly, this will avoid data redundancies and update anomalies when maintaining a large quantity of XML-based documents. In this paper, we propose a method to improve XML structural design by adopting graphical notations for Document Type Definitions (GN-DTD), which are used to describe the structure of an XML document at the schema level. Multiple levels of normal forms for GN-DTD are proposed on the basis of conceptual model approaches and theories of normalization. The normalization rules are applied to transform a poorly designed XML document into a well-designed one based on normalized GN-DTD, as illustrated through examples.

    Functional dependencies over XML documents with DTDs

    In this article, an axiomatisation for functional dependencies over XML documents is presented. The approach is based on a representation of XML document type definitions (or XML schemata) by nested attributes using constructors for records, disjoint unions and lists, and a particular null value, which covers optionality. Infinite structures that may result from referencing attributes in XML are captured by rational trees. Using a partial order on nested attributes, we obtain non-distributive Brouwer algebras. The operations of the Brouwer algebra are exploited in the soundness and completeness proofs for derivation rules for functional dependencies.

    Generating a Normalized Database Using Class Normalization

    Relational databases remain the most popular databases used by enterprise applications to store persistent data. They offer considerable flexibility and efficiency. A process called database normalization helps make sure that the database is free from redundancies and update anomalies. In a Database-First approach to software development, the database is designed first, and then an Object-Relational Mapping (ORM) tool is used to generate the programming classes (data layer) to interact with the database. Finally, the business logic code is written to interact with the data layer to persist the business data to the database. However, in modern application development, a Code-First approach has evolved in which the domain classes and the business logic that interacts with them are written first. Then an ORM tool is used to generate the database from the domain classes. In this approach, since database design is not a concern, software programmers may ignore the process of database normalization altogether. To help software programmers in this process, this thesis takes the theory behind the five database normal forms (1NF - 5NF) and proposes Five Class Normal Forms (1CNF - 5CNF) that software programmers may use to normalize their domain classes. This thesis demonstrates that when the Five Class Normal Forms are applied manually to a class by a programmer, the database generated by the Code-First approach is also normalized according to the rules of relational theory.

    An information-theoretic analysis of worst-case redundancy in database design


    Normalization Theory for XML

    Specifications of XML documents typically consist of typing information (e.g., a DTD), and integrity constraints. Just like relational schema specifications, not all are good – some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world, we still lack universally accepted query languages such as relational algebra, or update languages that let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to come up with notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon’s information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.

    XML documents schema design

    The eXtensible Markup Language (XML) is fast emerging as the dominant standard for storing, describing and interchanging data among various systems and databases on the internet. It offers schema languages such as Document Type Definition (DTD) or XML Schema Definition (XSD) for defining the syntax and structure of XML documents. To enable efficient usage of XML documents in any application in a large-scale electronic environment, it is necessary to avoid data redundancies and update anomalies. Redundancy and anomalies in XML documents can lead not only to higher data storage cost but also to increased costs for data transfer and data manipulation. To overcome this problem, this thesis proposes to establish a formal framework of XML document schema design. To achieve this aim, we propose a method to improve and simplify XML schema design by incorporating a conceptual model of the DTD with a theory of database normalization. A conceptual diagram, Graph-Document Type Definition (G-DTD), is proposed to describe the structure of XML documents at the schema level. For G-DTD itself, we define a structure which incorporates attributes, simple elements, complex elements, and relationship types among them. Furthermore, semantic constraints are also precisely defined in order to capture semantic meanings among the defined XML objects. In addition, to provide a guideline to a well-designed schema for XML documents, we propose a set of normal forms for G-DTD on the basis of rules proposed by Arenas and Libkin and by Lv et al. The corresponding normalization rules to transform a G-DTD into a normal form schema are also discussed. A case study is given to illustrate the applicability of the concept. As a result, we found that the new normal forms are more concise and practical, in particular as they allow the user to find an 'optimal' structure of XML elements/attributes at the schema level. To show that our approach is applicable for the database designer, we develop a prototype of XML document schema design using the Z formal specification language. Finally, using the same case study, this formal specification is tested to check for correctness and consistency of the specification. This gives confidence that our prototype can be implemented successfully to generate an automatic XML schema design.

    Weak functional dependencies on trees with restructuring

    We present an axiomatisation for weak functional dependencies, i.e. disjunctions of functional dependencies, in the presence of several constructors for complex values. The investigated constructors capture records, sets, multisets, lists, disjoint union and optionality, i.e. the complex values are indeed trees. The constructors cover the gist of all complex value data models including object oriented databases and XML. Functional and weak functional dependencies are expressed on a lattice of subattributes, which even carries the structure of a Brouwer algebra as long as the union-constructor is absent. Its presence, however, complicates all results and proofs significantly. The reason for this is that the union-constructor causes non-trivial restructuring rules to hold. In particular, if either the set- or the union-constructor is absent, a subset of the rules is complete for the implication of ordinary functional dependencies, while in the general case no finite axiomatisation for functional dependencies exists.
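To make "disjunctions of functional dependencies" concrete, the following minimal sketch checks a weak functional dependency on flat records only. It assumes a pairwise reading (every pair of records must satisfy at least one of the listed dependencies), which is an assumption of this sketch and not something the abstract states. It ignores the nested constructors (sets, lists, unions) that the paper actually studies, and all names and sample data are hypothetical.

```python
from itertools import combinations

def pair_satisfies(t1, t2, lhs, rhs):
    """A pair violates lhs -> rhs only if it agrees on lhs but differs on rhs."""
    agree_lhs = all(t1[a] == t2[a] for a in lhs)
    agree_rhs = all(t1[a] == t2[a] for a in rhs)
    return (not agree_lhs) or agree_rhs

def weak_fd_holds(rows, fds):
    """Assumed pairwise semantics: each pair of rows satisfies at least one FD in fds."""
    return all(
        any(pair_satisfies(t1, t2, lhs, rhs) for lhs, rhs in fds)
        for t1, t2 in combinations(rows, 2)
    )

# Hypothetical flat records, invented for this illustration only.
records = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 2, "c": 1},  # with the first row: breaks a -> b, keeps a -> c
    {"a": 2, "b": 1, "c": 1},
    {"a": 2, "b": 1, "c": 2},  # with the third row: breaks a -> c, keeps a -> b
]

a_to_b = (["a"], ["b"])
a_to_c = (["a"], ["c"])

only_b = weak_fd_holds(records, [a_to_b])
only_c = weak_fd_holds(records, [a_to_c])
either = weak_fd_holds(records, [a_to_b, a_to_c])
```

Under the assumed semantics, neither `a -> b` nor `a -> c` holds on its own for these records, yet their disjunction does, which is the sense in which the dependency is "weak".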