
    Normalization Theory for XML

    Specifications of XML documents typically consist of typing information (e.g., a DTD) and integrity constraints. Just like relational schema specifications, not all are good: some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world we still lack a universally accepted query language, such as relational algebra, or update languages that would let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to define notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon's information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.
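    To make the notions of redundancy and lossless decomposition concrete, here is a minimal sketch using a hypothetical Course(course, teacher, room) relation with the functional dependency course -> teacher; the relation and attribute names are illustrative, not taken from the paper.

```python
# Hypothetical relation Course(course, teacher, room) with the
# functional dependency course -> teacher. All names are illustrative.
rows = [
    ("cs101", "smith", "A1"),
    ("cs101", "smith", "B2"),   # "smith" stored twice: redundancy
    ("ma201", "jones", "A1"),
]

# Because course -> teacher holds, the second occurrence of "smith" is
# predictable from the rest of the instance. An information-theoretic
# measure in the style of the paper assigns such a position less than
# one bit of information content, flagging the design as deficient.

# A lossless BCNF-style decomposition removes the redundancy: project
# onto (course, teacher) and (course, room); the natural join of the
# projections recovers the original instance exactly.
teaches = {(c, t) for c, t, _ in rows}
located = {(c, r) for c, _, r in rows}

rejoined = {(c, t, r)
            for (c, t) in teaches
            for (c2, r) in located if c == c2}
assert rejoined == set(rows)   # the decomposition is lossless
```

    After the decomposition the update anomaly also disappears: reassigning the teacher of cs101 now touches a single tuple instead of every room booking.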

    XML document schema design

    The eXtensible Markup Language (XML) is fast emerging as the dominant standard for storing, describing, and interchanging data among various systems and databases on the internet. It offers schema languages such as the Document Type Definition (DTD) and XML Schema Definition (XSD) for defining the syntax and structure of XML documents. To enable efficient use of XML documents in large-scale electronic environments, it is necessary to avoid data redundancies and update anomalies: redundancy and anomalies in XML documents lead not only to higher data storage costs but also to increased costs for data transfer and data manipulation. To overcome this problem, this thesis establishes a formal framework for XML document schema design. We propose a method to improve and simplify XML schema design by combining a conceptual model of the DTD with the theory of database normalization. A conceptual diagram, the Graph-Document Type Definition (G-DTD), is proposed to describe the structure of XML documents at the schema level. For the G-DTD itself, we define a structure that incorporates attributes, simple elements, complex elements, and the relationship types among them. Semantic constraints are also precisely defined in order to capture the semantics of the defined XML objects. In addition, to provide guidelines for well-designed XML document schemas, we propose a set of normal forms for G-DTDs, building on the rules proposed by Arenas and Libkin and by Lv et al. The corresponding normalization rules that transform a G-DTD into a normal-form schema are also discussed, and a case study illustrates the applicability of the concepts. We found the new normal forms to be concise and practical, in particular because they allow the user to find an 'optimal' structure for XML elements and attributes at the schema level. To show that the approach is usable by database designers, we developed a prototype of the XML document schema design process in the Z formal specification language and, using the same case study, tested the specification for correctness and consistency. This gives confidence that the prototype can be implemented to generate XML schema designs automatically.
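    As a rough illustration of the kind of rule such XML normal forms prescribe (following the classic Arenas-Libkin example rather than the thesis's own G-DTD notation), the sketch below promotes an attribute that is functionally determined by its parent element; the element and attribute names are hypothetical.

```python
# Toy document fragment: @year is stored on every paper, although the
# (assumed) XML functional dependency issue -> @year makes it constant
# within one issue. Element and attribute names are hypothetical.
issue = {
    "title": "vol. 42",
    "papers": [
        {"key": "p1", "year": 2004},
        {"key": "p2", "year": 2004},   # repeated value: redundancy
    ],
}

def promote_year(issue):
    """Normalization rule: move the determined attribute up to the
    element that determines it, removing the per-paper copies."""
    years = {paper.pop("year") for paper in issue["papers"]}
    assert len(years) == 1, "the assumed FD issue -> year is violated"
    issue["year"] = years.pop()
    return issue

print(promote_year(issue))
# {'title': 'vol. 42', 'papers': [{'key': 'p1'}, {'key': 'p2'}], 'year': 2004}
```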

    A new formal and analytical process to product modeling (PPM) method and its application to the precast concrete industry

    The current standard product (data) modeling process relies on the experience and subjectivity of data modelers, who use their experience to eliminate redundancies and identify omissions. As a result, product modeling becomes a social activity that involves iterative review processes by committees. This study aims to develop a new, formal method for deriving product models from data collected in process models of companies within an industry sector. The theoretical goals of this study are to provide a scientific foundation that bridges the requirements-collection phase and the logical-modeling phase of product modeling, and to formalize the derivation and normalization of a product model from the processes it supports. To achieve these goals, a new formal method, Georgia Tech Process to Product Modeling (GTPPM), has been proposed. GTPPM consists of two modules. The first, the Requirements Collection and Modeling (RCM) module, provides semantics and a mechanism for defining a process model, the information items used by each activity, and the information flow between activities. Logic to dynamically check the consistency of information flow within a process has also been developed. The second, the Logical Product Modeling (LPM) module, integrates, decomposes, and normalizes information constructs collected from a process model into a preliminary product model. Nine design patterns are defined to resolve conflicts between information constructs (ICs) and to normalize the resultant model. These two modules have been implemented as a Microsoft Visio™ add-on; the tool has been registered and is also called GTPPM™. The method has been tested and evaluated in the precast concrete sector of the construction industry through several GTPPM modeling efforts. By using GTPPM, a complete set of information items required for product modeling in a medium-sized or large industry can be collected without generalizing each company's unique process into one unified high-level model. The use of GTPPM is not limited to product modeling, however; it can also be deployed in several other areas, including workflow management system or MIS (Management Information System) development, software specification development, and business process re-engineering.
    Ph.D. Committee Chair: Eastman, Charles M.; Committee Co-Chair: Augenbroe, Godfried; Committee Co-Chair: Navathe, Shamkant B.; Committee Member: Hardwick, Martin; Committee Member: Sacks, Rafael
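    The information-flow consistency check attributed to the RCM module can be read as a dataflow condition: every item an activity consumes must be produced by some transitive predecessor in the process model. The sketch below is one plausible reading of that check, with made-up activity and item names; it is not GTPPM's actual API.

```python
from graphlib import TopologicalSorter

# activity -> (information items consumed, information items produced);
# names are illustrative, drawn loosely from the precast-concrete domain.
activities = {
    "collect_specs": (set(), {"geometry", "loads"}),
    "design_panel":  ({"geometry", "loads"}, {"panel_model"}),
    "plan_erection": ({"panel_model", "crane_data"}, {"erection_plan"}),
}
# activity -> its predecessor activities (the process-model edges)
preds = {
    "collect_specs": set(),
    "design_panel":  {"collect_specs"},
    "plan_erection": {"design_panel"},
}

def check_information_flow(activities, preds):
    """Walk the process model in topological order and report every item
    consumed by an activity that no transitive predecessor produced."""
    upstream = {}   # activity -> items available once it has finished
    problems = []
    for act in TopologicalSorter(preds).static_order():
        avail = set().union(*(upstream[p] for p in preds[act]))
        consumed, produced = activities[act]
        problems += [(act, item) for item in consumed - avail]
        upstream[act] = avail | produced
    return problems

print(check_information_flow(activities, preds))
# [('plan_erection', 'crane_data')] -- nothing upstream produces it
```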

    Equivalence of Queries with Nested Aggregation

    Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing, and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization. In this thesis we address the problem of deciding equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between the relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called "scalar" aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem had never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation.
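    The reduction described here ultimately rests on deciding equivalence of ordinary conjunctive queries, for which the classical Chandra-Merlin test applies: two queries are equivalent iff a containment mapping exists in each direction. The following is a brute-force sketch of that standard textbook test, not the thesis's own procedure.

```python
from itertools import product

# A conjunctive query: (head variables, list of body atoms). Lowercase
# strings are variables; anything else is a constant.
Q1 = (("x",), [("R", ("x", "y")), ("R", ("y", "z"))])  # x with a 2-step path
Q2 = (("x",), [("R", ("x", "y"))])                     # x with an out-edge

def is_var(t):
    return isinstance(t, str) and t.islower()

def containment_mapping(q_from, q_to):
    """Search for a homomorphism from q_from's atoms into q_to's atoms
    that maps head onto head; existence proves q_to is contained in q_from."""
    (head_f, body_f), (head_t, body_t) = q_from, q_to
    vars_f = sorted({t for _, args in body_f for t in args if is_var(t)})
    targets = sorted({t for _, args in body_t for t in args})
    atoms_t = set(body_t)
    for image in product(targets, repeat=len(vars_f)):
        h = dict(zip(vars_f, image))
        if all(h.get(v, v) == w for v, w in zip(head_f, head_t)) and \
           all((rel, tuple(h.get(a, a) for a in args)) in atoms_t
               for rel, args in body_f):
            return h
    return None

def equivalent(q1, q2):
    return (containment_mapping(q1, q2) is not None
            and containment_mapping(q2, q1) is not None)

print(containment_mapping(Q2, Q1))  # {'x': 'x', 'y': 'y'}: Q1 is in Q2
print(equivalent(Q1, Q2))           # False: the 2-step path is stronger
```

    The brute-force search is exponential in the number of variables, which matches the known NP-completeness of conjunctive-query containment; practical systems prune the search rather than enumerate all mappings.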

    A framework for integrating and transforming between ontologies and relational databases

    Bridging the gap between ontologies, expressed in the Web Ontology Language (OWL), and relational databases is a necessity for realising the Semantic Web vision. Relational databases are considered a good solution for storing and processing ontologies with large amounts of data. Moreover, the vast majority of current websites store data in relational databases, so being able to generate ontologies from such databases is important to support the development of the Semantic Web. Most of the work on this topic has either (1) extracted from an existing relational database an OWL ontology that represents the relational schema as exactly as possible, using a limited range of OWL modelling constructs, or (2) extracted from an existing OWL ontology a relational database that represents the ontology as faithfully as possible. By way of contrast, this thesis proposes a general framework for transforming and mapping between ontologies and databases via an intermediate low-level Hypergraph Data Model. The transformation between relational and OWL schemas is expressed using directional Both-As-View mappings, allowing a precise definition of the equivalence between the two schemas, so that data can be mapped back and forth between them. In particular, for a given OWL ontology, we interpret the expressive axioms either as triggers, conforming to the Open-World Assumption, that perform a forward-chaining materialisation of inferred data, or as constraints, conforming to the Closed-World Assumption, that perform consistency checking. With regard to extracting ontologies from relational databases, we transform a relational database into an exact OWL ontology and then enhance it with rich OWL 2 axioms, using a combination of schema and data analysis. We then apply machine-learning algorithms to rank the suggested axioms based on their relevance to past users. A proof-of-concept tool, OWLRel, has been implemented, and a number of well-known ontologies and databases have been used to evaluate the approach and the OWLRel tool.
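    The "exact translation" step can be pictured with the usual direct-mapping conventions (table to class, column to datatype property, foreign key to object property). The sketch below emits Turtle-style axioms from a made-up schema description; it only gestures at this common approach and reproduces none of OWLRel's actual API.

```python
# Hypothetical relational schema description; table and column names
# are illustrative only.
tables = {
    "employee": {
        "columns": {"id": "xsd:integer", "name": "xsd:string"},
        "foreign_keys": {"dept_id": "department"},
    },
    "department": {
        "columns": {"id": "xsd:integer", "title": "xsd:string"},
        "foreign_keys": {},
    },
}

def to_owl(tables):
    """Direct mapping: table -> owl:Class, column -> owl:DatatypeProperty,
    foreign key -> owl:ObjectProperty linking the two classes."""
    axioms = []
    for table, meta in tables.items():
        cls = table.capitalize()
        axioms.append(f":{cls} rdf:type owl:Class .")
        for col, xsd_type in meta["columns"].items():
            axioms.append(f":{table}_{col} rdf:type owl:DatatypeProperty ; "
                          f"rdfs:domain :{cls} ; rdfs:range {xsd_type} .")
        for col, target in meta["foreign_keys"].items():
            axioms.append(f":{table}_{col} rdf:type owl:ObjectProperty ; "
                          f"rdfs:domain :{cls} ; "
                          f"rdfs:range :{target.capitalize()} .")
    return "\n".join(axioms)

print(to_owl(tables))
```

    The thesis goes beyond this baseline in two ways the sketch does not capture: the mapping is expressed bidirectionally (Both-As-View) through the intermediate hypergraph model, and the exact ontology is subsequently enriched with OWL 2 axioms suggested by schema and data analysis.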