
    Inferring functional dependencies for XML storage

    XML allows data redundancy through its hierarchical structure, in which elements may be nested and repeated. As a result, the same information can appear in more than one place; indeed, the same elements may appear in different sub-trees. This makes XML easier to understand and to parse, and recovering the information requires fewer joins. It contrasts with relational data, for which normalization theory was developed to eliminate data redundancy. It is therefore important to detect redundancy in XML data before mapping is performed. In this paper, we use functional dependencies to detect data redundancies in XML documents. Based on inferring further functional dependencies from the given ones, we propose an algorithm for mapping XML DTDs to relational schemas. The result is a “good relational schema” in terms of reducing data redundancy and preserving the semantic constraints.
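
    The following is a minimal sketch (not the paper's algorithm) of how a functional dependency over an XML document can be checked and used to flag redundancy; the document, element names, and paths are illustrative assumptions.

```python
# Sketch only: check a simple XML functional dependency key_path -> value_path
# over repeating elements and flag the redundancy it implies.
# Element names below are hypothetical, not taken from the paper.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<orders>
  <order><custId>c1</custId><custName>Alice</custName></order>
  <order><custId>c1</custId><custName>Alice</custName></order>
  <order><custId>c2</custId><custName>Bob</custName></order>
</orders>
""")

def fd_holds_and_redundant(root, item, key, value):
    """Return (holds, redundant): whether key -> value holds over all
    `item` elements, and whether any key repeats (i.e. the dependent
    value is stored more than once)."""
    seen = {}
    redundant = False
    for elem in root.findall(item):
        k = elem.findtext(key)
        v = elem.findtext(value)
        if k in seen:
            redundant = True             # same key stored again
            if seen[k] != v:
                return False, redundant  # FD violated
        else:
            seen[k] = v
    return True, redundant

print(fd_holds_and_redundant(doc, "order", "custId", "custName"))
# (True, True): custId -> custName holds and custName is stored redundantly,
# suggesting custName belongs in a separate relation keyed by custId.
```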

    XML document design via GN-DTD

    Designing a well-structured XML document is important for readability and maintainability. More importantly, it avoids data redundancies and update anomalies when maintaining a large collection of XML-based documents. In this paper, we propose a method to improve XML structural design by adopting graphical notations for Document Type Definitions (GN-DTD), which describe the structure of an XML document at the schema level. Multiple levels of normal forms for GN-DTD are proposed on the basis of conceptual modelling approaches and normalization theory. The normalization rules are applied to transform a poorly designed XML document into a well-designed one based on a normalized GN-DTD, which is illustrated through examples.
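
    As a hypothetical illustration of the kind of restructuring such normalization drives (shown as plain DTD fragments held in Python strings, not GN-DTD's graphical notation), a value repeated under every book element is factored out into its own element and referenced by id:

```python
# Illustrative assumption only: a poorly designed DTD repeats publisher
# details under every book; the "normalized" version stores them once
# and references them, removing the redundancy and update anomalies.
POORLY_DESIGNED = """
<!ELEMENT library (book*)>
<!ELEMENT book (title, publisherName, publisherCity)>
"""

NORMALIZED = """
<!ELEMENT library (publisher*, book*)>
<!ELEMENT publisher (publisherName, publisherCity)>
<!ATTLIST publisher id ID #REQUIRED>
<!ELEMENT book (title)>
<!ATTLIST book publisher IDREF #REQUIRED>
"""
```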

    Data integration for XML based on semantic knowledge

    Reconciling knowledge from multiple heterogeneous data sources has been a major focus of database research for more than a decade. As a standard for exchanging business data on the WWW, XML should provide the ability to express both data and the semantics among them. Most application data are stored in relational databases because of their popularity and the rich development experience built around them. How to provide a proper mapping from the relational model to the XML model has therefore become a major research problem in information exchange, sharing and integration. The models need to be integrated while maintaining the semantic knowledge among the data. The aim of this paper is to provide an overview of XML-based data integration based on semantic knowledge. At the end of the paper, we review some methodologies from the existing literature.
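
    A minimal sketch, not the paper's mapping approach, of the basic idea of publishing relational data as XML while keeping key/foreign-key semantics as nesting; the table and column names are illustrative assumptions.

```python
# Sketch only: two relational tables linked by a foreign key are
# published as nested XML, so the key/foreign-key relationship is
# preserved as parent/child structure.
import xml.etree.ElementTree as ET

customers = [{"id": "c1", "name": "Alice"}, {"id": "c2", "name": "Bob"}]
orders = [{"id": "o1", "custId": "c1", "total": "10.0"},
          {"id": "o2", "custId": "c1", "total": "7.5"}]

root = ET.Element("customers")
for c in customers:
    c_el = ET.SubElement(root, "customer", id=c["id"])
    ET.SubElement(c_el, "name").text = c["name"]
    # Foreign key custId -> customers.id becomes nesting under the parent.
    for o in (o for o in orders if o["custId"] == c["id"]):
        o_el = ET.SubElement(c_el, "order", id=o["id"])
        ET.SubElement(o_el, "total").text = o["total"]

print(ET.tostring(root, encoding="unicode"))
```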

    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured and graph models. Given that mature relational databases built on the relational data model still dominate today's market, there is strong interest in storing and processing semi-structured data and graph data in relational databases, so that the capabilities of mature and powerful relational databases can be applied to these varied data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.
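
    As a point of reference, the sketch below shows a textbook baseline for mapping graph data to relational tables, a generic node table plus an edge table; it is an illustration only, not any specific method classified in the survey.

```python
# Sketch only: graph stored as generic node/edge tables; traversal
# becomes joins over the edge table. Labels and ids are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT);
CREATE TABLE edges (src TEXT, dst TEXT, label TEXT,
                    FOREIGN KEY (src) REFERENCES nodes(id),
                    FOREIGN KEY (dst) REFERENCES nodes(id));
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [("n1", "Person"), ("n2", "Person"), ("n3", "City")])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                 [("n1", "n2", "knows"), ("n1", "n3", "livesIn")])

# A one-hop traversal is a self-join through the edge table.
for row in conn.execute("""
    SELECT a.id, b.id FROM nodes a
    JOIN edges e ON e.src = a.id
    JOIN nodes b ON b.id = e.dst
    WHERE e.label = 'knows'"""):
    print(row)   # ('n1', 'n2')
```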

    Designing information-preserving mapping schemes for XML

    An XML-to-relational mapping scheme consists of a procedure for shredding XML documents into relational databases, a procedure for publishing the databases back as documents, and a set of constraints the databases must satisfy. In previous work, we discussed two notions of information preservation for mapping schemes: losslessness, which guarantees the complete reconstruction of a document from a database; and validation, which guarantees that every update to a database corresponding to a valid document results in a database corresponding to another valid document. We also described one information-preserving mapping scheme, called Edge++, and showed that, under reasonable assumptions, losslessness and validation are both undecidable. This leads to the question we study in this paper: how to design information-preserving mapping schemes. We propose to do so by starting with a scheme known to be information preserving (such as Edge++) and applying to it equivalence-preserving transformations written in weakly recursive ILOG. We study a particular incarnation of this framework, the LILO algorithm, and show that it provides significant performance improvements over Edge++ and that the constraints it introduces are efficiently enforced in practice.
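
    A minimal sketch of the classic edge-table shredding idea that schemes such as Edge++ build on: each parent/child edge of the XML tree becomes a row. This illustrates the general approach only; it is not the Edge++ scheme or the LILO algorithm.

```python
# Sketch only: shred an XML tree into edge rows
# (parent_id, child_id, tag, ordinal, text), preserving document order.
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return edge rows for every element of the document."""
    rows = []
    ids = iter(range(1, 1_000_000))

    def walk(elem, parent_id, ordinal):
        node_id = next(ids)
        text = (elem.text or "").strip() or None
        rows.append((parent_id, node_id, elem.tag, ordinal, text))
        for i, child in enumerate(elem):
            walk(child, node_id, i)

    walk(ET.fromstring(xml_text), None, 0)
    return rows

for row in shred("<a><b>1</b><b>2</b><c><d>x</d></c></a>"):
    print(row)
# (None, 1, 'a', 0, None), (1, 2, 'b', 0, '1'), (1, 3, 'b', 1, '2'), ...
```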

    Constraint-aware schema transformation

    Data schema transformations occur in the context of software evolution, refactoring, and cross-paradigm data mappings. When constraints exist on the initial schema, these need to be transformed into constraints on the target schema. Moreover, when high-level data types are refined to lower-level structures, additional target schema constraints must be introduced to balance the loss of structure and preserve semantics. We introduce an algebraic approach to schema transformation that is constraint-aware in the sense that constraints are preserved from source to target schemas and that new constraints are introduced where needed. Our approach is based on refinement theory and point-free program transformation. Data refinements are modeled as rewrite rules on types that carry point-free predicates as constraints. At each rewrite step, the predicate on the reduct is computed from the predicate on the redex. An additional rewrite system on point-free functions is used to normalize the predicates that are built up along rewrite chains. We implemented our rewrite systems in a type-safe way in the functional programming language Haskell. We demonstrate their application to constraint-aware hierarchical-relational mappings.
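
    The paper's rewrite systems are implemented in Haskell; the Python sketch below only illustrates the underlying idea that a refinement step rewrites the data representation and, at the same time, computes the target constraint from the source constraint, introducing a new constraint (here, referential integrity) to compensate for the lost structure. All names are illustrative assumptions.

```python
# Sketch only: one hierarchical-to-relational refinement step.
def refine_nesting_to_foreign_key(source_constraint):
    """Represent a nested 'parent with list of children' value as two
    flat tables. The target constraint is the source constraint composed
    with the abstraction that rebuilds the nested view, conjoined with a
    new referential-integrity constraint introduced by the refinement."""

    def to_nested(db):  # abstraction: flat tables -> nested view
        return [
            {**p, "children": [c for c in db["child"]
                               if c["parent_id"] == p["id"]]}
            for p in db["parent"]
        ]

    def target_constraint(db):
        parent_ids = {p["id"] for p in db["parent"]}
        fk_ok = all(c["parent_id"] in parent_ids for c in db["child"])
        return fk_ok and source_constraint(to_nested(db))

    return target_constraint

# Example: the source constraint "every parent has at least one child"
# is carried through; the foreign-key constraint is newly introduced.
tc = refine_nesting_to_foreign_key(lambda ps: all(p["children"] for p in ps))
db = {"parent": [{"id": 1, "name": "a"}],
      "child":  [{"id": 10, "parent_id": 1}]}
print(tc(db))  # True
```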

    Managing Schema Change in an Heterogeneous Environment

    Change is inevitable even for persistent information. Effectively managing change of persistent information, which includes the specification, execution and the maintenance of any derived information, is critical and must be addressed by all database systems. Today, for every data model there exists a well-defined set of change primitives that can alter both the structure (the schema) and the data. Several proposals also exist for incrementally propagating a primitive change to any derived information (or view). However, existing support is lacking in two ways. First, change primitives as presented in the literature are very limited in their capabilities, allowing users only to add or remove schema elements. More complex types of changes, such as the merging or splitting of schema elements, are not supported in a principled manner. Second, algorithms for maintaining derived information often do not account for the potential heterogeneity between the source and the target. The goal of this dissertation is to provide solutions that address these two key issues. The first part of this dissertation addresses the challenge of expressing a rich and complex set of changes. We propose the SERF (Schema Evolution through an Extensible, Re-usable and Flexible) framework that allows users to perform a wide range of complex user-defined schema transformations. Our approach combines existing schema evolution primitives using OQL (object query language) as the glue logic. Within the context of this work, we look at the different domains in which SERF can be applied, including web site management. To further enrich our framework, we also investigate the optimization and verification of SERF transformations. The second part of this dissertation addresses the problem of maintaining views in the face of source changes when the source and the view are not in the same data model. With today's increasing heterogeneity in information structure, it is critical that view maintenance addresses data model boundaries. However, view definitions that go across data models are limited to hard-coded algorithms, thereby making it difficult to develop general maintenance algorithms. We provide a two-step solution for this problem. We have developed a cross algebra that defines views such that there is no restriction forcing the view and the source data models to be the same. We then define update propagation algorithms that can propagate changes from source to target irrespective of the exact translation and the data models. We validate our ideas by applying them to translation and change propagation between the XML and relational data models.
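
    A minimal sketch of the idea behind such user-defined transformations: a complex change (merging two attributes) is composed from primitive schema-evolution operations, with query logic as the glue that migrates the data. SERF itself uses OQL over an object model; this dictionary-based Python version is only an illustration.

```python
# Sketch only: compose primitive schema-change operations into a
# user-defined complex transformation. All names are hypothetical.
def add_attribute(schema, data, cls, attr, default=None):
    schema[cls].append(attr)
    for obj in data[cls]:
        obj[attr] = default

def drop_attribute(schema, data, cls, attr):
    schema[cls].remove(attr)
    for obj in data[cls]:
        del obj[attr]

def merge_attributes(schema, data, cls, a, b, merged, combine):
    """Complex transformation built from primitives plus glue logic."""
    add_attribute(schema, data, cls, merged)
    for obj in data[cls]:              # the "glue" query/update step
        obj[merged] = combine(obj[a], obj[b])
    drop_attribute(schema, data, cls, a)
    drop_attribute(schema, data, cls, b)

schema = {"Person": ["first", "last"]}
data = {"Person": [{"first": "Ada", "last": "Lovelace"}]}
merge_attributes(schema, data, "Person", "first", "last", "name",
                 lambda f, l: f + " " + l)
print(schema, data)  # {'Person': ['name']} ... {'name': 'Ada Lovelace'}
```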

    Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

    Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.
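
    A heavily simplified, hypothetical sketch of encoding one PPMS record as GML-style XML; the element names are illustrative assumptions, not the actual S-100 Product Specification schema.

```python
# Sketch only: build a toy GML-style feature for a hypothetical PPMS
# record. Only the GML namespace URI is standard; everything else is
# an illustrative assumption.
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML)

site = ET.Element("PotentiallyPollutingMarineSite")
ET.SubElement(site, "siteName").text = "Example wreck"
ET.SubElement(site, "pollutantType").text = "fuel oil"
point = ET.SubElement(site, "{%s}Point" % GML)
ET.SubElement(point, "{%s}pos" % GML).text = "59.90 10.74"

print(ET.tostring(site, encoding="unicode"))
```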