24 research outputs found

    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model still dominate today's market, there is growing interest in storing and processing semi-structured data and graph data in relational databases, so that their mature and powerful capabilities can be applied to these kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables. Peer reviewed
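    As a rough illustration of what mapping semi-structured data into relational tables can look like, the sketch below shreds an XML document into a single generic "edge" table, one row per element. This is a minimal sketch of one widely known schema-oblivious strategy, not a method taken from the survey; the table name `edge`, its columns, and the helper `shred_to_edge_table` are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_to_edge_table(xml_text, conn):
    """Store every element of an XML document as one row of a generic edge table.

    Hypothetical helper: the schema (node_id, parent_id, ordinal, tag, value)
    is an assumption chosen for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge ("
        "node_id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "ordinal INTEGER, tag TEXT, value TEXT)"
    )

    def visit(elem, parent_id, ordinal):
        # Keep only non-empty text content; attributes are ignored in this sketch.
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO edge (parent_id, ordinal, tag, value) VALUES (?, ?, ?, ?)",
            (parent_id, ordinal, elem.tag, text),
        )
        node_id = cur.lastrowid  # id assigned by SQLite to the row just inserted
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)
    conn.commit()


conn = sqlite3.connect(":memory:")
shred_to_edge_table(
    "<lib><book><title>A</title></book><book><title>B</title></book></lib>", conn
)

# A path query such as /lib/book/title becomes a self-join on the edge table.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT t.value FROM edge b JOIN edge t ON t.parent_id = b.node_id "
        "WHERE b.tag = 'book' AND t.tag = 'title' ORDER BY t.node_id"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
```

    The trade-off this sketch makes visible is that a generic table needs no schema knowledge up front, but every step of a path query costs a self-join.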

    Bridging XML and Relational Databases: An Effective Mapping Scheme based on Persistent

    XML has emerged as the leading medium for data transfer over the World Wide Web. At present, relational databases are still widely used as the back-end database in most organizations. Since there is a mismatch between these two structures, an effective mapping scheme that provides seamless integration with relational databases is essential. In addition, an immutable labeling scheme is needed to identify XML nodes uniquely and to support dynamic updates without re-labeling existing nodes when an update occurs. In this paper, we therefore propose s-XML, which adopts the Persistent Labeling scheme as its annotation scheme to ensure seamless integration with relational databases and to support updates without reconstructing existing labels. We conduct experiments to show that s-XML performs better than existing approaches in mapping XML nodes to relational databases, query retrieval, and dynamic update. DOI: http://dx.doi.org/10.11591/ijece.v2i2.21

    Managing Schema Change in an Heterogeneous Environment

    Change is inevitable even for persistent information. Effectively managing change of persistent information, which includes the specification, execution and maintenance of any derived information, is critical and must be addressed by all database systems. Today, for every data model there exists a well-defined set of change primitives that can alter both the structure (the schema) and the data. Several proposals also exist for incrementally propagating a primitive change to any derived information (or view). However, existing support is lacking in two ways. First, change primitives as presented in the literature are very limited in their capabilities, allowing users simply to add or remove schema elements. More complex types of changes, such as the merging or splitting of schema elements, are not supported in a principled manner. Second, algorithms for maintaining derived information often do not account for the potential heterogeneity between the source and the target. The goal of this dissertation is to provide solutions that address these two key issues. The first part of this dissertation addresses the challenge of expressing a rich, complex set of changes. We propose the SERF (Schema Evolution through an Extensible, Re-usable and Flexible) framework, which allows users to perform a wide range of complex user-defined schema transformations. Our approach combines existing schema evolution primitives using OQL (Object Query Language) as the glue logic. Within the context of this work, we look at the different domains in which SERF can be applied, including web site management. To further enrich our framework, we also investigate the optimization and verification of SERF transformations. The second part of this dissertation addresses the problem of maintaining views in the face of source changes when the source and the view are not in the same data model. With today's increasing heterogeneity in information structure, it is critical that view maintenance works across data model boundaries. However, view definitions that go across data models are limited to hard-coded algorithms, making it difficult to develop general maintenance algorithms. We provide a two-step solution for this problem. We have developed a cross algebra that defines views without requiring the view and the source data models to be the same. We then define update propagation algorithms that can propagate changes from source to target irrespective of the exact translation and the data models. We validate our ideas by applying them to translation and change propagation between the XML and relational data models.

    Storing Linked XML documents in Object-Relational DBMS

    Several researchers have proposed mapping both the structure and the constraints of XML documents to an object-relational database (ORDB). However, these proposals cannot be put into practice because of the limited range of constraints in available object-relational DBMSs. We therefore propose mapping rules that are practicable with available technologies. Normally, an XML document is treated as a database, so much data redundancy occurs. To solve this problem, we keep non-redundant data in several separate XML documents, link the data dispersed across these documents by a mechanism called 'rlink', and then map this mechanism to ORDB. Finally, we perform a case study in Oracle9i to illustrate the mapping of XML to ORDB according to our rules. Our contribution is the finding that mapping linked XML documents to traditional databases such as (O)RDB makes it easier to join several documents and to update several documents in one update command.

    From XML to relational database.

    by Yan, Men-Hin. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 114-119). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgments
    Chapter 1: Introduction
        1.1 Storing XML in Database Systems
        1.2 Outline of the Thesis
    Chapter 2: Related Work
        2.1 Overview of XML
            2.1.1 Extensible Markup Language (XML)
            2.1.2 Data Type Definition (DTD)
            2.1.3 ID, IDREF and IDREFS
        2.2 Using Special-Purpose Database to Store XML Data
        2.3 Using Relational Databases to Store XML Data
            2.3.1 Extracting Schemas with STORED
            2.3.2 Using Simple Schemes Based on Labeled Graph
            2.3.3 Generating Schemas from DTDs
            2.3.4 Commercial Approaches
        2.4 Discovering Functional Dependencies
            2.4.1 Functional Dependency
            2.4.2 Finding Functional Dependencies
            2.4.3 TANE and Partition Refinement
        2.5 Multivalued Dependencies
            2.5.1 Example of Multivalued Dependency
    Chapter 3: Using RDBMS to Store XML Data
        3.1 Global Schema Extraction Algorithm
            3.1.1 Step 1: Simplify DTD
            3.1.2 Step 2: Construct Schema Prototype Trees
            3.1.3 Step 3: Generate Relational Schema Prototype
            3.1.4 Step 4: Discover Functional Dependencies and Candidate Keys
            3.1.5 Step 5: Normalize the Relational Schema Prototypes
            3.1.6 Discussion
        3.2 DTD-splitting Schema Extraction Algorithm
            3.2.1 Step 1: Simplify DTD
            3.2.2 Step 2: Construct Schema Prototype Trees
            3.2.3 Step 3: Generate Relational Schema Prototype
            3.2.4 Step 4: Discover Functional Dependencies and Candidate Keys
            3.2.5 Step 5: Normalize the Relational Schema Prototypes
            3.2.6 Discussion
        3.3 Experimental Results
            3.3.1 Real Life XML Data: SIGMOD Record XML
            3.3.2 Synthetic XML Data
            3.3.3 Discussion
    Chapter 4: Finding Multivalued Dependencies
        4.1 Validation of Multivalued Dependencies
        4.2 Search Strategy and Pruning
            4.2.1 Search Strategy for Left-hand Sides Candidates
            4.2.2 Search Strategy for Right-hand Sides Candidates
            4.2.3 Other Pruning
        4.3 Computing with Partitions
            4.3.1 Computing Partitions
        4.4 Algorithm
            4.4.1 Generating Next Level Candidates
            4.4.2 Computing Partitions
        4.5 Experimental Results
            4.5.1 Results of the Algorithm
            4.5.2 Evaluation on the Results
            4.5.3 Scalability of the Algorithm
            4.5.4 Using Multivalued Dependencies in Schema Extraction Algorithms
    Chapter 5: Conclusion
        5.1 Discussion
        5.2 Future Work
            5.2.1 Translate Semistructured Queries to SQL
            5.2.2 Improve the Multivalued Dependency Discovery Algorithm
            5.2.3 Incremental Update of Resulting Schema
    Bibliography
    Appendix
        A Simple Proof for Minimality in Multivalued Dependencies
        B Third and Fourth Normal Form Decompositions
            B.1 3NF Decomposition Algorithm
            B.2 4NF Decomposition Algorithm

    Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

    Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.

    An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing

    To facilitate information standardization and sharing in the construction industry, this paper presents a simple but effective approach that maps the UML (Unified Modeling Language) object-oriented information model of a construction project to an XML schema, and then to a Relational DataBase (RDB) schema. First, the mapping between the UML model and the XML schema is discussed, since UML is a popular tool for modeling the static structure and dynamic behavior of the information and processes in a construction project, while XML has become a de facto standard for information sharing and exchange. Then, a set of consistent rules for mapping from the XML schema to the RDB's Entity-Relationship (E-R) model is studied and established, since RDB has been the most popular choice for information management. The present study focuses on making the set of rules simple and easy to implement for most applications in the construction industry. Finally, a mapping tool for automatically generating RDB schemas from XML Schemas is developed.

    Multimodality Data Integration in Epilepsy

    An important goal of software development in the medical field is the design of methods which are able to integrate information obtained from various imaging and nonimaging modalities into a cohesive framework in order to understand the results of qualitatively different measurements in a larger context. Moreover, it is essential to assess the various features of the data quantitatively so that relationships in anatomical and functional domains between complementing modalities can be expressed mathematically. This paper presents a clinically feasible software environment for the quantitative assessment of the relationship among biochemical functions as assessed by PET imaging and electrophysiological parameters derived from intracranial EEG. Based on the developed software tools, quantitative results obtained from individual modalities can be merged into a data structure allowing a consistent framework for advanced data mining techniques and 3D visualization. Moreover, an effort was made to derive quantitative variables (such as the spatial proximity index, SPI) characterizing the relationship between complementing modalities on a more generic level as a prerequisite for efficient data mining strategies. We describe the implementation of this software environment in twelve children (mean age 5.2 ± 4.3 years) with medically intractable partial epilepsy who underwent both high-resolution structural MR and functional PET imaging. Our experiments demonstrate that our approach will lead to a better understanding of the mechanisms of epileptogenesis and might ultimately have an impact on treatment. Moreover, our software environment holds promise to be useful in many other neurological disorders, where integration of multimodality data is crucial for a better understanding of the underlying disease mechanisms

    Potentially Polluting Marine Sites GEODB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

    Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.