Search CORE

A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

Author: Lu Jiaheng
Yan Zhengtong
Yuan Gongsheng
Publication date: 01/10/2023

The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained from it conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model still predominate in today's market, there is growing interest in storing and processing semi-structured data and graph data in relational databases, so that the capabilities of mature and powerful relational databases can be applied to all of these kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and drawbacks of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto
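The survey classifies many mapping strategies without giving code. As a minimal sketch of one classic family, the generic "edge table", the snippet below shreds a nested document into a single SQLite table with one row per parent-child edge and answers a path query with self-joins. The table `edge`, its columns, and the sample document are assumptions for illustration, not a method taken from the survey.

```python
import sqlite3
from itertools import count

def shred(doc, conn):
    """Store a nested dict/list document as rows (id, parent_id, name, val)
    of a single edge table; inner nodes (objects, arrays) get val = NULL."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "name TEXT, val TEXT)"
    )
    ids = count(1)

    def visit(node, parent_id, name):
        node_id = next(ids)
        is_leaf = not isinstance(node, (dict, list))
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?)",
            (node_id, parent_id, name, str(node) if is_leaf else None),
        )
        if isinstance(node, dict):
            for key, child in node.items():
                visit(child, node_id, key)
        elif isinstance(node, list):
            # Array positions are stored as names "0", "1", ... to keep order.
            for pos, child in enumerate(node):
                visit(child, node_id, str(pos))

    visit(doc, None, "$")  # "$" is an arbitrary name chosen for the root

doc = {"person": {"name": "Ada", "phones": ["123", "456"]}}
conn = sqlite3.connect(":memory:")
shred(doc, conn)

# The path person.phones[*] becomes a chain of self-joins on the edge table.
rows = conn.execute(
    """
    SELECT ph.val FROM edge p
    JOIN edge lst ON lst.parent_id = p.id AND lst.name = 'phones'
    JOIN edge ph ON ph.parent_id = lst.id
    WHERE p.name = 'person'
    ORDER BY CAST(ph.name AS INTEGER)
    """
).fetchall()
phones = [r[0] for r in rows]
print(phones)  # ['123', '456']
```

The schema never changes no matter what shape the input has, but even this short path already needs two self-joins; that trade-off between a simple generic schema and query cost is the kind of merit/drawback such classifications weigh.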

Bridging XML and Relational Databases: An Effective Mapping Scheme based on Persistent Labeling

Author: Haw Su-Cheng
Hoong Poo Kuan
Subramaniam Samini
Publication venue: Institute of Advanced Engineering and Science
Publication date: 02/03/2011

XML has emerged as the leading medium for data transfer over the World Wide Web. At present, relational databases are still widely used as the back-end databases in most organizations. Since there is a mismatch between these two structures, an effective mapping scheme that provides seamless integration with relational databases is essential. In addition, an immutable labeling scheme is important for identifying XML nodes uniquely and for supporting dynamic updates without relabeling existing nodes whenever an update occurs. In this paper, we propose s-XML, which adopts the Persistent Labeling scheme as its annotation scheme to ensure seamless integration with relational databases and to support updates without reconstructing existing labels. Our experiments show that s-XML performs better than existing approaches in mapping XML nodes to relational databases, query retrieval, and dynamic updates. DOI: http://dx.doi.org/10.11591/ijece.v2i2.21

Institute of Advanced Engineering and Science
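The abstract does not spell out how Persistent Labeling assigns labels, so the following is only a stand-in for the general idea of an immutable labeling scheme: a Dewey-style path label whose components are rational numbers, so a new sibling can always be labeled between two existing ones without relabeling anything. The `Fraction`-based labels, the helper names, and the `node` table layout are assumptions, not s-XML's actual design.

```python
import sqlite3
from fractions import Fraction

def between(left, right):
    """Return a sibling component strictly between two neighbours;
    None means there is no neighbour on that side."""
    if left is None and right is None:
        return Fraction(1)
    if left is None:
        return right - 1
    if right is None:
        return left + 1
    return (left + right) / 2

def encode(label):
    """Serialize a label tuple as TEXT, e.g. (1, 3/2) -> '1.3/2'."""
    return ".".join(str(part) for part in label)

def decode(text):
    """Inverse of encode: '1.3/2' -> (Fraction(1), Fraction(3, 2))."""
    return tuple(Fraction(part) for part in text.split("."))

# A label is the parent's label plus one rational component.
root = (Fraction(1),)
a = root + (between(None, None),)    # first child:       (1, 1)
c = root + (between(a[-1], None),)   # appended after a:  (1, 2)
b = root + (between(a[-1], c[-1]),)  # inserted between:  (1, 3/2); a, c unchanged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (label TEXT PRIMARY KEY, parent TEXT, tag TEXT)")
for label, tag in [(root, "book"), (a, "title"), (c, "price"), (b, "author")]:
    parent = encode(label[:-1]) if len(label) > 1 else None
    conn.execute("INSERT INTO node VALUES (?, ?, ?)", (encode(label), parent, tag))

# TEXT order is not document order, so decode and sort the tuples in Python.
children = conn.execute(
    "SELECT label, tag FROM node WHERE parent = ?", (encode(root),)
).fetchall()
ordered = [tag for label, tag in sorted(children, key=lambda r: decode(r[0]))]
print(ordered)  # ['title', 'author', 'price']
```

Rational components never run out of room between two siblings, at the cost of labels whose size can grow when many insertions hit the same spot.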

Managing Schema Change in an Heterogeneous Environment

Author: Claypool Kajal Tilak
Publication venue: Digital WPI
Publication date: 17/06/2002

Change is inevitable even for persistent information. Effectively managing change of persistent information, which includes the specification, execution, and maintenance of any derived information, is critical and must be addressed by all database systems. Today, every data model has a well-defined set of change primitives that can alter both the structure (the schema) and the data. Several proposals also exist for incrementally propagating a primitive change to any derived information (or view). However, existing support is lacking in two ways. First, change primitives as presented in the literature are very limited in their capabilities, allowing users only to add or remove schema elements. More complex types of changes, such as merging or splitting schema elements, are not supported in a principled manner. Second, algorithms for maintaining derived information often do not account for potential heterogeneity between the source and the target. The goal of this dissertation is to provide solutions that address these two key issues. The first part of this dissertation addresses the challenge of expressing a rich, complex set of changes. We propose the SERF (Schema Evolution through an Extensible, Re-usable and Flexible) framework, which allows users to perform a wide range of complex user-defined schema transformations. Our approach combines existing schema evolution primitives using OQL (Object Query Language) as the glue logic. Within the context of this work, we look at the different domains in which SERF can be applied, including web site management. To further enrich our framework, we also investigate the optimization and verification of SERF transformations. The second part of this dissertation addresses the problem of maintaining views in the face of source changes when the source and the view are not in the same data model.
With today's increasing heterogeneity in information structure, it is critical that view maintenance address data model boundaries. However, view definitions that cross data models are limited to hard-coded algorithms, which makes it difficult to develop general maintenance algorithms. We provide a two-step solution for this problem. We have developed a cross algebra that defines views such that there is no restriction forcing the view and the source data models to be the same. We then define update propagation algorithms that can propagate changes from source to target irrespective of the exact translation and the data models. We validate our ideas by applying them to translation and change propagation between the XML and relational data models.

DigitalCommons@WPI
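The dissertation's cross algebra is not reproduced here. As a much simpler sketch of what propagating a source change to a view in a different data model means, the code below defines an XML view over relational tuples, maintains it incrementally on insert and delete, and checks that the result matches recomputing the view from scratch. The `employee` tuples, element names, and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

def row_to_element(row):
    """Translate one relational tuple (id, name) into an XML element."""
    el = ET.Element("employee", id=str(row[0]))
    el.text = row[1]
    return el

def full_view(rows):
    """Recompute the whole XML view from the relational source."""
    root = ET.Element("employees")
    for row in sorted(rows):
        root.append(row_to_element(row))
    return root

def propagate_insert(view, row):
    """Insert only the new element, keeping the view ordered by id."""
    pos = sum(1 for el in view if int(el.get("id")) < row[0])
    view.insert(pos, row_to_element(row))

def propagate_delete(view, row_id):
    """Remove only the element that corresponds to the deleted tuple."""
    for el in view.findall("employee"):  # findall returns a list, safe to mutate view
        if el.get("id") == str(row_id):
            view.remove(el)

source = {(1, "Ada"), (3, "Grace")}
view = full_view(source)

# Apply each source change and push the matching delta into the XML view.
source.add((2, "Edsger"))
propagate_insert(view, (2, "Edsger"))
source.discard((1, "Ada"))
propagate_delete(view, 1)

same = ET.tostring(view) == ET.tostring(full_view(source))
print(same)  # True
```

Checking the incremental result against full recomputation is a convenient correctness criterion for any propagation algorithm, whatever the translation between models looks like.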

Storing Linked XML documents in Object-Relational DBMS

Author: Akhtar Ali
Nick Rossiter
Pensri Amornsinlaphachai
Publication venue: University of Zagreb - University Computing Centre
Publication date: 01/01/2006

Several researchers have proposed mapping both the structure and the constraints of XML documents to an object-relational database (ORDB). However, these proposals cannot be implemented in practice because available object-relational DBMSs support only a limited range of constraints. We therefore propose mapping rules that are practicable with available technologies. Normally, an XML document is treated as a database, so much data redundancy occurs. To solve this problem, we keep non-redundant data in several separate XML documents, link the data dispersed across these documents with a mechanism called ‘rlink’, and then map this mechanism to an ORDB. Finally, we perform a case study in Oracle9i to illustrate the mapping of XML to ORDB according to our rules. Our contribution is showing that mapping linked XML documents to traditional databases such as (O)RDBs makes it easier to join several documents and to update several documents with one update command.

CiteSeerX

HRČAK - Portal of Croatian Scientific and Professional Journals
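The abstract does not define the syntax of ‘rlink’, so the sketch below assumes a hypothetical `rlink` attribute that holds the key of an element in another document. It shreds two linked XML documents into tables and answers a cross-document question with one SQL join, then updates shared data with one statement. It uses SQLite, a plain relational engine, instead of Oracle9i's object-relational features, so it only approximates the paper's mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two separate, non-redundant documents; each department name is stored once.
DEPTS = """<departments>
  <dept id="d1"><name>Research</name></dept>
  <dept id="d2"><name>Sales</name></dept>
</departments>"""
STAFF = """<staff>
  <employee id="e1" rlink="d1"><name>Ada</name></employee>
  <employee id="e2" rlink="d2"><name>Alan</name></employee>
  <employee id="e3" rlink="d1"><name>Grace</name></employee>
</staff>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id TEXT PRIMARY KEY, name TEXT)")
# The hypothetical rlink attribute becomes a foreign key column.
conn.execute(
    "CREATE TABLE employee (id TEXT PRIMARY KEY, name TEXT, "
    "dept_id TEXT REFERENCES dept(id))"
)

for d in ET.fromstring(DEPTS).iter("dept"):
    conn.execute("INSERT INTO dept VALUES (?, ?)", (d.get("id"), d.findtext("name")))
for e in ET.fromstring(STAFF).iter("employee"):
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?)",
        (e.get("id"), e.findtext("name"), e.get("rlink")),
    )

# Joining the two "documents" is now an ordinary relational join.
research = [name for (name,) in conn.execute(
    "SELECT e.name FROM employee e JOIN dept d ON e.dept_id = d.id "
    "WHERE d.name = 'Research' ORDER BY e.name"
)]
print(research)  # ['Ada', 'Grace']

# One UPDATE changes the department name seen by every linked employee.
conn.execute("UPDATE dept SET name = 'R&D' WHERE id = 'd1'")
renamed = conn.execute(
    "SELECT COUNT(*) FROM employee e JOIN dept d ON e.dept_id = d.id "
    "WHERE d.name = 'R&D'"
).fetchone()[0]
print(renamed)  # 2
```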

From XML to relational database.

Author: Yan Men-Hin
Publication date: 01/01/2001

by Yan, Men-Hin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 114-119). Abstracts in English and Chinese.

Abstract --- p.ii
Acknowledgments --- p.iv
Chapter 1 --- Introduction --- p.1
  Chapter 1.1 --- Storing XML in Database Systems --- p.2
  Chapter 1.2 --- Outline of the Thesis --- p.4
Chapter 2 --- Related Work --- p.5
  Chapter 2.1 --- Overview of XML --- p.5
    Chapter 2.1.1 --- Extensible Markup Language (XML) --- p.5
    Chapter 2.1.2 --- Data Type Definition (DTD) --- p.6
    Chapter 2.1.3 --- "ID, IDREF and IDREFS" --- p.9
  Chapter 2.2 --- Using Special-Purpose Database to Store XML Data --- p.10
  Chapter 2.3 --- Using Relational Databases to Store XML Data --- p.11
    Chapter 2.3.1 --- Extracting Schemas with STORED --- p.11
    Chapter 2.3.2 --- Using Simple Schemes Based on Labeled Graph --- p.12
    Chapter 2.3.3 --- Generating Schemas from DTDs --- p.12
    Chapter 2.3.4 --- Commercial Approaches --- p.13
  Chapter 2.4 --- Discovering Functional Dependencies --- p.14
    Chapter 2.4.1 --- Functional Dependency --- p.14
    Chapter 2.4.2 --- Finding Functional Dependencies --- p.14
    Chapter 2.4.3 --- TANE and Partition Refinement --- p.15
  Chapter 2.5 --- Multivalued Dependencies --- p.17
    Chapter 2.5.1 --- Example of Multivalued Dependency --- p.18
Chapter 3 --- Using RDBMS to Store XML Data --- p.20
  Chapter 3.1 --- Global Schema Extraction Algorithm --- p.22
    Chapter 3.1.1 --- Step 1: Simplify DTD --- p.22
    Chapter 3.1.2 --- Step 2: Construct Schema Prototype Trees --- p.24
    Chapter 3.1.3 --- Step 3: Generate Relational Schema Prototype --- p.29
    Chapter 3.1.4 --- Step 4: Discover Functional Dependencies and Candidate Keys --- p.31
    Chapter 3.1.5 --- Step 5: Normalize the Relational Schema Prototypes --- p.32
    Chapter 3.1.6 --- Discussion --- p.32
  Chapter 3.2 --- DTD-splitting Schema Extraction Algorithm --- p.34
    Chapter 3.2.1 --- Step 1: Simplify DTD --- p.35
    Chapter 3.2.2 --- Step 2: Construct Schema Prototype Trees --- p.36
    Chapter 3.2.3 --- Step 3: Generate Relational Schema Prototype --- p.45
    Chapter 3.2.4 --- Step 4: Discover Functional Dependencies and Candidate Keys --- p.46
    Chapter 3.2.5 --- Step 5: Normalize the Relational Schema Prototypes --- p.47
    Chapter 3.2.6 --- Discussion --- p.49
  Chapter 3.3 --- Experimental Results --- p.50
    Chapter 3.3.1 --- Real Life XML Data: SIGMOD Record XML --- p.50
    Chapter 3.3.2 --- Synthetic XML Data --- p.58
    Chapter 3.3.3 --- Discussion --- p.68
Chapter 4 --- Finding Multivalued Dependencies --- p.75
  Chapter 4.1 --- Validation of Multivalued Dependencies --- p.77
  Chapter 4.2 --- Search Strategy and Pruning --- p.80
    Chapter 4.2.1 --- Search Strategy for Left-hand Sides Candidates --- p.81
    Chapter 4.2.2 --- Search Strategy for Right-hand Sides Candidates --- p.82
    Chapter 4.2.3 --- Other Pruning --- p.85
  Chapter 4.3 --- Computing with Partitions --- p.87
    Chapter 4.3.1 --- Computing Partitions --- p.88
  Chapter 4.4 --- Algorithm --- p.89
    Chapter 4.4.1 --- Generating Next Level Candidates --- p.92
    Chapter 4.4.2 --- Computing Partitions --- p.93
  Chapter 4.5 --- Experimental Results --- p.94
    Chapter 4.5.1 --- Results of the Algorithm --- p.95
    Chapter 4.5.2 --- Evaluation on the Results --- p.96
    Chapter 4.5.3 --- Scalability of the Algorithm --- p.98
    Chapter 4.5.4 --- Using Multivalued Dependencies in Schema Extraction Algorithms --- p.101
Chapter 5 --- Conclusion --- p.108
  Chapter 5.1 --- Discussion --- p.108
  Chapter 5.2 --- Future Work --- p.110
    Chapter 5.2.1 --- Translate Semistructured Queries to SQL --- p.110
    Chapter 5.2.2 --- Improve the Multivalued Dependency Discovery Algorithm --- p.112
    Chapter 5.2.3 --- Incremental Update of Resulting Schema --- p.113
Bibliography --- p.113
Appendix --- p.120
  Chapter A --- Simple Proof for Minimality in Multivalued Dependencies --- p.120
  Chapter B --- Third and Fourth Normal Form Decompositions --- p.122
    Chapter B.1 --- 3NF Decomposition Algorithm --- p.123
    Chapter B.2 --- 4NF Decomposition Algorithm --- p.12

CUHK Digital Repository
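Several chapters of this thesis (2.4.3, 4.3) work with partitions of a relation's tuples. A well-known test from TANE, sketched below, is that a functional dependency X → A holds exactly when partitioning the tuples by X yields as many equivalence classes as partitioning them by X ∪ {A}, because the finer partition can only split classes, never merge them. The toy relation and column names are invented for illustration, and this is not the thesis's multivalued-dependency algorithm.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on attrs (an equivalence partition)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def fd_holds(rows, lhs, rhs):
    """X -> A holds iff |partition(X)| == |partition(X + [A])|."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))

# Toy relation flattened from XML: one row per (book, author) pair.
rows = [
    {"isbn": "1", "title": "XML Basics", "author": "Kim"},
    {"isbn": "1", "title": "XML Basics", "author": "Lee"},
    {"isbn": "2", "title": "SQL Deep Dive", "author": "Kim"},
]

print(fd_holds(rows, ["isbn"], "title"))    # True
print(fd_holds(rows, ["isbn"], "author"))   # False
print(fd_holds(rows, ["author"], "title"))  # False
```

A dependency such as isbn → title, found this way, is what a normalization step would use to move `title` into its own table keyed by `isbn`.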

Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

Author: Alexander Lee
Calder Brian R.
Masetti Giuseppe
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 29/08/2012

University of New Brunswick: Centre for Digital Scholarship Journals

UNH Scholars' Repository

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing

Author: I-Chen Wu
Shang-Hsien Hsieh
Publication date: 11/04/2020

To facilitate information standardization and sharing in the construction industry, this paper presents a simple but effective approach that maps the UML (Unified Modeling Language) object-oriented information model of a construction project to an XML schema, and then to a relational database (RDB) schema. First, the mapping between the UML model and the XML schema is discussed, since UML has been a popular tool for modeling the static structure and dynamic behavior of the information and processes in a construction project, while XML has become a de facto standard for information sharing and exchange. Then, a consistent set of rules for mapping from an XML schema to the RDB's Entity-Relationship (E-R) model is studied and established, since RDBs have been the most popular choice for information management. The study focuses on keeping the rules simple and easy to implement for most applications in the construction industry. Finally, a mapping tool that automatically generates RDB schemas from XML Schemas is developed.

CiteSeerX
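The abstract does not list the paper's rule set. As a sketch of the kind of XML-Schema-to-RDB rule it describes, the code below applies two common rules, assumed here for illustration: a top-level element with a complex type becomes a table whose single-valued simple children become columns, and a repeatable child (`maxOccurs` greater than 1 or `"unbounded"`) becomes a separate table with a foreign key back to its parent. The input is a deliberately tiny subset of XML Schema, and the type mapping is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # ElementTree's namespaced tag prefix

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="project">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="budget" type="xs:decimal"/>
        <xs:element name="task" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Assumed type mapping; unknown types fall back to VARCHAR(255).
SQL_TYPES = {"xs:string": "VARCHAR(255)", "xs:decimal": "DECIMAL", "xs:int": "INTEGER"}

def to_ddl(xsd_text):
    """Turn each top-level element with an inline complexType into tables."""
    statements = []
    for top in ET.fromstring(xsd_text).findall(f"{XS}element"):
        table = top.get("name")
        columns = [f"{table}_id INTEGER PRIMARY KEY"]
        child_tables = []
        for child in top.iter(f"{XS}element"):
            if child is top:
                continue  # iter() also yields the element itself
            name = child.get("name")
            sql_type = SQL_TYPES.get(child.get("type"), "VARCHAR(255)")
            max_occurs = child.get("maxOccurs", "1")
            if max_occurs == "unbounded" or int(max_occurs) > 1:
                # Repeatable child: separate table with a foreign key to the parent.
                child_tables.append(
                    f"CREATE TABLE {table}_{name} ("
                    f"{table}_id INTEGER REFERENCES {table}({table}_id), "
                    f"val {sql_type})"
                )
            else:
                # Single-valued simple child: an ordinary column.
                columns.append(f"{name} {sql_type}")
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)})")
        statements.extend(child_tables)
    return statements

statements = to_ddl(XSD)
conn = sqlite3.connect(":memory:")
for stmt in statements:
    print(stmt)
    conn.execute(stmt)  # confirms that SQLite accepts the generated DDL
```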

A flexible approach for mapping between object-oriented databases and XML. A two-way method based on an object graph.

Author: Naser Taher A.J.
Publication venue: School of Computing, Informatics and Media
Publication date: 01/01/2011

One of the most prominent challenges facing academia and industry is the development of effective techniques and tools for maximizing the availability of data as the most valuable source of knowledge. The Internet has become the core means of maximizing data availability, and XML (eXtensible Markup Language) has emerged and is gradually being accepted as the universal standard format for platform-independent publishing and exchange of data over the Internet. On the other hand, large amounts of data are still held in structured databases, and database management systems have traditionally been used for the effective storage and manipulation of large volumes of data. This raises the need for effective methodologies capable of smoothly transforming data between different formats in general, and between XML and structured databases in particular. This dissertation addresses the issue by proposing a two-way mapping approach between XML and object-oriented databases. The basic steps of the proposed approach are applied systematically to produce a graph from the source and then transform the graph into the destination format. In other words, the derived graph summarizes the characteristics of the source, whether XML (elements and attributes) or an object-oriented database (classes, inheritance and nesting hierarchies). Then, the developed methodology classifies the nodes and links of the graph into the basic constructs of the destination, i.e., elements and attributes for XML, or classes, inheritance and nesting hierarchies for object-oriented databases. The methodology has been successfully implemented, and illustrative case studies are presented in this document.

Bradford Scholars
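The dissertation's graph-based methodology is only summarized in the abstract. The sketch below illustrates the general two-way idea under simplified assumptions made for this example: primitive fields of an object map to XML attributes, nested objects map to child elements, and the reverse direction rebuilds the objects from the XML. Inheritance hierarchies, which the dissertation also covers, are ignored, all primitive values are treated as strings, and the `Address`/`Person` classes and the `role` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Address:
    city: str
    postcode: str

@dataclass
class Person:
    name: str
    address: Address

def to_xml(obj):
    """Primitive fields become attributes; nested dataclasses become child elements."""
    el = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            child = to_xml(value)
            # Record which field held the nested object (assumes no field is named "role").
            child.set("role", f.name)
            el.append(child)
        else:
            el.set(f.name, str(value))
    return el

def from_xml(el, classes):
    """Reverse mapping: rebuild objects, using the element tag as the class name."""
    cls = classes[el.tag]
    kwargs = {k: v for k, v in el.attrib.items() if k != "role"}
    for child in el:
        kwargs[child.get("role")] = from_xml(child, classes)
    return cls(**kwargs)

p = Person("Ada", Address("London", "N1"))
xml_text = ET.tostring(to_xml(p), encoding="unicode")
print(xml_text)
back = from_xml(ET.fromstring(xml_text), {"Person": Person, "Address": Address})
print(back == p)  # True
```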

Multimodality Data Integration in Epilepsy

Author: Asano Eishi
Chugani Diane C.
Chugani Harry T.
Hua Jing
Lu Shiyong
Lu Yi
Muzik Otto
Zou Guangyu
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2007

An important goal of software development in the medical field is the design of methods that can integrate information obtained from various imaging and nonimaging modalities into a cohesive framework, in order to understand the results of qualitatively different measurements in a larger context. Moreover, it is essential to assess the various features of the data quantitatively so that relationships in anatomical and functional domains between complementing modalities can be expressed mathematically. This paper presents a clinically feasible software environment for the quantitative assessment of the relationship among biochemical functions as assessed by PET imaging and electrophysiological parameters derived from intracranial EEG. Based on the developed software tools, quantitative results obtained from individual modalities can be merged into a data structure allowing a consistent framework for advanced data mining techniques and 3D visualization. In addition, an effort was made to derive quantitative variables (such as the spatial proximity index, SPI) characterizing the relationship between complementing modalities on a more generic level, as a prerequisite for efficient data mining strategies. We describe the implementation of this software environment in twelve children (mean age 5.2 ± 4.3 years) with medically intractable partial epilepsy who underwent both high-resolution structural MR and functional PET imaging. Our experiments demonstrate that our approach will lead to a better understanding of the mechanisms of epileptogenesis and might ultimately have an impact on treatment. Moreover, our software environment holds promise to be useful in many other neurological disorders, where integration of multimodality data is crucial for a better understanding of the underlying disease mechanisms.

Directory of Open Access Journals

PubMed Central

Potentially Polluting Marine Sites GEODB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

Author: Alexander Lee
Calder Brian
Masetti Giuseppe
Publication venue: The International Hydrographic Review
Publication date: 29/08/2012

Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and it includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of the implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.

University of New Brunswick: Centre for Digital Scholarship Journals