    Coupled schema transformation and data conversion for XML and SQL

    A two-level data transformation consists of a type-level transformation of a data format coupled with value-level transformations of data instances corresponding to that format. We have implemented a system for performing two-level transformations on XML schemas and their corresponding documents, and on SQL schemas and the databases that they describe. The core of the system consists of a combinator library for composing type-changing rewrite rules that preserve structural information and referential constraints. We discuss the implementation of the system’s core library, and of its SQL and XML front-ends in the functional language Haskell. We show how the system can be used to tackle various two-level transformation scenarios, such as XML schema evolution coupled with document migration, and hierarchical-relational data mappings that convert between XML documents and SQL databases. Fundação para a Ciência e a Tecnologia (FCT) - POSI/ICHS/44304/2002.
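The idea of coupling a type-level rewrite with value-level conversions can be sketched in a few lines. This is only an illustrative toy, written in Python rather than the Haskell used by the authors, and it is not their combinator library: the names `TwoLevel`, `add_field`, `rename_field`, and `then` are hypothetical, and the schema is assumed to be a flat mapping from field names to type names.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Schema = Dict[str, str]      # field name -> type name (toy schema format, assumed)
Record = Dict[str, object]   # one data instance conforming to a Schema


@dataclass
class TwoLevel:
    """Result of applying a rule: target schema plus instance converters."""
    target: Schema
    forward: Callable[[Record], Record]    # migrate old data to the new schema
    backward: Callable[[Record], Record]   # recover old data from new data


Rule = Callable[[Schema], TwoLevel]


def add_field(name: str, type_name: str, default: object) -> Rule:
    """Type level: add a field. Value level: fill in a default / drop it again."""
    def rule(schema: Schema) -> TwoLevel:
        return TwoLevel(
            target={**schema, name: type_name},
            forward=lambda rec: {**rec, name: default},
            backward=lambda rec: {k: v for k, v in rec.items() if k != name},
        )
    return rule


def rename_field(old: str, new: str) -> Rule:
    """Type level: rename a field. Value level: rename the key in each record."""
    def rule(schema: Schema) -> TwoLevel:
        return TwoLevel(
            target={(new if k == old else k): t for k, t in schema.items()},
            forward=lambda rec: {(new if k == old else k): v for k, v in rec.items()},
            backward=lambda rec: {(old if k == new else k): v for k, v in rec.items()},
        )
    return rule


def then(first: Rule, second: Rule) -> Rule:
    """Sequential composition: schemas and converters are composed together."""
    def rule(schema: Schema) -> TwoLevel:
        a = first(schema)
        b = second(a.target)
        return TwoLevel(
            target=b.target,
            forward=lambda rec: b.forward(a.forward(rec)),
            backward=lambda rec: a.backward(b.backward(rec)),
        )
    return rule


# Hypothetical schema evolution: rename "name" to "title", then add "year".
evolve = then(rename_field("name", "title"), add_field("year", "int", 0))
result = evolve({"name": "string"})
migrated = result.forward({"name": "Dune"})
```

Because every rule returns the converters together with the new schema, composing rules composes the data migration automatically, which is the coupling the abstract describes.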

    Archival Information Package (AIP) Pilot Specification

    This report presents the E-ARK AIP format specification as it will be used by the pilots (implementations in pilot organizations). The deliverable is a follow-up version of E-ARK deliverable D4.2. The report describes the structure, metadata, and physical container format of the E-ARK AIP, a container that results from converting an E-ARK Submission Information Package (SIP) into the E-ARK Archival Information Package (AIP). The conversion will be implemented in the Integrated Platform as part of the component earkweb.

    Open archival information systems for database preservation

    Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201

    Constraint-aware schema transformation

    Ninth International Workshop on Rule-Based Programming (Rule 2008). Data schema transformations occur in the context of software evolution, refactoring, and cross-paradigm data mappings. When constraints exist on the initial schema, these need to be transformed into constraints on the target schema. Moreover, when high-level data types are refined to lower level structures, additional target schema constraints must be introduced to balance the loss of structure and preserve semantics. We introduce an algebraic approach to schema transformation that is constraint-aware in the sense that constraints are preserved from source to target schemas and that new constraints are introduced where needed. Our approach is based on refinement theory and point-free program transformation. Data refinements are modeled as rewrite rules on types that carry point-free predicates as constraints. At each rewrite step, the predicate on the reduct is computed from the predicate on the redex. An additional rewrite system on point-free functions is used to normalize the predicates that are built up along rewrite chains. We implemented our rewrite systems in a type-safe way in the functional programming language Haskell. We demonstrate their application to constraint-aware hierarchical-relational mappings. FCT - Fundação para a Ciência e a Tecnologia (SFRH/BD/30215/2006)
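A small example may help show why a refinement step can need a new constraint. The sketch below is an assumption-laden toy in Python, not the authors' Haskell rewrite system: it refines a finite map into a list of key/value pairs, carries a source predicate over to the target, and adds the key-uniqueness constraint that the list type no longer guarantees by itself. The names `refine_map_to_pairs`, `keys_unique`, and `non_negative` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pairs = List[Tuple[str, int]]


def keys_unique(pairs: Pairs) -> bool:
    """New target constraint: no key may occur twice in the list of pairs."""
    keys = [k for k, _ in pairs]
    return len(keys) == len(set(keys))


def refine_map_to_pairs(source_ok: Callable[[Dict[str, int]], bool]):
    """Refine 'map from str to int' into 'list of (str, int) pairs'.

    Returns the two converters and the predicate on the target type. The
    target predicate is built from the source predicate (constraint
    preservation) plus key uniqueness (constraint introduction).
    """
    def to_pairs(m: Dict[str, int]) -> Pairs:
        return sorted(m.items())

    def from_pairs(pairs: Pairs) -> Dict[str, int]:
        return dict(pairs)

    def target_ok(pairs: Pairs) -> bool:
        return keys_unique(pairs) and source_ok(from_pairs(pairs))

    return to_pairs, from_pairs, target_ok


def non_negative(m: Dict[str, int]) -> bool:
    """Hypothetical source constraint: every stored value is >= 0."""
    return all(v >= 0 for v in m.values())


to_pairs, from_pairs, target_ok = refine_map_to_pairs(non_negative)
pairs = to_pairs({"b": 2, "a": 1})
```

Here `target_ok` accepts `pairs`, rejects a list with a repeated key, and rejects a list that violates the original non-negativity constraint.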

    Beyond relational databases: preserving the data

    Relational databases are one of the main technologies supporting information assets in today’s organizations. They are designed to store, organize and retrieve digital information, and are such a fundamental part of information systems that most would not be able to function without them. Very often, the information contained in databases is irreplaceable or prohibitively expensive to reacquire; therefore, steps must be taken to ensure that the information within databases is preserved. This paper describes a methodology for long-term preservation of relational databases based on information extraction and format migration to a preservation format. It also presents a tool that was developed to support this methodology, the Database Preservation Toolkit (DBPTK), as well as the processes and formats needed to preserve databases. The DBPTK connects to live relational databases and extracts information into formats better suited to long-term preservation. Supported preservation formats include SIARD 2, created through cooperation between the Swiss Federal Archives and the E-ARK project, which is becoming a standard in the area. DBPTK has a flexible plugin-based architecture enabling its use for other purposes such as database upgrade and database migration between different systems. The real case scenarios presented demonstrate the usefulness, correctness and performance of the tool. The initial E-ARK project was in part supported by the European Commission within the Competitiveness and Innovation Programme 2007–2013, Grant Agreement no. 620998 under the Policy Support Programme.

    Migrating relational databases into object-based and XML databases

    Rapid changes in information technology, the emergence of object-based and WWW applications, and the interest of organisations in securing benefits from new technologies have made information systems re-engineering in general and database migration in particular an active research area. In order to improve the functionality and performance of existing systems, the re-engineering process requires identifying and understanding all of the components of such systems. An underlying database is one of the most important components of information systems. A considerable body of data is stored in relational databases (RDBs), yet they are limited in their support for the complex structures and user-defined data types provided by relatively recent databases such as object-based and XML databases. Instead of throwing away the large amount of data stored in RDBs, it is more appropriate to enrich and convert such data to be used by new systems. Most researchers into the migration of RDBs into object-based/XML databases have concentrated on schema translation, accessing and publishing RDB data using newer technology, while few have paid attention to the conversion of data, and the preservation of data semantics, e.g., inheritance and integrity constraints. In addition, existing work does not appear to provide a solution for more than one target database. Thus, research on the migration of RDBs is not fully developed. We propose a solution that offers automatic migration of an RDB as a source into the recent database technologies as targets based on available standards such as ODMG 3.0, SQL4 and XML Schema. A canonical data model (CDM) is proposed to bridge the semantic gap between an RDB and the target databases. The CDM preserves and enhances the metadata of existing RDBs to fit in with the essential characteristics of the target databases. The adoption of standards is essential for increased portability, flexibility and constraints preservation.
    This thesis contributes a solution for migrating RDBs into object-based and XML databases. The solution takes an existing RDB as input, enriches its metadata representation with the required explicit semantics, and constructs an enhanced relational schema representation (RSR). Based on the RSR, a CDM is generated which is enriched with the RDB's constraints and data semantics that may not have been explicitly expressed in the RDB metadata. The CDM so obtained facilitates both schema translation and data conversion. We design sets of rules for translating the CDM into each of the three target schemas, and provide algorithms for converting RDB data into the target formats based on the CDM. A prototype of the solution has been implemented, which generates the three target databases. An experimental study has been conducted to evaluate the prototype. The experimental results show that the target schemas resulting from the prototype and those generated by existing manual mapping techniques were comparable. We have also shown that the source and target databases were equivalent, and demonstrated that the solution, conceptually and practically, is feasible, efficient and correct.

    Extracting, Transforming and Archiving Scientific Data

    It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research. Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201