
    Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

    1.1 Background: Following the advent of the web, there has been great demand for data interchange between applications over internet infrastructure. XML (Extensible Markup Language) provides a structured representation of data, empowered by broad adoption and easy deployment. XML is a subset of SGML (Standard Generalized Markup Language) and has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]; it is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant for storing semi-structured data. Since its initial release in 1996, it has evolved and been applied extensively in all fields where structured documents must be exchanged in electronic form. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for studying dependencies in XML is that they are as crucial to XML schema design as they are to relational database (RDB) design [Abiteboul et al., 1995].
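
To make the notion concrete, here is a minimal sketch of checking a functional dependency X → Y over XML data, assuming the simplest setting: each repeated record element is flattened into a tuple of its child-element texts, and X → Y holds when any two tuples that agree on X also agree on Y. Formal definitions of XML functional dependencies (for example path-based or tree-tuple semantics) are more involved and handle nesting and missing values. The document, element names, and helper functions (`extract_tuples`, `holds_fd`) are hypothetical and not taken from the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each <course> has a code, a title and a lecturer.
DOC = """<courses>
  <course><code>CS101</code><title>Databases</title><lecturer>Smith</lecturer></course>
  <course><code>CS101</code><title>Databases</title><lecturer>Jones</lecturer></course>
  <course><code>CS202</code><title>Networks</title><lecturer>Smith</lecturer></course>
</courses>"""

def extract_tuples(root, record_tag, fields):
    """Flatten every <record_tag> element into a dict of its child-element texts."""
    return [{f: rec.findtext(f) for f in fields} for rec in root.iter(record_tag)]

def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds: equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this key; a different one is a violation
        if seen.setdefault(key, val) != val:
            return False
    return True

root = ET.fromstring(DOC)
rows = extract_tuples(root, "course", ["code", "title", "lecturer"])
print(holds_fd(rows, ["code"], ["title"]))     # True
print(holds_fd(rows, ["code"], ["lecturer"]))  # False
```

A discovery algorithm would, in effect, search over candidate left-hand sides and prune with such checks; this sketch only verifies one given dependency.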

    A data cube model for analysis of high volumes of ambient data

    Ambient systems generate large volumes of data for many of their application areas, with XML often used as the data exchange format. As a result, large-scale ambient systems such as smart cities require some form of optimization before different components can merge their data streams. In data warehousing, the cube structure is often used to optimize the analytics process, and more recent structures such as the dwarf offer orders-of-magnitude improvements in data extraction. However, these systems were developed for relational data; we therefore present the development of an XML dwarf to manage ambient systems that generate XML data.
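
As a rough illustration of what a cube computes, the sketch below parses hypothetical XML sensor readings and materializes SUM aggregates for every subset of the dimensions (the 2^d group-bys of a data cube), with `*` standing for a dimension that has been aggregated away. It is a plain, uncompressed cube: a dwarf additionally shares common prefixes and suffixes among these group-bys to save space, which this sketch does not attempt. The element and attribute names and the `build_cube` function are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical ambient readings: zone, reading kind, and a numeric value.
DOC = """<readings>
  <r zone="north" kind="temp" value="21"/>
  <r zone="north" kind="noise" value="40"/>
  <r zone="south" kind="temp" value="19"/>
  <r zone="south" kind="temp" value="23"/>
</readings>"""

DIMS = ("zone", "kind")

def build_cube(records, dims, measure):
    """Compute SUM(measure) for every subset of dims (all 2^d group-bys).
    '*' marks a dimension aggregated away (ALL)."""
    cube = defaultdict(float)
    for rec in records:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(rec[d] if d in subset else "*" for d in dims)
                cube[key] += rec[measure]
    return dict(cube)

root = ET.fromstring(DOC)
# Attributes become the record; the measure is converted from text to float.
records = [{**r.attrib, "value": float(r.get("value"))} for r in root.iter("r")]
cube = build_cube(records, DIMS, "value")
print(cube[("*", "*")])          # 103.0
print(cube[("south", "temp")])   # 42.0
print(cube[("*", "temp")])       # 63.0
```

Every record touches all 2^d cells it contributes to, so cost grows exponentially with the number of dimensions, which is one motivation for compressed structures such as the dwarf.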

    Tractable XML data exchange via relations

    We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and XML data exchange, which indicates that restrictions have to be imposed on XML schemas and mappings, and on XML shredding schemes, to make the use of relational systems possible. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the inlining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and to demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.
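
A minimal sketch of the general inlining idea, under assumptions: element types that occur at most once under a parent (here `title`) are stored as columns of the parent's relation, while repeated children (here `author`) get their own relation keyed by the parent, with a position column preserving document order. The schema, table names, and `shred` function are hypothetical; this is not the paper's shredding algorithm, which also translates schemas, mappings, and queries.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<library>
  <book isbn="111"><title>XML Basics</title><author>Ann</author><author>Bob</author></book>
  <book isbn="222"><title>Relational Theory</title><author>Cy</author></book>
</library>"""

def shred(doc, conn):
    """Inline single-occurrence children (title) into the parent's table;
    give repeated children (author) their own table keyed by the parent."""
    conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE author (isbn TEXT REFERENCES book(isbn), pos INTEGER, name TEXT)")
    root = ET.fromstring(doc)
    for book in root.iter("book"):
        isbn = book.get("isbn")
        conn.execute("INSERT INTO book VALUES (?, ?)", (isbn, book.findtext("title")))
        # pos records document order, which a plain relation would otherwise lose
        for pos, a in enumerate(book.findall("author")):
            conn.execute("INSERT INTO author VALUES (?, ?, ?)", (isbn, pos, a.text))

conn = sqlite3.connect(":memory:")
shred(DOC, conn)
# A query over the XML data answered relationally: authors of "XML Basics" in order.
rows = conn.execute(
    "SELECT a.name FROM book b JOIN author a ON a.isbn = b.isbn "
    "WHERE b.title = ? ORDER BY a.pos", ("XML Basics",)).fetchall()
print(rows)  # [('Ann',), ('Bob',)]
```

Inlining keeps the number of relations and joins small; the price is that the shredding depends on knowing, from the schema, which children can repeat.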

    PATAXÓ: A Framework to Allow Updates Through XML Views

    XML has become an important medium for data exchange, and is frequently used as an interface to (i.e., a view of) a relational database. Although much work has been done on querying relational databases through XML views, the problem of updating relational databases through XML views has not received much attention. In this work, we map XML views expressed using a subset of XQuery to a corresponding set of relational views. Thus, we transform the problem of updating relational databases through XML views into the classical problem of updating relational databases through relational views. We then show how updates on the XML view are mapped to updates on the corresponding relational views. Existing work on updating relational views can then be leveraged to determine whether or not the relational views are updatable with respect to the relational updates, and if so, to translate the updates to the underlying relational database.
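
The core idea of translating an XML-view update into a relational update can be sketched for the simplest case, assuming a hypothetical `dept` table published as an XML view with one `<dept>` element per row. Because each element is identified by the row's key, replacing a `<name>` in the view maps to a single `UPDATE` on the base table. This does not reproduce PATAXÓ's XQuery-to-relational-view mapping; deciding updatability for joins, nesting, and views that are not key-preserving is left to the relational view-update techniques the abstract refers to.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "R&D")])

def xml_view(conn):
    """Publish the dept table as an XML view: one <dept> element per row."""
    root = ET.Element("depts")
    for did, name in conn.execute("SELECT id, name FROM dept ORDER BY id"):
        d = ET.SubElement(root, "dept", id=str(did))
        ET.SubElement(d, "name").text = name
    return root

def update_view_name(conn, dept_id, new_name):
    """Translate 'replace the <name> of <dept id=...>' on the XML view into an
    update on the base table. Safe here only because the view is key-preserving:
    each <dept> element corresponds to exactly one row, identified by its key."""
    cur = conn.execute("UPDATE dept SET name = ? WHERE id = ?", (new_name, dept_id))
    if cur.rowcount != 1:
        raise ValueError("view element does not map to exactly one base row")

update_view_name(conn, 2, "Research")
print(ET.tostring(xml_view(conn), encoding="unicode"))
```

Regenerating the view after the base update shows the change, which is the consistency a correct translation must guarantee.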

    LegoDB: customizing relational storage for XML documents

    XML is becoming the predominant data exchange format in a variety of application domains (supply-chain, scientific data processing, telecommunication infrastructure, etc.). Not only is an increasing amount of XML data now being processed, but XML is also increasingly being used in business-critical applications. Efficient and reliable storage is an important requirement for these applications. By relying on relational engines for this purpose, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors.

    Streamlining the CERIF XML data exchange format: towards CERIF 2.0.

    The Common European Research Information Format (CERIF) is an established standard for Current Research Information Systems (CRISs) facing the increasing need for information sharing and exchange. euroCRIS released the first official CERIF XML exchange format in 2007; it followed the structure of the relational data model. Based on experience with the format and consultation with the CRIS community on newer interoperation and exchange concepts, the authors proposed an update to the CERIF XML exchange format. This updated CERIF XML aimed at compactness of expression and backward compatibility. With this article, we provide insight into the motivation for change, present the updated format, and finally outline possible next steps.

    Hybrid XML Data Model Architecture for Efficient Document Management

    XML has been known as a document standard for representing and exchanging data on the Internet, and is also used as a standard language for searching and reusing documents scattered across the Internet. The key issues for XML are how to model semi-structured data for effective and efficient management, and how to store the modeled data when implementing an XML content management system. Previous research on XML has limitations in (1) reproducing XML documents from the stored data, (2) retrieving XML sub-graphs from search, (3) supporting only top-down search rather than full search, and (4) the dependency of the data structure on the XML documents. The purpose of this paper is to present a hybrid XML data model architecture for storing and searching XML document data. By representing both the data and structure views of XML documents, this new XML data model overcomes the limitations of previous XML data models as well as those of existing database systems based on the relational and object-oriented data models.

    Bulkloading and Maintaining XML Documents

    The popularity of XML as an exchange and storage format brings about massive amounts of documents to be stored, maintained and analyzed, a challenge that has traditionally been tackled with Database Management Systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary. Database technology has traditionally supported these tasks but still falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. As a storage back-end, many applications rely on relational databases, which are designed for large data volumes. This paper studies bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion and (2) updates in the form of edit scripts, which rely heavily on pointer-chasing techniques that are often considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that one should make careful use of edit scripts and replace them with bulk operations if more than a very small portion of the database is updated. We implemented our ideas on top of the Monet Database System and benchmarked their performance.
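
To illustrate bulk loading XML into relations, the sketch below uses a pre/size/level node encoding, a common relational representation of XML trees (pre-order rank, number of descendants, depth), and loads all rows with a single `executemany` call rather than one statement per node. SQLite stands in for Monet here, and the table layout and `encode` function are assumptions, not the paper's schema. In this encoding the descendants of a node occupy a contiguous `pre` range, so inserting one node shifts the `pre` rank of every later node, which suggests one reason per-node edits can cost more than bulk operations.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = "<a><b><c/><d/></b><e/></a>"

def encode(elem, level=0, rows=None):
    """Assign each element (pre, size, level, tag) in document order,
    where size is the number of descendants. Returns the row list."""
    if rows is None:
        rows = []
    pre = len(rows)
    rows.append(None)  # placeholder until the subtree size is known
    for child in elem:
        encode(child, level + 1, rows)
    size = len(rows) - pre - 1
    rows[pre] = (pre, size, level, elem.tag)
    return rows

rows = encode(ET.fromstring(DOC))
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (pre INTEGER PRIMARY KEY, size INTEGER, level INTEGER, tag TEXT)")
# Bulk load: one executemany call inside one transaction, not one commit per node.
with conn:
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
# Descendants of <b> are exactly the nodes with pre in (pre_b, pre_b + size_b].
pre_b, size_b = conn.execute("SELECT pre, size FROM node WHERE tag = 'b'").fetchone()
desc = conn.execute("SELECT tag FROM node WHERE pre > ? AND pre <= ? ORDER BY pre",
                    (pre_b, pre_b + size_b)).fetchall()
print(rows)  # [(0, 4, 0, 'a'), (1, 2, 1, 'b'), (2, 0, 2, 'c'), (3, 0, 2, 'd'), (4, 0, 1, 'e')]
print(desc)  # [('c',), ('d',)]
```

The descendant query is a simple range predicate, the kind of algebraic operation relational engines optimize well, in contrast to following parent/child pointers one node at a time.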