39,670 research outputs found

    Efficient storage of XML data

    We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.

    Relational Approach to Logical Query Optimization of XPath

    To handle the ever-growing volumes of XML documents, effective and efficient data management solutions are needed. Managing XML data in a relational DBMS has great potential. Recently, effective relational storage schemes and index structures have been proposed, as well as special-purpose join operators to speed up querying of XML data using XPath/XQuery. In this paper, we address the topic of query plan construction and logical query optimization. The claim of this paper is that standard relational algebra extended with special-purpose join operators suffices for logical query optimization. We focus on the XPath accelerator storage scheme and the associated staircase join operators, but the approach can be generalized easily.
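
    As a rough illustration of the kind of relational encoding this abstract refers to, the sketch below assigns each node a preorder and postorder rank and evaluates a descendant step as a plain range selection over the resulting table. This is a minimal sketch under assumptions: the function names `encode` and `descendants` and the sample document are hypothetical, and the selection shown is an ordinary filter, not the staircase join operator the paper discusses.

```python
import xml.etree.ElementTree as ET

def encode(root):
    """Return one row per element with its preorder and postorder rank."""
    table = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        row = {"tag": node.tag, "pre": counters["pre"]}
        counters["pre"] += 1
        table.append(row)
        for child in node:
            visit(child)
        # The postorder rank is assigned once all children are finished.
        row["post"] = counters["post"]
        counters["post"] += 1

    visit(root)
    return table

def descendants(table, ctx):
    """Descendant axis as a range predicate: later in preorder, earlier in postorder."""
    return [r["tag"] for r in table
            if r["pre"] > ctx["pre"] and r["post"] < ctx["post"]]

# Hypothetical sample document, used only for illustration.
doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
table = encode(doc)
b_row = next(r for r in table if r["tag"] == "b")
a_row = next(r for r in table if r["tag"] == "a")
b_desc = descendants(table, b_row)
a_desc = descendants(table, a_row)
```

    Because the axis step becomes a predicate over two integer columns, it can be expressed in ordinary relational algebra, which is the setting the abstract's optimization claim is about.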

    Generating Nested XML Documents with DTD from Relational Views

    Relational databases are increasingly converted into XML for publishing and exchanging data on the web. Most current approaches and tools for generating XML documents from relational databases produce flat XML documents that contain redundant data, which leads to massive amounts of data on the web. Other approaches assume that the relational database used to generate nested XML documents is normalized. In addition, these approaches have difficulty specifying which elements are parents and which are children in the nested XML document. Moreover, most current approaches and tools do not generate nested XML documents automatically; they require the user to specify the constraints and the schema of the target document. This research proposes an approach that automatically generates nested XML documents from flat, unnormalized relational database views. The research aims to reduce data redundancy and storage size for the generated XML documents. The proposed approach consists of three steps: (1) converting the flat relational view into a nested relational view, (2) generating a DTD from the nested relational view, and (3) generating a nested XML document from the nested relational view. The proposed approach is evaluated and compared with other approaches such as NeT, CoT, and Cost-Based and tools such as Allora, Altova, and DbToXml on two measurements: data redundancy and storage size of the document. The first measurement covers several parameters: the number of data values, elements, attributes, and tags. Based on these comparisons, the proposed approach is more efficient at reducing data redundancy and storage size of XML documents. It can reduce data redundancy and storage size by approximately 50% and 55%, respectively.
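
    To make the idea of nesting a flat view concrete, the following sketch groups the rows of a flat view by one column and emits each distinct value once as a parent element, so the repeated value is no longer stored per row. It is only an illustration under assumptions: the function `nest_rows`, the column names, and the sample rows are hypothetical, the grouping column is chosen by hand, and DTD generation is not shown. It is not the algorithm proposed in the paper.

```python
import xml.etree.ElementTree as ET

def nest_rows(rows, parent_key, parent_tag, child_tag):
    """Group flat rows by parent_key; emit one parent element per distinct value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[parent_key], []).append(row)

    root = ET.Element("view")
    for key, members in groups.items():
        # The grouping value is written once, as an attribute of the parent.
        parent = ET.SubElement(root, parent_tag, {parent_key: str(key)})
        for row in members:
            child = ET.SubElement(parent, child_tag)
            for col, val in row.items():
                if col != parent_key:
                    ET.SubElement(child, col).text = str(val)
    return root

# Hypothetical flat, unnormalized view: "dept" repeats on every row.
rows = [
    {"dept": "Sales", "name": "Ann", "city": "Oslo"},
    {"dept": "Sales", "name": "Bob", "city": "Rome"},
    {"dept": "IT", "name": "Cy", "city": "Oslo"},
]
root = nest_rows(rows, "dept", "department", "employee")
xml_text = ET.tostring(root, encoding="unicode")
```

    In the flat form the value "Sales" would appear once per employee; in the nested output it appears once per department, which is the kind of redundancy reduction the abstract measures.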

    Hybrid XML Data Model Architecture for Efficient Document Management

    XML is known as a document standard for representing and exchanging data on the Internet, and is also used as a standard language for searching and reusing scattered documents on the Internet. The issues related to XML are how to model data for effective and efficient management of semi-structured data and how to actually store the modeled data when implementing an XML content management system. Previous research on XML has limitations in (1) reproducing XML documents from the stored data, (2) retrieving XML sub-graphs from a search, (3) supporting only top-down search rather than full search, and (4) the dependency of the data structure on the XML documents. The purpose of this paper is to present a hybrid XML data model architecture for the storage and search of XML document data. By representing both data and structure views of XML documents, this new XML data model technique overcomes the limitations of previous research on data models for XML documents as well as those of existing database systems such as relational and object-oriented data models.

    LegoDB: customizing relational storage for XML documents

    XML is becoming the predominant data exchange format in a variety of application domains (supply-chain, scientific data processing, telecommunication infrastructure, etc.). Not only is an increasing amount of XML data now being processed, but XML is also increasingly being used in business-critical applications. Efficient and reliable storage is an important requirement for these applications. By relying on relational engines for this purpose, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors.

    A model for querying semistructured data through the exploitation of regular sub-structures

    Much research has been undertaken to speed up the processing of semistructured data in general and XML in particular. Many approaches for storage, compression, indexing and querying exist, e.g. [1, 2]. We do not present yet another such algorithm but a unifying model in which these algorithms can be understood. The key idea behind this research is the assumption that most practical queries are based on a particular pattern of data that can be deduced from the query and can then be captured using a regular structure amenable to efficient processing techniques.

    Applying OGC sensor web enablement to ocean observing systems

    The complexity of marine installations for ocean observing systems has grown significantly in recent years. In a network consisting of tens, hundreds or thousands of marine instruments, manual configuration and integration becomes very challenging. Simplifying the integration process in existing or newly established observing systems would benefit system operators and is important for the broader application of different sensors. This article presents an approach for the automatic configuration and integration of sensors into an interoperable Sensor Web infrastructure. First, the sensor communication model, based on OGC's SensorML standard, is utilized. It serves as a generic driver mechanism since it enables the declarative and detailed description of a sensor's protocol. Finally, we present a data acquisition architecture based on the OGC PUCK protocol that enables storage and retrieval of the SensorML document from the sensor itself, and automatic integration of sensors into an interoperable Sensor Web infrastructure. Our approach adopts Efficient XML Interchange (EXI) as an alternative serialization form of XML or JSON. It solves the bandwidth problem of XML and JSON. Peer reviewed. Postprint (author's final draft).

    Mining XML documents with association rule algorithms

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2008. Includes bibliographical references (leaves: 59-63). Text in English; abstract in Turkish and English. x, 63 leaves.
    Following the increasing use of XML technology for data storage and data exchange between applications, mining XML documents has become an increasingly important research topic. In this study, we consider the problem of mining association rules between items in an XML document. The principal purpose of this study is to apply association rule algorithms directly to XML documents using XQuery, a functional expression language that can be used to query or process XML data. We used three different algorithms: Apriori, AprioriTid and High Efficient AprioriTid. We compare the mining times of these three Apriori-like algorithms on XML documents using different support levels, different datasets and different dataset sizes.
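
    For readers unfamiliar with Apriori, the sketch below shows the level-wise frequent-itemset search on transactions read from a small XML document. It is a minimal sketch under assumptions: it is written in Python rather than the XQuery used in the thesis, the element names `transaction` and `item` and the sample data are hypothetical, and it covers only basic Apriori, not AprioriTid or High Efficient AprioriTid.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical transaction document, used only for illustration.
XML = """<transactions>
  <transaction><item>bread</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item><item>milk</item></transaction>
  <transaction><item>butter</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>butter</item></transaction>
</transactions>"""

def load_transactions(xml_text):
    """Read each <transaction> as a frozenset of its <item> texts."""
    root = ET.fromstring(xml_text)
    return [frozenset(i.text for i in t.findall("item"))
            for t in root.findall("transaction")]

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singles = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(singles).items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items()
                    if n >= min_support}
        result.update(frequent)
        k += 1
    return result

result = apriori(load_transactions(XML), min_support=2)
```

    Association rules are then derived from the frequent itemsets by comparing the support of an itemset with the support of its subsets; that step is omitted here.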

    Bulkloading and Maintaining XML Documents

    The popularity of XML as an exchange and storage format brings about massive amounts of documents to be stored, maintained and analyzed, a challenge that has traditionally been tackled with Database Management Systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary. Database technology has traditionally offered support for these tasks, yet it falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. As a storage back-end, many applications rely on relational databases, which are designed for large data volumes. This paper studies bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion and (2) updates in the form of edit scripts, which make heavy use of pointer-chasing techniques that are often considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that one should make careful use of edit scripts and replace them with bulk operations if more than a very small portion of the database is updated. We implemented our ideas on top of the Monet Database System and benchmarked their performance.

    Efficient cube construction for smart city data

    To deliver powerful smart city environments, web-produced data streams must be analysed in close to real time so that city planners can employ up-to-date predictive models in both short- and long-term planning. Data cubes, fused from multiple sources, provide a popular input to predictive models. A key component in this infrastructure is an efficient mechanism for transforming web data (XML or JSON) into multi-dimensional cubes. In our research, we have developed a framework for efficient transformation of XML data from multiple smart city services into DWARF cubes using a NoSQL storage engine. Our evaluation shows a high level of performance when compared to other approaches and thus provides a platform for predictive models in a smart city environment.