
    A fractional number based labeling scheme for dynamic XML updating

    Recently, XML query processing based on labeling schemes has been proposed. With labeling schemes, the structural relationship between XML nodes can be determined quickly without accessing the XML document itself. However, labeling schemes have to re-label the pre-existing nodes or re-calculate the label values when a new node is inserted into the XML document during an update. In this paper, we propose a novel labeling scheme based on fractional numbers. The key property of fractional numbers is that infinitely many fractional numbers can be inserted between any two unequal fractional numbers. Therefore, the problem of re-labeling the pre-existing nodes during XML updates can be avoided if the XML nodes are labeled with fractional numbers.
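    The core idea can be illustrated with a short sketch (a hypothetical illustration, not the authors' implementation): if sibling labels are kept as exact fractions, a new label can always be taken strictly between two existing ones, so no existing label ever has to change.

```python
from fractions import Fraction

def label_between(left: Fraction, right: Fraction) -> Fraction:
    """Return a fractional label strictly between two unequal labels.

    The midpoint always exists, so a new node can be inserted between
    any two siblings without re-labeling pre-existing nodes.
    """
    assert left != right
    return (left + right) / 2

# Example: insert a new sibling between nodes labeled 1/2 and 2/3.
a, b = Fraction(1, 2), Fraction(2, 3)
new_label = label_between(a, b)      # Fraction(7, 12)
print(a < new_label < b)             # True; existing labels stay untouched
```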

    Creating a Representation of Items and Versions that Supports Efficient Evaluation of the Transaction-Time Axis in XML-Based Databases

    This project was developed to create a platform for implementing the features and query support provided by the transaction-time axis (tt-axis). The basis for this platform is a new numbering plan called Item Version Timestamp Level Numbering (IVTLN), which extends an existing numbering plan, Dewey Level Numbering (DLN), with version and timestamp information. The transaction-time axis thus provides a temporal perspective on XML nodes in addition to non-temporal axes such as the ancestor and descendant axes. This project provides an efficient, extensible, and comprehensible platform for implementing the new numbering plan and the services provided by the transaction-time axis.
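    As a rough illustration (the field names and layout below are assumptions for the sake of the example, not the IVTLN specification), a Dewey-style level number such as 1.3.2 can be paired with a version counter and a transaction-time interval, so that ancestor/descendant tests and "alive at time t" tests can both be answered from the label:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DeweyLabel:
    """Dewey-style level number, e.g. (1, 3, 2) for the label 1.3.2."""
    components: Tuple[int, ...]

    def is_ancestor_of(self, other: "DeweyLabel") -> bool:
        # An ancestor's label is a proper prefix of the descendant's label.
        n = len(self.components)
        return n < len(other.components) and other.components[:n] == self.components

@dataclass(frozen=True)
class VersionedLabel:
    """Hypothetical IVTLN-like label: a Dewey label plus version and tt interval."""
    dewey: DeweyLabel
    version: int
    tt_start: int                 # transaction-time start (e.g. a commit timestamp)
    tt_end: Optional[int] = None  # None means this node version is still current

    def alive_at(self, t: int) -> bool:
        return self.tt_start <= t and (self.tt_end is None or t < self.tt_end)

root = VersionedLabel(DeweyLabel((1,)), version=1, tt_start=100)
child = VersionedLabel(DeweyLabel((1, 3, 2)), version=2, tt_start=150)
print(root.dewey.is_ancestor_of(child.dewey), child.alive_at(200))  # True True
```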

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via an XML labelling scheme in order to permit answers to queries without the need to access the original XML files. As the transmission of XML data over the Internet has become increasingly common, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels, generated by combining the Base-9 scheme with Fibonacci encoding, in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein.
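    Fibonacci coding itself is a standard universal prefix code; a minimal sketch, independent of the Base-9 scheme's details, is shown below. Each positive integer is written as a sum of non-consecutive Fibonacci numbers (its Zeckendorf representation) and terminated with an extra 1 bit, which makes every codeword end in "11" and therefore self-delimiting.

```python
def fibonacci_encode(n: int) -> str:
    """Fibonacci codeword for a positive integer: Zeckendorf bits, least
    significant first, followed by a terminating '1'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci numbers 1, 2, 3, 5, 8, ... not exceeding n.
    fibs = [1]
    nxt = 2
    while nxt <= n:
        fibs.append(nxt)
        nxt = fibs[-1] + fibs[-2]
    # Greedy Zeckendorf decomposition, largest Fibonacci number first.
    bits = []
    remainder = n
    for f in reversed(fibs):
        if f <= remainder:
            bits.append("1")
            remainder -= f
        else:
            bits.append("0")
    # Emit least-significant bit first and append the terminating '1',
    # producing the self-delimiting '...11' pattern.
    return "".join(reversed(bits)) + "1"

for value in (1, 2, 3, 4, 11):
    print(value, fibonacci_encode(value))
# 1 -> 11, 2 -> 011, 3 -> 0011, 4 -> 1011, 11 -> 001011
```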

    Cloud Computing Strategies for Enhancing Smart Grid Performance in Developing Countries

    In developing countries, the awareness and development of Smart Grids are still at an introductory stage, and full realisation will require more time and effort. Moreover, the partially deployed Smart Grids are inefficient, unreliable, and environmentally unfriendly. As the global economy depends crucially on energy sustainability, the existing energy systems need to be revamped. Hence, this research work aims at cost-effective optimisation and communication strategies for enhancing Smart Grid performance on Cloud platforms.

    Compressed self-indexed XML representation with efficient XPath evaluation

    The popularity of the eXtensible Markup Language (XML) has been growing continuously since its introduction, and it is today acknowledged as the de facto standard for semi-structured data representation and data exchange on the World Wide Web. In this scenario, several query languages have been proposed to exploit the expressiveness of XML data, as well as systems to support them efficiently. At the same time, as research in compression became more and more relevant, work has also focused on new approaches that provide efficient solutions while using the minimum amount of space. Today, however, there is a lack of practical tools that combine efficient query support with minimum space requirements. In this thesis we address this problem and propose a new approach for storing, processing and querying XML documents in a time- and space-efficient way, focusing especially on XPath queries. We have developed a new compressed self-indexed representation of XML documents that obtains compression ratios of about 30%-40%, on top of which a query module providing efficient XPath query evaluation has also been developed. Together, both parts make up a complete system, called XXS, for the efficient evaluation of XPath queries over compressed self-indexed XML documents. Experimental results show the outstanding performance of our proposal, which competes successfully with some of the best-known solutions and largely outperforms them in terms of space.

    EFFICIENT LAYOUTS AND ALGORITHMS FOR MANAGING VERSIONED DATASETS

    Version control systems were primarily designed to keep track of, and provide control over, changes to source code, and have since provided an excellent way to combat the problem of sharing and editing files in a collaborative setting. The recent surge in data-driven decision making has resulted in a proliferation of datasets, elevating them to the level of source code, which in turn has led data analysts to resort to version control systems for storing and managing datasets and their versions over time. Unfortunately, existing version control systems handle large datasets poorly, primarily due to the underlying assumption that the stored files are relatively small text files with localized changes. Moreover, the algorithms used by these systems tend to be fairly simple, leading to suboptimal performance when applied to large datasets. To address these shortcomings, a key requirement is a Dataset Version Control System (DVCS) that serves as a common platform enabling data analysts to efficiently store and query dataset versions, track changes to datasets, and share datasets between users with ease. Towards this goal, we address the fundamental problem of designing storage layouts for a wide range of datasets to serve as the primary building block for an efficient and scalable DVCS. The key problem in this setting is to compactly store a large number of dataset versions and efficiently retrieve any specific version (or a collection of partial versions). We initiate our study by considering storage-retrieval trade-offs for versions of unstructured datasets, such as text files and blobs, where the notion of a partial version is not well-defined. Next, we consider array datasets, i.e., collections of temporal snapshots (or versions) of multi-dimensional arrays, where the data is predominantly represented in single-precision or double-precision format. The primary challenge here is to develop efficient compression techniques for the hard-to-compress floating-point data, which has a high degree of entropy. We observe that the underlying techniques developed for unstructured or array datasets are not well suited to more structured dataset versions, where a version is defined by a collection of records, each of which is uniquely addressable. We carefully explore the design space for building such a system and the various storage-retrieval trade-offs, and discuss how different storage layouts influence those trade-offs. Next, we formulate several problems trading off version storage and retrieval cost in various ways and design several offline storage layout algorithms that effectively minimize storage costs while keeping retrieval costs low. In addition to version retrieval queries, our system also provides support for record provenance queries. Through extensive experiments on large datasets, we demonstrate that our proposed designs can operate at the scale required in most practical scenarios.
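    One way to make the storage-retrieval trade-off concrete is the classic delta-chain layout (a simplified sketch of a common approach, not the thesis's actual algorithms): some versions are stored fully materialized while others are stored as deltas against a parent, so retrieving a version means walking back to the nearest materialized ancestor and replaying the deltas. Longer chains save storage but increase retrieval cost.

```python
from typing import Dict, List, Optional, Tuple

# Toy model: a version is a dict of record_id -> value.
Version = Dict[str, str]
Delta = Dict[str, Optional[str]]   # new/changed values; None means "record deleted"


class VersionStore:
    """Store versions either materialized or as a delta against a parent version."""

    def __init__(self) -> None:
        self.materialized: Dict[str, Version] = {}
        self.deltas: Dict[str, Tuple[str, Delta]] = {}   # vid -> (parent, delta)

    def add_materialized(self, vid: str, data: Version) -> None:
        self.materialized[vid] = dict(data)

    def add_delta(self, vid: str, parent: str, delta: Delta) -> None:
        self.deltas[vid] = (parent, dict(delta))

    def retrieve(self, vid: str) -> Version:
        """Walk up the delta chain to a materialized ancestor, then replay deltas."""
        chain: List[Delta] = []
        while vid in self.deltas:
            parent, delta = self.deltas[vid]
            chain.append(delta)
            vid = parent
        version = dict(self.materialized[vid])
        for delta in reversed(chain):          # apply the oldest delta first
            for key, value in delta.items():
                if value is None:
                    version.pop(key, None)
                else:
                    version[key] = value
        return version


store = VersionStore()
store.add_materialized("v1", {"r1": "a", "r2": "b"})
store.add_delta("v2", "v1", {"r2": "c"})             # small delta: cheap storage
store.add_delta("v3", "v2", {"r3": "d", "r1": None})
print(store.retrieve("v3"))   # {'r2': 'c', 'r3': 'd'} -- longer chain, costlier retrieval
```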

    Probabilistic uncertainty in an interoperable framework

    This thesis provides an interoperable language for quantifying uncertainty using probability theory. A general introduction to interoperability and uncertainty is given, with particular emphasis on the geospatial domain. Existing interoperable standards used within the geospatial sciences are reviewed, including Geography Markup Language (GML), Observations and Measurements (O&M) and the Web Processing Service (WPS) specifications. The importance of uncertainty in geospatial data is identified and probability theory is examined as a mechanism for quantifying these uncertainties. The Uncertainty Markup Language (UncertML) is presented as a solution to the lack of an interoperable standard for quantifying uncertainty. UncertML is capable of describing uncertainty using statistics, probability distributions or a series of realisations. The capabilities of UncertML are demonstrated through a series of XML examples. This thesis then provides a series of example use cases where UncertML is integrated with existing standards in a variety of applications. The Sensor Observation Service - a service for querying and retrieving sensor-observed data - is extended to provide a standardised method for quantifying the inherent uncertainties in sensor observations. The INTAMAP project demonstrates how UncertML can be used to aid uncertainty propagation using a WPS by allowing UncertML as input and output data. The flexibility of UncertML is demonstrated with an extension to the GML geometry schemas to allow positional uncertainty to be quantified. Further applications and developments of UncertML are discussed
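    As a purely illustrative sketch (the namespace, element and attribute names below are assumptions chosen for the example, not the actual UncertML or O&M schemas), an uncertainty description of the kind the abstract mentions, for example a Gaussian distribution attached to a sensor observation, could be serialized as a small XML fragment:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, used only to illustrate the idea
# of encoding a probability distribution in XML; not the real UncertML schema.
NS = "http://example.org/uncertainty"
ET.register_namespace("unc", NS)

def gaussian_element(mean: float, variance: float) -> ET.Element:
    """Build an XML element describing a Gaussian (normal) distribution."""
    dist = ET.Element(f"{{{NS}}}GaussianDistribution")
    ET.SubElement(dist, f"{{{NS}}}mean").text = str(mean)
    ET.SubElement(dist, f"{{{NS}}}variance").text = str(variance)
    return dist

# Attach the uncertainty description to a (hypothetical) sensor observation.
obs = ET.Element(f"{{{NS}}}Observation", attrib={"property": "air_temperature"})
obs.append(gaussian_element(mean=21.3, variance=0.25))
print(ET.tostring(obs, encoding="unicode"))
```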