550 research outputs found

    Online Integration of Semistructured Data

    Get PDF
    Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources, and provides a global view of data to the users. Systems need to process user\u27s applications in the shortest possible time. The virtualization approach to data integration systems ensures that the answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach to data integration systems can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing is started. Delays in transmission of data over a network contribute to a longer processing time. On the other hand, in an online processing mode data integration is performed piece-by-piece as soon as a unit of data is available at the central site. An online processing mode presents the partial results to the users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and the appropriate data integration algorithms used. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission to produce instant partial results. A decomposition strategy included in the algorithm balances of the computations between the central and remote sites to force maximum resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data, and presents a new formalization of the XML data model together with a set of algebraic operations. The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with operations of relational algebra, such that any existing syntax based query optimization technique developed for the relational model of data can be directly applied. The thesis shows how to optimize online processing by generating one online integration plan for several data increments. Further, the thesis shows how each independent increment expression can be processed in a parallel mode on a multi core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan such that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows for data integration in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. These formal backgrounds show that the proposed algorithms are implementable

    Effizienter Austausch and Verarbeitung von semistrukturierten Daten in eingebetteten Systemen

    Get PDF
    The Internet is a global system of interconnected computers and computer networks where semi-structured data has been successfully applied for exchanging information. In nowadays Internet the huge range of actors, the large diversity of the associated device classes and domains, and the enormous amount of resource-restricted controllers in this system created new requirements and coined also a new term. Internet of Things (IoT), in this regard, refers to identifiable objects (things) and their virtual representations in an Internet-like structure. The fundamental question the thesis tries to answer is whether and how the same semi-structured data can be also applied to the IoT and the embedded domain in spite of resource-limited controllers. In order to discuss this question properties and requirements of embedded networks with regard to the IoT domain have been collected and evaluated. Thereafter the omnipresent semi-structured data exchange format in the Web, the Extensible Markup Language (XML), has been validated. The result was a list of missing requirements such as a compact representation, a representation that can be generated and consumed fast and also allows a small footprint implementation. To address the compiled requirements a binary representation of XML which nowadays is known as W3Cs Efficient XML Interchange (EXI) format has been accomplished which simultaneously optimizes performance and the utilization of computational resources and is designed to be compatible with XML. Moreover, in this work the format has been practically validated and tested. Addressing the needs of the embedded domain one result of this analyzes were optimizations to constrain runtime memory usage and to predict memory growth at runtime. A concept introduced in this thesis is LazyDOM which reduces memory requirements when processing and querying data. By means of a newly proposed code generation technique processing of EXI on ultra-constrained device classes has been enabled and resulting format modifications have been adopted by the W3C standardization. The research work described in this thesis on efficiently exchanging and processing semi-structured data on constrained embedded devices has not only triggered modifications in the W3C EXI format but even is already adopted in domain specific application standards and implementations. The above mentioned optimizations such as predictably limit the memory growth at runtime have been contributed, discussed and evaluated by the W3C experts and become a core part of the EXI specification. Even more significantly from the IoT perspective these optimizations provide the basis for the adoption of this technology in ISO and IEC standardization which is the first time for automotive and power industry to use IoT in the control plane. The implementation of EXI to conduct the evaluation as part of this thesis has become the de-facto open source reference implementation of EXI and became the basis of a number of other reference implementations such as the OpenV2G project that provides the reference implementation of the communication interface in ISO/IEC 15118. In summary the conducted research work has evaluated the options to adapt semi-structured data for the constrained embedded domain, proposed modifications and evaluated those under realistic conditions. This made it relevant for the technology as well as for application standardization despite the short period of this work. As such the research can now be taken as a basis for further challenges in the IoT field namely adopting concepts of the Semantic Web and adapting those to stimulate the quickly expanding eco-system of embedded devices

    Generating collaborative systems for digital libraries: A model-driven approach

    Get PDF
    This is an open access article shared under a Creative Commons Attribution 3.0 Licence (http://creativecommons.org/licenses/by/3.0/). Copyright @ 2010 The Authors.The design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domainspecific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces providing access to them for the final user. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated by using the cognitive dimensions framework

    XML stream transformer generation through program composition and dependency analysis

    Get PDF
    AbstractXML stream transformation, which sequentially processes the input XML data on the fly, makes it possible to process large sized data within a limited amount of memory. Though being efficient in memory-use, stream transformation requires stateful programming, which is error-prone and hard to manage.This paper proposes a scheme for generating XML stream transformers. Given an attribute grammar definition of transformation over an XML tree structure, we systematically derive a stream transformer in two steps. First, an attribute grammar definition of the XML stream transformation is inferred by applying a program composition method. Second, a finite state transition machine is constructed through a dependency analysis. Due to the closure property of the program composition method, our scheme also allows modular construction of XML stream transformers.We have implemented a prototype XML stream transformer generator, called altSAX. The experimental results show that the generated transformers are efficient in memory consumption as well as in execution time

    XML stream transformer generation through program composition and dependency analysis

    Get PDF
    AbstractXML stream transformation, which sequentially processes the input XML data on the fly, makes it possible to process large sized data within a limited amount of memory. Though being efficient in memory-use, stream transformation requires stateful programming, which is error-prone and hard to manage.This paper proposes a scheme for generating XML stream transformers. Given an attribute grammar definition of transformation over an XML tree structure, we systematically derive a stream transformer in two steps. First, an attribute grammar definition of the XML stream transformation is inferred by applying a program composition method. Second, a finite state transition machine is constructed through a dependency analysis. Due to the closure property of the program composition method, our scheme also allows modular construction of XML stream transformers.We have implemented a prototype XML stream transformer generator, called altSAX. The experimental results show that the generated transformers are efficient in memory consumption as well as in execution time

    A context -and template- based data compression approach to improve resource-constrained IoT systems interoperability.

    Get PDF
    170 p.El objetivo del Internet de las Cosas (the Internet of Things, IoT) es el de interconectar todo tipo de cosas, desde dispositivos simples, como una bombilla o un termostato, a elementos más complejos y abstractoscomo una máquina o una casa. Estos dispositivos o elementos varían enormemente entre sí, especialmente en las capacidades que poseen y el tipo de tecnologías que utilizan. Esta heterogeneidad produce una gran complejidad en los procesos integración en lo que a la interoperabilidad se refiere.Un enfoque común para abordar la interoperabilidad a nivel de representación de datos en sistemas IoT es el de estructurar los datos siguiendo un modelo de datos estándar, así como formatos de datos basados en texto (e.g., XML). Sin embargo, el tipo de dispositivos que se utiliza normalmente en sistemas IoT tiene capacidades limitadas, así como recursos de procesamiento y de comunicación escasos. Debido a estas limitaciones no es posible integrar formatos de datos basados en texto de manera sencilla y e1ciente en dispositivos y redes con recursos restringidos. En esta Tesis, presentamos una novedosa solución de compresión de datos para formatos de datos basados en texto, que está especialmente diseñada teniendo en cuenta las limitaciones de dispositivos y redes con recursos restringidos. Denominamos a esta solución Context- and Template-based Compression (CTC). CTC mejora la interoperabilidad a nivel de los datos de los sistemas IoT a la vez que requiere muy pocos recursos en cuanto a ancho de banda de las comunicaciones, tamaño de memoria y potencia de procesamiento

    A context -and template- based data compression approach to improve resource-constrained IoT systems interoperability.

    Get PDF
    170 p.El objetivo del Internet de las Cosas (the Internet of Things, IoT) es el de interconectar todo tipo de cosas, desde dispositivos simples, como una bombilla o un termostato, a elementos más complejos y abstractoscomo una máquina o una casa. Estos dispositivos o elementos varían enormemente entre sí, especialmente en las capacidades que poseen y el tipo de tecnologías que utilizan. Esta heterogeneidad produce una gran complejidad en los procesos integración en lo que a la interoperabilidad se refiere.Un enfoque común para abordar la interoperabilidad a nivel de representación de datos en sistemas IoT es el de estructurar los datos siguiendo un modelo de datos estándar, así como formatos de datos basados en texto (e.g., XML). Sin embargo, el tipo de dispositivos que se utiliza normalmente en sistemas IoT tiene capacidades limitadas, así como recursos de procesamiento y de comunicación escasos. Debido a estas limitaciones no es posible integrar formatos de datos basados en texto de manera sencilla y e1ciente en dispositivos y redes con recursos restringidos. En esta Tesis, presentamos una novedosa solución de compresión de datos para formatos de datos basados en texto, que está especialmente diseñada teniendo en cuenta las limitaciones de dispositivos y redes con recursos restringidos. Denominamos a esta solución Context- and Template-based Compression (CTC). CTC mejora la interoperabilidad a nivel de los datos de los sistemas IoT a la vez que requiere muy pocos recursos en cuanto a ancho de banda de las comunicaciones, tamaño de memoria y potencia de procesamiento
    corecore