524 research outputs found

    Online Integration of Semistructured Data

    Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources, and provides a global view of data to the users. Systems need to process user's applications in the shortest possible time. The virtualization approach to data integration systems ensures that the answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach to data integration systems can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing is started. Delays in transmission of data over a network contribute to a longer processing time. On the other hand, in an online processing mode data integration is performed piece-by-piece as soon as a unit of data is available at the central site. An online processing mode presents the partial results to the users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and the appropriate data integration algorithms used. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission to produce instant partial results. A decomposition strategy included in the algorithm balances the computations between the central and remote sites to force maximum resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data, and presents a new formalization of the XML data model together with a set of algebraic operations.
    The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with operations of relational algebra, such that any existing syntax-based query optimization technique developed for the relational model of data can be directly applied. The thesis shows how to optimize online processing by generating one online integration plan for several data increments. Further, the thesis shows how each independent increment expression can be processed in a parallel mode on a multi-core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan such that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows for data integration in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. These formal backgrounds show that the proposed algorithms are implementable.

    A Method of XML Document Fragmentation for Reducing Time of XML Fragment Stream Query Processing

    As XML has been established as the standard for data exchange not just on the Web but among heterogeneous devices, systems, and applications, effective processing of XML queries is one of the core components of ubiquitous computing. Most of the mobile/hand-held devices deployed in ubiquitous computing environments are still limited in memory and processing power. Effective query processing is required when the source XML document is of large volume. The framework of fragmenting an XML document and streaming the XML fragments for query processing at the mobile devices has received much attention. However, the main focus was on memory efficiency to cope with the memory constraint in the mobile devices. Query processing time might be compromised in those techniques. Since the processing power is also limited in the mobile devices, the time optimization deserves attention. We have found that the query processing time is significantly affected by how the source XML document is fragmented. In this paper, we propose a method of XML document fragmentation whereby query processing becomes time-efficient while the size constraint for each resulting fragment is satisfied. Through implementation and a set of detailed experiments, we show that our proposed method considerably outperforms other methods.

    Memory-Efficient Query Processing over XML Fragment Stream with Fragment Labeling

    The portable/hand-held devices deployed in mobile computing environments are mostly limited in memory. To make it possible for them to locally process queries over a large volume of XML data, the data needs to be streamed in fragments of manageable size and the queries need to be processed over the stream with as little memory as possible. In this paper, we report a considerable improvement in memory efficiency over the state-of-the-art techniques of query processing over XML fragment streams. We use XML fragment labeling (XFL) as a method of representing XML fragmentation, and show that XFL is much more effective than the popular hole-filler (HF) model employed in the state-of-the-art in reducing the amount of memory required for query processing. The state-of-the-art with the HF model requires more memory as the stream size increases. With XFL, we overcome this fundamental limitation, proposing techniques to make query processing scalable in the sense that the memory requirement is not affected by the size of the stream as long as the stream is bounded. The improvement is verified through implementation and a detailed set of experiments.

    Pathfinder: relational XQuery over multi-gigabyte XML inputs in interactive time

    Using a relational DBMS as a back-end engine for an XQuery processing system leverages relational query optimization and scalable query processing strategies provided by mature DBMS engines in the XML domain. Though a lot of theoretical work has been done in this area and various solutions have been proposed, no complete systems have been made available so far to give the practical evidence that this is a viable approach. In this paper, we describe the purely relational XQuery processor Pathfinder that has been built on top of the extensible RDBMS MonetDB. Performance results indicate that the system is capable of evaluating XQuery queries efficiently, even if the input XML documents become huge. We additionally present further contributions such as loop-lifted staircase join, techniques to derive order properties and to reduce sorting effort in the generated relational algebra plans, as well as methods for optimizing XQuery joins, which, taken together, enabled us to reach our performance and scalability goal.

    Efficient Evaluation of Multiple Queries on Streamed XML Fragments


    When Things Matter: A Data-Centric View of the Internet of Things

    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

    A Functional Model for Data Analysis and Result Visualization

    In several Web-based applications (e-commerce, e-learning, digital libraries, etc.) one needs to display a dense array of information in a small amount of space (such as a screen) in a manner that communicates clearly and immediately. The information displayed usually consists of aggregates of results obtained through analysis of large amounts of data. We present a functional model that supports the data analysis and aggregation process, and a prototype that supports casual users in doing the following: (a) construct an analytic query visually, in an interactive manner, (b) visualize the aggregate result in a user-selected mode (histogram, pie, etc.), (c) explore the query result by providing equivalent representations at different aggregation levels or for different parameter values selected by the user.

    Towards Next Generation Business Process Model Repositories – A Technical Perspective on Loading and Processing of Process Models

    Business process management repositories manage large collections of process models ranging in the thousands. Additionally, they provide management functions such as mining, querying, merging and variants management for process models. However, most current business process management repositories are built on top of relational database management systems (RDBMS), although this leads to performance issues. These issues result from the relational algebra, the mismatch between relational tables and object-oriented programming (impedance mismatch), as well as new technological developments in the last 30 years, e.g. more and cheaper disk and memory space, clusters and clouds. The goal of this paper is to present current paradigms to overcome the performance problems inherent in RDBMS. Therefore, we have to fuse research about data modeling along database technologies as well as algorithm design and parallelization for the technology paradigms occurring nowadays. Based on these research streams we have shown how the performance of business process management repositories could be improved in terms of loading performance of processes (e.g. from a disk) and the computation of management techniques, resulting in even faster application of such a technique. By way of example, applications of the compiled paradigms are presented to show their applicability.