11,093 research outputs found

    Incremental Processing and Optimization of Update Streams

    Get PDF
    Over the recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring, which monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. Generally, in order to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in the applications. Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved by various incremental view maintenance techniques if we cast the problems as incremental view maintenance problems over data streams. We focus on two incremental processing of update streams challenges currently not addressed in existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we also develop a working prototype system Aspen, which serves as an end-to-end stream processing system that has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of our prototype system Aspen, over a variety of benchmark workloads such as TPC-H and LinearRoad Benchmarks

    Multi-dimensional data indexing and range query processing via Voronoi diagram for internet of things

    Get PDF
    In a typical Internet of Things (IoT) deployment such as smart cities and Industry 4.0, the amount of sensory data collected from physical world is significant and wide-ranging. Processing large amount of real-time data from the diverse IoT devices is challenging. For example, in IoT environment, wireless sensor networks (WSN) are typically used for the monitoring and collecting of data in some geographic area. Spatial range queries with location constraints to facilitate data indexing are traditionally employed in such applications, which allows the querying and managing the data based on SQL structure. One particular challenge is to minimize communication cost and storage requirements in multi-dimensional data indexing approaches. In this paper, we present an energy- and time-efficient multidimensional data indexing scheme, which is designed to answer range query. Specifically, we propose data indexing methods which utilize hierarchical indexing structures, using binary space partitioning (BSP), such as kd-tree, quad-tree, k-means clustering, and Voronoi-based methods to provide more efficient routing with less latency. Simulation results demonstrate that the Voronoi Diagram-based algorithm minimizes the average energy consumption and query response time

    Efficient Processing of k Nearest Neighbor Joins using MapReduce

    Full text link
    k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centralized machine efficiently. In this paper, we investigate how to perform kNN join using MapReduce which is a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable.Comment: VLDB201
    corecore