22 research outputs found

    GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

    We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or must handle very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins hold far fewer tuples than the sliding windows contain. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. Our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach compared with other recently proposed techniques.

    SCHEDULING OF UPDATES IN DATA WAREHOUSES

    A stream warehouse enables queries that seamlessly range from real-time alerting and diagnostics to long-term data mining. Continuously loading data from many different and uncontrolled sources into a real-time stream warehouse introduces a new consistency problem: users want results in as timely a fashion as possible, but "stable" results often require lengthy synchronization delays. In this paper we develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time.

    Semantics and evaluation techniques for window aggregates in data streams


    MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse

    A new algorithm called MESHJOIN* is proposed to support streaming updates in a real-time data warehouse environment. It has the following distinct features: (1) relation R is organized into blocks and hashed so as to avoid, as far as possible, reading tuples that are useless for the current join operation, which greatly reduces the number of tuples involved in a join and thus improves the efficiency of the join operation; (2) multi-threaded concurrent join execution is adopted, and the scheduling of join operations and reads of relation R is optimized according to engineering principles so as to maximize the efficiency of the join algorithm; (3) real-time tuples and near-real-time tuples are scheduled according to the relationship between the current system service rate and the tuple arrival rate of the data stream, so that the processing requirements of real-time tuples are satisfied. Experimental results show that MESHJOIN* can achieve much better performance than MESHJOIN. National Natural Science Foundation of China, Grant No. 50604012.

    MESHJOIN*: An Algorithm Supporting Streaming Updates in a Real-time Data Warehouse

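    ... methods and schema-graph-based methods, and describes in detail the principles of each method along with its advantages and disadvantages; finally, future research directions are discussed. The National Natural Science Foundation of China under Grant No. 50604012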

    On-the-fly sharing for streamed aggregation


    Scheduling for Shared Window Joins Over Data Streams

    Continuous Query (CQ) systems typically exploit commonality among query expressions to achieve improved efficiency through shared processing. Recently proposed CQ systems have introduced window specifications in order to support unbounded data streams. There has been, however, little investigation of sharing for windowed query operators.