4,610 research outputs found

    GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

    We investigate adaptive buffer management techniques for the approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or must handle very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins can hold far fewer tuples than the sliding windows contain. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. Our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report the results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
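    The abstract does not spell out the replacement rule, but the GreedyDual family of policies it builds on is well known from caching. A minimal sketch of a GreedyDual-style join buffer, assuming a simple match-based benefit function and hypothetical names (not the paper's exact formulation), might look like this:

```python
import heapq

class GreedyDualJoinBuffer:
    """Sketch of a GreedyDual-style replacement policy for a bounded join
    buffer. The benefit function and all names are illustrative assumptions,
    not the paper's exact formulation of GDJ."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0.0            # global inflation value
        self.entries = {}       # key -> (priority, tuple, match_count)
        self.heap = []          # (priority, key) min-heap; may hold stale entries

    def _evict_one(self):
        # Pop stale heap entries until a live minimum-priority victim is found.
        while self.heap:
            prio, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is not None and entry[0] == prio:
                self.L = prio   # inflate: future insertions start from here
                del self.entries[key]
                return

    def insert(self, key, tup, benefit=1.0):
        if len(self.entries) >= self.capacity:
            self._evict_one()
        prio = self.L + benefit
        self.entries[key] = (prio, tup, 0)
        heapq.heappush(self.heap, (prio, key))

    def on_match(self, key, benefit=1.0):
        # A probe tuple joined with this buffered tuple: boost its priority,
        # so recently useful tuples (temporal locality) stay buffered longer.
        entry = self.entries.get(key)
        if entry is None:
            return
        prio = self.L + benefit
        self.entries[key] = (prio, entry[1], entry[2] + 1)
        heapq.heappush(self.heap, (prio, key))
```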

    Data semantic enrichment for complex event processing over IoT Data Streams

    This thesis generalizes techniques for processing IoT data streams, semantically enriching data with contextual information, and performing complex event processing in IoT applications. A case study on ECG anomaly detection and signal classification was conducted to validate the knowledge foundation.
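    As a rough illustration of the two ideas the abstract combines, the sketch below pairs a hypothetical context lookup (semantic enrichment) with a toy complex-event rule over ECG readings; the device names, threshold, and rule are illustrative assumptions, not the thesis's actual pipeline:

```python
from collections import deque

# Hypothetical context store: device id -> contextual metadata (assumption).
DEVICE_CONTEXT = {"ecg-42": {"patient": "p-007", "ward": "cardiology"}}

def enrich(readings):
    """Attach contextual metadata to each raw reading (semantic enrichment)."""
    for r in readings:
        yield {**r, **DEVICE_CONTEXT.get(r["device"], {})}

def detect_anomaly_events(readings, threshold=2.5, window=3):
    """Toy CEP rule: emit an event when `window` consecutive readings exceed
    `threshold` (a stand-in for a real anomaly/classification model)."""
    recent = deque(maxlen=window)
    for r in readings:
        recent.append(r)
        if len(recent) == window and all(x["value"] > threshold for x in recent):
            yield {"event": "ecg_anomaly",
                   "patient": recent[-1].get("patient"),
                   "last_value": recent[-1]["value"]}

raw = [{"device": "ecg-42", "value": v} for v in (1.0, 2.8, 3.1, 3.4, 1.2)]
for event in detect_anomaly_events(enrich(raw)):
    print(event)
```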

    Big Data Management Challenges, Approaches, Tools and their limitations

    Big Data is the buzzword everyone talks about. Independently of the application domain, there is today a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. Focusing on data management issues and past experience in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems, and complex event processing systems). Finally, it provides a classification of the different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.

    Metrics and Algorithms for Processing Multiple Continuous Queries

    Data stream processing is an emerging research area that is driven by the growing need for monitoring applications. A monitoring application continuously processes streams of data for interesting, significant, or anomalous events. Such applications include tracking the stock market, real-time detection of disease outbreaks, and environmental monitoring via sensor networks. Efficient deployment of these monitoring applications requires advanced data processing techniques that can support the continuous processing of unbounded, rapid data streams. Such techniques go beyond the capabilities of traditional store-then-query Database Management Systems. This need has led to a new data processing paradigm and created a new generation of data processing systems, supporting continuous queries (CQ) on data streams.

    Primary emphasis in the development of first-generation Data Stream Management Systems (DSMSs) was given to basic functionality. However, in order to support the large-scale heterogeneous applications envisioned for subsequent generations of DSMSs, greater attention will have to be paid to performance issues. Towards this, this thesis introduces new algorithms and metrics to the current design of DSMSs.

    This thesis identifies a collection of quality of service (QoS) and quality of data (QoD) metrics that are suitable for a wide range of monitoring applications. The establishment of well-defined metrics aids the development of novel algorithms that are optimal with respect to a particular metric. Our proposed algorithms exploit the valuable opportunities for optimization that arise in the presence of multiple applications. Additionally, they aim to balance the trade-off between the DSMS's overall performance and the performance perceived by individual applications. Furthermore, we provide efficient implementations of the proposed algorithms and extend them to exploit sharing in optimized multi-query plans and multi-stream CQs. Finally, we experimentally show that our algorithms consistently outperform the current state of the art.
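    One way to make the QoS idea concrete is a scheduler that always serves the continuous query with the highest weighted pending-tuple age. The sketch below is an assumed, simplified illustration of such a metric-driven policy; the class names, the urgency metric, and the batching are hypothetical, not the thesis's actual algorithms:

```python
import time

class Query:
    """Stand-in for a continuous query: pending tuples plus a QoS weight."""
    def __init__(self, name, weight=1.0):
        self.name, self.weight = name, weight
        self.pending = []  # list of (arrival_time, tuple)

    def urgency(self, now):
        # Weighted age of the oldest pending tuple: one possible QoS metric.
        if not self.pending:
            return 0.0
        return self.weight * (now - self.pending[0][0])

def schedule_round(queries, batch=10):
    """Pick the most urgent query and process one batch of its tuples."""
    now = time.monotonic()
    target = max(queries, key=lambda q: q.urgency(now))
    processed, target.pending = target.pending[:batch], target.pending[batch:]
    return target.name, processed

# Usage: the higher-weighted, older backlog is served first.
q1, q2 = Query("alerts", weight=2.0), Query("stats")
q1.pending = [(time.monotonic() - 5.0, {"id": 1})]
q2.pending = [(time.monotonic() - 1.0, {"id": 2})]
print(schedule_round([q1, q2]))
```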

    In support of workload-aware streaming state management

    Modern distributed stream processors predominantly rely on LSM-based key-value stores to manage the state of long-running computations. We question the suitability of such general-purpose stores for streaming workloads and argue that they incur unnecessary overheads in exchange for state management capabilities. Since streaming operators are instantiated once and are long-running, state types, sizes, and access patterns can either be inferred at compile time or learned during execution. This paper surfaces the limitations of established practices for streaming state management and advocates for configurable streaming backends, tailored to the state requirements of each operator. Using workload-aware state management, we achieve an order of magnitude improvement in p99 latency and 2x higher throughput.
    https://www.usenix.org/system/files/hotstorage20_paper_kalavri.pdf
    Published version
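    A minimal sketch of what "configurable, workload-aware backends" could mean in practice, assuming a compile-time hint about each operator's access pattern (point lookups vs. range scans); the backend names and interface below are illustrative, not the paper's implementation:

```python
import bisect

class HashState:
    """Point-lookup backend: a plain hash map, good for keyed get/put."""
    def __init__(self):
        self._d = {}
    def put(self, k, v): self._d[k] = v
    def get(self, k): return self._d.get(k)

class SortedState:
    """Range-scan backend: keys kept sorted so windowed scans stay cheap."""
    def __init__(self):
        self._keys, self._vals = [], {}
    def put(self, k, v):
        if k not in self._vals:
            bisect.insort(self._keys, k)
        self._vals[k] = v
    def get(self, k): return self._vals.get(k)
    def range(self, lo, hi):
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[i:j]]

def make_state(access_pattern):
    """Pick a backend from a (hypothetical) per-operator access-pattern hint,
    instead of defaulting to a general-purpose LSM store."""
    return SortedState() if access_pattern == "range" else HashState()
```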