
    Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins

    Sliding window join is one of the most important operators for stream applications. To produce high-quality join results, a stream processing system must deal with the ubiquitous disorder within input streams, which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet the differing requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins, which minimizes the sizes of the input-sorting buffers, and thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between the sizes of the input buffers and the produced result quality. Our approach is generic: it supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality. (Comment: 12 pages, 11 figures, IEEE ICDE 201)
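    A minimal sketch of the buffer-based idea, assuming a K-slack-style input-sorting stage (the class and the parameter k below are illustrative; the paper's analytical model for choosing buffer sizes is not reproduced here):

        import heapq

        class SortingBuffer:
            """K-slack-style input-sorting buffer (illustrative sketch).

            Out-of-order tuples are held for up to k timestamp units and
            released in timestamp order. A larger k improves result quality
            (fewer late drops) at the cost of higher result latency.
            """
            def __init__(self, k):
                self.k = k          # buffer span in timestamp units
                self.heap = []      # min-heap ordered by tuple timestamp
                self.max_ts = 0     # largest timestamp seen so far

            def insert(self, ts, value):
                self.max_ts = max(self.max_ts, ts)
                if ts < self.max_ts - self.k:
                    return []       # too late: dropped, i.e., quality loss
                heapq.heappush(self.heap, (ts, value))
                out = []
                # release tuples that no later arrival can still precede
                while self.heap and self.heap[0][0] <= self.max_ts - self.k:
                    out.append(heapq.heappop(self.heap))
                return out

        buf = SortingBuffer(k=2)
        for ts, v in [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (4, "e")]:
            print(ts, "->", buf.insert(ts, v))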

    eChIDNA: Continuous Data Validation in Advanced Metering Infrastructures

    New laws and regulations increase the demands for a more data-intense metering infrastructure towards more adaptive electricity networks (aka smart grids). The automatic measuring, often involving wireless communication, introduces errors both in software and during data transmission. These demands, as well as the large data volumes that need to be validated, present new challenges to utilities. First, measurement errors cannot be allowed to propagate to the data stored by utilities. Second, manually fixing errors after storing is not a feasible option with increasing data volumes and decreasing lead times for new services and analysis. Third, validation must be applied not only to current readings but also to past readings when new types of errors are discovered. This paper addresses these issues by proposing a hybrid system, eChIDNA, that utilizes both the store-then-process and the data streaming processing paradigms, enabling high-throughput, low-latency distributed and parallel analysis. Validation rules are built upon this paradigm and then implemented on the state-of-the-art Apache Storm Stream Processing Engine to assess performance. Furthermore, patterns of common errors are matched, triggering alerts as a first step towards automatic error correction. The system is evaluated with production data from hundreds of thousands of smart meters. The results show a throughput of thousands of messages per second, demonstrating that stream processing can be used to validate large volumes of meter data online with low processing latency, identifying common errors as they appear. The results from the pattern matching are cross-validated with system experts and show that pattern matching is a viable way to minimize the time required from human operators.
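    A minimal sketch of online rule-based validation in this spirit, in plain Python rather than Storm; the rules (non-negative register, no rollback, bounded jump) and the field names are illustrative assumptions, not the actual eChIDNA rule set:

        from dataclasses import dataclass

        @dataclass
        class Reading:
            meter_id: str
            timestamp: int
            kwh: float                  # cumulative register value

        def validate(stream, max_kwh_jump=100.0):
            """Yield (alert, reading) pairs for readings that violate a rule."""
            last = {}                   # meter_id -> previous reading
            for r in stream:
                if r.kwh < 0:
                    yield ("negative-register", r)
                prev = last.get(r.meter_id)
                if prev is not None:
                    if r.kwh < prev.kwh:
                        yield ("register-rollback", r)   # register decreased
                    elif r.kwh - prev.kwh > max_kwh_jump:
                        yield ("implausible-jump", r)
                last[r.meter_id] = r

        readings = [Reading("m1", 0, 10.0), Reading("m1", 1, 9.5),
                    Reading("m1", 2, 500.0)]
        for alert in validate(readings):
            print(alert)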

    On correctness in RDF stream processor benchmarking

    Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They put a special focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, making it possible to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench that addresses query result correctness verification using an automatic method.
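    To illustrate why multiple answers can all be correct, consider two report policies over the same time-based sliding window; this is a toy example in the spirit of SECRET's report dimension, and the window parameters are made up:

        def sliding_windows(events, size, slide, horizon):
            """Content of each window [t, t+size) for t = 0, slide, 2*slide, ..."""
            return [[v for ts, v in events if t <= ts < t + size]
                    for t in range(0, horizon + 1, slide)]

        events = [(1, "a"), (2, "b"), (5, "c")]
        all_windows = sliding_windows(events, size=2, slide=2, horizon=6)

        # Report policy 1: emit every window, including empty ones.
        print(all_windows)                    # [['a'], ['b'], ['c'], []]
        # Report policy 2: emit only windows with non-empty content.
        print([w for w in all_windows if w])  # [['a'], ['b'], ['c']]
        # Both answer streams are valid under their engine's own semantics,
        # so a correctness benchmark must admit multiple acceptable answers.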

    Lazy Approaches for Interval Timing Correlation of Sensor Data Streams

    We propose novel algorithms for the timing correlation of streaming sensor data. The sensor data are assumed to carry interval timestamps so that they can represent temporal uncertainties. The proposed algorithms support efficient timing correlation for various timing predicates such as deadline, delay, and within. In addition to classical techniques, lazy evaluation and a result cache are utilized to improve algorithm performance. The proposed algorithms are implemented and compared under various workloads.
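    As a minimal sketch of interval-timestamp semantics, the "within" predicate below evaluates to true, false, or maybe depending on whether the temporal uncertainty allows a definite answer; the three-valued treatment is an illustrative assumption, and the paper's lazy evaluation and result cache are not shown:

        def within(a, b, bound):
            """Evaluate |t_a - t_b| <= bound for interval timestamps
            a = (a_lo, a_hi) and b = (b_lo, b_hi).

            Returns True if the bound holds for every possible pair of
            actual times, False if it holds for none, None otherwise.
            """
            a_lo, a_hi = a
            b_lo, b_hi = b
            max_gap = max(abs(a_hi - b_lo), abs(b_hi - a_lo))  # worst case
            min_gap = max(b_lo - a_hi, a_lo - b_hi, 0)         # best case
            if max_gap <= bound:
                return True
            if min_gap > bound:
                return False
            return None                 # undecidable given the intervals

        print(within((0, 2), (1, 3), bound=5))    # True
        print(within((0, 2), (10, 12), bound=5))  # False
        print(within((0, 2), (4, 9), bound=5))    # None (maybe)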

    Adaptive Disorder Control in Data Stream Processing

    Out-of-order tuples in continuous data streams may cause inaccurate query results, since conventional window operators generally discard such tuples. Existing approaches use a buffer to fix disorder among stream tuples and estimate its size based on the maximum network delay seen in the streams. However, they provide no way to control the fraction of tuples that miss the buffer and are discarded, even though users may want to keep it within a predefined error bound according to application requirements. In this paper, we propose a method to estimate the buffer size while keeping the percentage of dropped tuples within a user-specified bound. The proposed method estimates the buffer size from tuples' interarrival times and their network delays, two parameters that properly reflect the real-time characteristics of the streams. Based on these two parameters, our method adapts the amount of tuple drops to fluctuating stream characteristics and keeps their percentage within the given bound, as we observed in our experiments.
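    A minimal sketch of the underlying idea, assuming the buffer span is set to an empirical quantile of recently observed network delays so that at most the allowed fraction of tuples arrive too late; this is a simplified stand-in for the paper's estimator, which also uses interarrival times:

        import math

        def estimate_buffer_span(delays, drop_bound):
            """Return a buffer span such that at most `drop_bound` of the
            observed delays exceed it (empirical (1 - drop_bound) quantile)."""
            s = sorted(delays)
            idx = math.ceil((1 - drop_bound) * len(s)) - 1
            return s[min(max(idx, 0), len(s) - 1)]

        recent_delays = [0.1, 0.2, 0.2, 0.3, 0.5, 0.8, 1.2, 2.5, 0.4, 0.3]
        print(estimate_buffer_span(recent_delays, drop_bound=0.1))  # 1.2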

    Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients

    This is the Accepted Manuscript version of the following article: I. Mporas, D. Triantafyllopoulos, V. Megalooikonomou, "Real-Time Management of Multimodal Streaming Data for Monitoring of Epileptic Patients", Journal of Medical Systems, Vol. 40(45), December 2015. The final published version is available at: https://link.springer.com/article/10.1007%2Fs10916-015-0403-3 © Springer Science+Business Media New York 2015.
    A new generation of healthcare is represented by wearable health monitoring systems, which provide real-time monitoring of a patient's physiological parameters. It is expected that continuous ambulatory monitoring of vital signals will improve the treatment of patients and enable proactive personal health management. In this paper, we present the implementation of a multimodal real-time system for epilepsy management. The proposed methodology is based on a data streaming architecture and efficient management of a large flow of physiological parameters. The performance of this architecture is examined for varying spatial resolution of the recorded data.

    Time To Live: Temporal Management of Large-Scale RFID Applications

    In coming years, there will be billions of RFID tags in the world, tagging almost everything for tracking and identification purposes. This phenomenon will pose a new challenge not only to network capacity but also to the scalability of event processing in RFID applications. Since most RFID applications are time sensitive, we propose a notion of Time To Live (TTL), representing the period of time that an RFID event can legally live in an RFID data management system, to manage various temporal event patterns. TTL is critical in the "Internet of Things" for handling a tremendous amount of partial event-tracking results. TTL can also be used to provide prompt responses to time-critical events so that RFID data streams are handled in a timely manner. We divide TTL into four categories according to the general event-handling patterns. Moreover, to extract event sequences from an unordered event stream correctly and handle TTL-constrained event sequences effectively, we design a new data structure, namely the Double Level Sequence Instance List (DLSIList), to record intermediate stages of event sequences. On this basis, an RFID data management system, namely the Temporal Management System over RFID data streams (TMS-RFID), has been developed. This system can be deployed as a stand-alone middleware component to manage temporal event patterns. We demonstrate the effectiveness of TMS-RFID at extracting complex temporal event patterns through a detailed performance study using a range of high-speed data streams and various queries. The results show that TMS-RFID achieves a very high throughput, namely 170,000 to 870,000 events per second for different highly complex continuous queries. Moreover, the experiments show that the size of the main structure used to record intermediate stages in TMS-RFID does not increase exponentially with the number of events. These results illustrate that TMS-RFID not only has a high processing speed, but also good scalability.
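    A much-simplified sketch of TTL-bounded state for partial event sequences; the class below is an illustrative stand-in for the paper's DLSIList and only shows how a TTL keeps intermediate state from growing without limit:

        class TTLSequenceTracker:
            """Track partial event sequences per tag; expire them after a TTL."""
            def __init__(self, ttl):
                self.ttl = ttl
                self.partial = {}           # tag_id -> (last_update, [events])

            def observe(self, tag_id, event, now):
                self._expire(now)
                _, seq = self.partial.get(tag_id, (now, []))
                seq.append(event)
                self.partial[tag_id] = (now, seq)
                return list(seq)

            def _expire(self, now):
                dead = [t for t, (ts, _) in self.partial.items()
                        if now - ts > self.ttl]
                for t in dead:
                    del self.partial[t]     # the partial match legally "dies"

        tracker = TTLSequenceTracker(ttl=5.0)
        print(tracker.observe("tag42", "read@dock", now=0.0))
        print(tracker.observe("tag42", "read@gate", now=2.0))
        print(tracker.observe("tag42", "read@truck", now=10.0))  # old state expired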

    TESLA: a formally defined event specification language

    The need to process, in a timely manner, large amounts of information flowing from the periphery to the center of a system is common to different application domains, and it has justified the development of several languages to describe how such information is to be processed. In this paper, we analyze these languages, showing how most approaches either lack the expressiveness required for the applications we target, or do not provide the precise semantics required to clearly state how the system should behave. Moving from these premises, we present TESLA, a complex event specification language. Each TESLA rule considers incoming data items as notifications of events and defines how certain patterns of events cause the occurrence of others, said to be "complex". TESLA has a simple syntax and a formal semantics, given in terms of a first-order, metric temporal logic. It provides high expressiveness and flexibility within a rigorous framework, offering content and temporal filters, negations, timers, aggregates, and fully customizable policies for event selection and consumption. The paper ends by showing how TESLA rules can be interpreted by a processing system, introducing an efficient event detection algorithm based on automata.
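    A toy automaton-based detector in the spirit of this approach, for the pattern "A followed by B within `bound` time units"; the multiple-selection policy and the event encoding are illustrative assumptions, and real TESLA rules also offer filters, negation, timers, aggregates, and configurable consumption policies:

        def detect(events, first, second, bound):
            """Return (t_first, t_second) pairs matching the sequence pattern
            over a timestamp-ordered stream of (timestamp, type) events."""
            pending = []                # automaton instances in state "first seen"
            matches = []
            for ts, kind in events:
                if kind == second:
                    matches += [(a, ts) for a in pending if ts - a <= bound]
                if kind == first:
                    pending.append(ts)  # spawn a new automaton instance
                pending = [a for a in pending if ts - a <= bound]  # drop timed-out
            return matches

        stream = [(1, "A"), (2, "A"), (4, "B"), (9, "B")]
        print(detect(stream, "A", "B", bound=3))    # [(1, 4), (2, 4)]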