4 research outputs found

    Big continuous data: dealing with velocity by composing event streams

    No full text
    International audienceThe rate at which we produce data is growing steadily, thus creating even larger streams of continuously evolving data. Online news, micro-blogs, search queries are just a few examples of these continuous streams of user activities. The value of these streams relies in their freshness and relatedness to on-going events. Modern applications consuming these streams need to extract behaviour patterns that can be obtained by aggregating and mining statically and dynamically huge event histories. An event is the notification that a happening of interest has occurred. Event streams must be combined or aggregated to produce more meaningful information. By combining and aggregating them either from multiple producers, or from a single one during a given period of time, a limited set of events describing meaningful situations may be notified to consumers. Event streams with their volume and continuous production cope mainly with two of the characteristics given to Big Data by the 5V’s model: volume & velocity. Techniques such as complex pattern detection, event correlation, event aggregation, event mining and stream processing, have been used for composing events. Nevertheless, to the best of our knowledge, few approaches integrate different composition techniques (online and post-mortem) for dealing with Big Data velocity. This chapter gives an analytical overview of event stream processing and composition approaches: complex event languages, services and event querying systems on distributed logs. Our analysis underlines the challenges introduced by Big Data velocity and volume and use them as reference for identifying the scope and limitations of results stemming from different disciplines: networks, distributed systems, stream databases, event composition services, and data mining on traces

    ABSTRACT Safety Guarantee of Continuous Join Queries over Punctuated Data Streams

    No full text
    Continuous join queries (CJQ) are needed for correlating data from multiple streams. One fundamental problem for processing such queries is that since the data streams are infinite, this would require the join operator to store infinite states and eventually run out of space. Punctuation semantics has been proposed to specifically address this problem. In particular, punctuations explicitly mark the end of a subset of data and, hence, enable purging of the stored data which will not contribute to any new query results. Given a set of available punctuation schemes, if one can identify that a CJQ still requires unbounded storage, then this query can be flagged as unsafe and can be prevented from running. Unfortunately, while punctuation semantics is clearly useful, the mechanisms to identify if and how a particular CJQ could benefit from a given set of punctuation schemes are not yet known. In this paper, we provide sufficient and necessary conditions for checking whether a CJQ can be safely executed under a given set of punctuation schemes or not. In particular, we introduce a novel punctuation graph to aid the analysis of the safety for a given query. We show that the safety checking problem can be done in polynomial time based on this punctuation graph construct. In addition, various issues and challenges related to the safety checking of CJQs are highlighted. 1

    Window Queries Over Data Streams

    Get PDF
    Evaluating queries over data streams has become an appealing way to support various stream-processing applications. Window queries are commonly used in many stream applications. In a window query, certain query operators, especially blocking operators and stateful operators, appear in their windowed versions. Previous research work in evaluating window queries typically requires ordered streams and this order requirement limits the implementations of window operators and also carries performance penalties. This thesis presents efficient and flexible algorithms for evaluating window queries. We first present a new data model for streams, progressing streams, that separates stream progress from physical-arrival order. Then, we present our window semantic definitions for the most commonly used window operators—window aggregation and window join. Unlike previous research that often requires ordered streams when describing window semantics, our window semantic definitions do not rely on physical-stream arrival properties. Based on the window semantic definitions, we present new implementations of window aggregation and window join, WID and OA-Join. Compared to the existing implementations of stream query operators, our implementations do not require special stream-arrival properties, particularly stream order. In addition, for window aggregation, we present two other implementations extended from WID, Paned-WID and AdaptWID, to improve excution time by sharing sub-aggregates and to improve memory usage for input with data distribution skew, respectively. Leveraging our order-insenstive implementations of window operators, we present a new architecture for stream systems, OOP (Out-of- Order Processing). Instead of relying on ordered streams to indicate stream progress, OOP explicitly communicates stream progress to query operators, and thus is more flexible than the previous in-order processing (IOP) approach, which requires maintaining stream order. We implemented our order-insensitive window query operators and the OOP architecture in NiagaraST and Gigascope. Our performance study in both systems confirms the benefits of our window operator implementations and the OOP architecture compared to the commonly used approaches in terms of memory usage, execution time and latency

    Mobility-awareness in complex event processing systems

    Get PDF
    The proliferation and vast deployment of mobile devices and sensors over the last couple of years enables a huge number of Mobile Situation Awareness (MSA) applications. These applications need to react in near real-time to situations in the environment of mobile objects like vehicles, pedestrians, or cargo. To this end, Complex Event Processing (CEP) is becoming increasingly important as it allows to scalably detect situations “on-the-fly” by continously processing distributed sensor data streams. Furthermore, recent trends in communication networks promise high real-time conformance to CEP systems by processing sensor data streams on distributed computing resources at the edge of the network, where low network latencies can be achieved. Yet, supporting MSA applications with a CEP middleware that utilizes distributed computing resources proves to be challenging due to the dynamics of mobile devices and sensors. In particular, situations need to be efficiently, scalably, and consistently detected with respect to ever-changing sensors in the environment of a mobile object. Moreover, the computing resources that provide low latencies change with the access points of mobile devices and sensors. The goal of this thesis is to provide concepts and algorithms to i) continuously detect situations that recently occurred close to a mobile object, ii) support bandwidth and computational efficient detections of such situations on distributed computing resources, and iii) support consistent, low latency, and high quality detections of such situations. To this end, we introduce the distributed Mobile CEP (MCEP) system which automatically adapts the processing of sensor data streams according to a mobile object’s location. MCEP provides an expressive, location-aware query model for situations that recently occurred at a location close to a mobile object. MCEP significantly reduces latency, bandwidth, and processing overhead by providing on-demand and opportunistic adaptation algorithms to dynamically assign event streams to queries of the MCEP system. Moreover, MCEP incorporates algorithms to adapt the deployment of MCEP queries in a network of computing resources. This way, MCEP supports latency-sensitive, large-scale deployments of MSA applications and ensures a low network utilization while mobile objects change their access points to the system. MCEP also provides methods to increase the scalability in terms of deployed MCEP queries by reusing event streams and computations for detecting common situations for several mobile objects
    corecore