Load shedding in window-based complex event processing

Abstract

Complex event processing (CEP) is a powerful paradigm for detecting patterns in continuous input event streams. CEP is applied in a broad range of areas, e.g., transportation, stock markets, network monitoring, game analytics, and retail management. A CEP operator performs pattern matching by correlating input events to detect important situations (called complex events). The criticality of detected complex events depends on the application. For example, in fraud detection systems in banks, a detected complex event might indicate that a fraudster is trying to withdraw money from a victim’s account. Naturally, the complex events in this application are critical. On the other hand, in applications like network monitoring, soccer analysis, and transportation, the detected complex events might be less critical. As a result, these applications might tolerate imprecise detection or the loss of some complex events. In many applications, the rate of input events is high and exceeds the processing capacity of CEP operators. Moreover, for many applications, it is important to detect complex events within a certain latency bound, since complex events that are detected late might become useless. For CEP applications that tolerate imprecise detection of complex events and have limited processing resources, one way to keep the given latency bound is to use load shedding. Load shedding reduces the overload on a CEP operator by dropping either events from the operator’s input event stream or partial matches (PMs for short) from the operator’s internal state. This decreases the number of queued events and increases the operator’s processing rate, hence enabling the operator to maintain the given latency bound. Of course, dropping might adversely impact the quality of results (QoR). Therefore, it is crucial to shed load in a way that has a low impact on QoR. There exists only limited work on load shedding in the CEP domain.
Therefore, in this thesis, we aim to realize a load shedding library that contains several load shedding approaches for CEP systems. Our shedding approaches drop events and PMs, shed events at different granularity levels, and use several features to predict the importance/utility of events and PMs. More specifically, our contributions are as follows. First, we precisely define the quality of results (QoR) using real-world examples and different pattern matching semantics defined in the CEP domain. Second, we propose a load shedding approach (called pSPICE) that drops PMs to maintain a given latency bound. pSPICE uses a Markov chain and a Markov reward process to predict the utility of PMs. Moreover, pSPICE adaptively calculates the number of PMs that must be dropped to maintain the given latency bound. In our third and fourth contributions, we develop two load shedding approaches called eSPICE and hSPICE. eSPICE drops events from windows to maintain the given latency bound, while hSPICE drops events from both windows and PMs. Both approaches use a probabilistic model to predict the event utilities. Moreover, in both approaches, we provide algorithms that predict the utility thresholds needed to drop the required number of events. Additionally, in eSPICE, we develop an algorithm that adaptively calculates the number of events that must be dropped to maintain the given latency bound. Finally, we propose a load shedding approach (called gSPICE) that drops events from the input event stream and from windows to maintain the given latency bound. gSPICE also predicts the event utilities using a probabilistic model. Moreover, to efficiently store the event utilities, we develop a data structure based on Zobrist hashing. Furthermore, gSPICE uses well-known machine learning approaches, e.g., decision trees or random forests, to estimate event utilities.
We extensively evaluate our proposed load shedding approaches on several real-world and synthetic datasets using a wide range of CEP queries.
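
To make the idea of utility-based PM shedding more concrete, the following is a minimal sketch and not the pSPICE implementation. It assumes, purely for illustration, that the utility of a PM is the probability that it completes within the events remaining in its window, computed from a Markov chain over the PM's pattern-matching states with an absorbing final state. The transition matrix, the PM fields (`id`, `state`, `remaining`), and the function names are all hypothetical; the abstract does not specify the actual model or how the Markov reward process enters it.

```python
def completion_probability(transition, start_state, final_state, remaining_events):
    """Probability of being in final_state after remaining_events steps.

    Assumes final_state is absorbing, so this equals the probability of
    completing the match within the remaining events (illustrative model).
    """
    n = len(transition)
    dist = [0.0] * n
    dist[start_state] = 1.0
    for _ in range(remaining_events):
        nxt = [0.0] * n
        for s, mass in enumerate(dist):
            if mass == 0.0:
                continue
            for t, p in enumerate(transition[s]):
                nxt[t] += mass * p
        dist = nxt
    return dist[final_state]


def select_pms_to_drop(pms, transition, final_state, num_to_drop):
    """Return the ids of the num_to_drop PMs with the lowest estimated utility."""
    scored = sorted(
        pms,
        key=lambda pm: completion_probability(
            transition, pm["state"], final_state, pm["remaining"]
        ),
    )
    return [pm["id"] for pm in scored[:num_to_drop]]


# Hypothetical three-state pattern: state 2 is the (absorbing) final state.
TRANSITION = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

# Hypothetical PMs: current state and number of events left in the window.
PMS = [
    {"id": "a", "state": 0, "remaining": 2},
    {"id": "b", "state": 1, "remaining": 2},
    {"id": "c", "state": 1, "remaining": 1},
]

dropped = select_pms_to_drop(PMS, TRANSITION, final_state=2, num_to_drop=2)
print(dropped)
```

In this sketch the number of PMs to drop is passed in as a fixed argument; the thesis describes computing that number adaptively from the latency bound, which is not modeled here.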
