Load shedding in window-based complex event processing
Complex event processing (CEP) is a powerful paradigm to detect patterns in continuous input event streams. CEP is applied in a broad range of areas, e.g., transportation, stock markets, network monitoring, game analytics, and retail management. A CEP operator performs pattern matching by correlating input events to detect important situations (called complex events). The criticality of detected complex events depends on the application. For example, in fraud detection systems in banks, detected complex events might indicate that a fraudster tries to withdraw money from a victim’s account. Naturally, the complex events in this application are critical. On the other hand, in applications like network monitoring, soccer analysis, and transportation, the detected complex events might be less critical. As a result, these applications might tolerate imprecise detection or loss of some complex events.
In many applications, the rate of input events is high and exceeds the processing capacity of CEP operators. Moreover, for many applications, it is important to detect complex events within a certain latency bound, since complex events that are detected late might become useless. For CEP applications that tolerate imprecise detection of complex events and have limited processing resources, one way to keep the given latency bound is load shedding. Load shedding reduces the overload on a CEP operator by dropping either events from the operator’s input event stream or partial matches (PMs for short) from the operator’s internal state. This decreases the number of queued events and increases the operator’s processing rate, enabling the operator to maintain the given latency bound. Of course, dropping might adversely impact the quality of results (QoR). Therefore, it is crucial to shed load in a way that has a low impact on QoR.
There exists only limited work on load shedding in the CEP domain. Therefore, in this thesis, we aim to realize a load shedding library that contains several load shedding approaches for CEP systems. Our shedding approaches drop events and PMs, shed events at different granularity levels, and use several features to predict the importance/utility of events and PMs. More specifically, our contributions are as follows. First, we precisely define the quality of results (QoR) using real-world examples and different pattern matching semantics defined in the CEP domain. Second, we propose a load shedding approach (called pSPICE) that drops PMs to maintain a given latency bound. pSPICE uses a Markov chain and a Markov reward process to predict the utility of PMs. Moreover, pSPICE adaptively calculates the number of PMs that must be dropped to maintain the given latency bound.
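To illustrate how a Markov reward process can assign utilities to PMs, the following is a minimal sketch and not pSPICE's actual model. It assumes a hypothetical three-state pattern, made-up transition probabilities, and a reward of 1 for completing a match; the function name `pm_utilities` and the fixed `horizon` (number of events left in the window) are likewise assumptions made for this example.

```python
def pm_utilities(transitions, rewards, horizon):
    """Expected reward a PM collects within `horizon` further events, per state.

    transitions[i][j]: probability that a PM in state i moves to state j
                       when the next event arrives (assumed, not learned here)
    rewards[j]:        reward earned on entering state j
    """
    n = len(transitions)
    value = [0.0] * n
    # Finite-horizon value iteration: one backup per remaining event.
    for _ in range(horizon):
        value = [
            sum(transitions[i][j] * (rewards[j] + value[j]) for j in range(n))
            for i in range(n)
        ]
    return value


# Hypothetical pattern: 0 = started, 1 = half matched, 2 = complete.
TRANSITIONS = [
    [0.8, 0.2, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0],  # a completed PM emits its complex event and leaves
]
REWARDS = [0.0, 0.0, 1.0]  # assumption: only completing a match is rewarded

utilities = pm_utilities(TRANSITIONS, REWARDS, horizon=2)
```

Under these assumptions, a PM that is closer to completion receives a higher utility than one that has just started, so a shedder that drops the lowest-utility PMs first would discard the latter.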
In our third and fourth contributions, we develop two load shedding approaches called eSPICE and hSPICE. eSPICE drops events from windows to maintain the given latency bound, whereas hSPICE drops both events from windows and PMs. Both approaches use a probabilistic model to predict the event utilities. Moreover, in both approaches, we provide algorithms that predict utility thresholds to drop the needed number of events. Additionally, in eSPICE, we develop an algorithm that adaptively calculates the number of events that must be dropped to maintain the given latency bound.
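The idea of a utility threshold can be sketched as follows. This is a simplified illustration and not the eSPICE or hSPICE algorithm: it derives the threshold from already known utilities of one window instead of predicting it, and the names `utility_threshold` and `shed` are hypothetical.

```python
def utility_threshold(utilities, drop_count):
    """Smallest threshold so that dropping every event with
    utility <= threshold removes at least `drop_count` events."""
    if drop_count <= 0 or not utilities:
        return None  # nothing to drop
    ranked = sorted(utilities)
    k = min(drop_count, len(ranked))
    return ranked[k - 1]


def shed(events, utility_of, threshold):
    """Keep only the events whose utility exceeds the threshold."""
    if threshold is None:
        return list(events)
    return [e for e in events if utility_of(e) > threshold]


# Made-up utilities for five events in one window.
window = [0.9, 0.1, 0.5, 0.1, 0.7]
threshold = utility_threshold(window, drop_count=2)
kept = shed(window, lambda u: u, threshold)
```

Because events with equal utilities share a threshold, ties can cause more events to be dropped than requested; a real shedder would need a policy for that case.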
Finally, we propose a load shedding approach (called gSPICE) that drops events from the input event stream and from windows to maintain the given latency bound. gSPICE also predicts the event utilities using a probabilistic model. Moreover, to efficiently store the event utilities, we develop a data structure that is based on Zobrist hashing. Furthermore, gSPICE uses well-known machine learning approaches, e.g., decision trees or random forests, to estimate event utilities.
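As a rough illustration of how Zobrist hashing can key a utility table, consider the sketch below. It is not gSPICE's data structure: the choice of features (an event type and a position within the window), the class name `ZobristUtilityTable`, and all parameters are assumptions made for this example.

```python
import random


class ZobristUtilityTable:
    """Stores utilities under keys built by XOR-ing one random
    bitstring per feature value (the core idea of Zobrist hashing)."""

    def __init__(self, num_types, num_positions, seed=0, bits=64):
        rng = random.Random(seed)
        # One random bitstring per possible value of each feature.
        self._type_keys = [rng.getrandbits(bits) for _ in range(num_types)]
        self._pos_keys = [rng.getrandbits(bits) for _ in range(num_positions)]
        self._utilities = {}

    def key(self, event_type, position):
        return self._type_keys[event_type] ^ self._pos_keys[position]

    def rekey(self, key, old_position, new_position):
        # XOR is its own inverse, so one feature can be swapped
        # without rebuilding the key from scratch.
        return key ^ self._pos_keys[old_position] ^ self._pos_keys[new_position]

    def set(self, event_type, position, utility):
        self._utilities[self.key(event_type, position)] = utility

    def get(self, event_type, position, default=0.0):
        return self._utilities.get(self.key(event_type, position), default)


table = ZobristUtilityTable(num_types=3, num_positions=10)
table.set(event_type=1, position=4, utility=0.75)
```

The appeal of this scheme is that a key is a single fixed-width integer and can be updated incrementally when one feature changes, at the cost of a small probability of collisions between distinct feature combinations.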
We extensively evaluate our proposed load shedding approaches on several real-world and synthetic datasets using a wide range of CEP queries.
gSPICE: Model-Based Event Shedding in Complex Event Processing
Overload situations in complex event processing (CEP) systems with limited
resources are typically handled using load shedding to
maintain a given latency bound. However, load shedding might negatively impact
the quality of results (QoR). To minimize the shedding impact on QoR, CEP
researchers propose shedding approaches that drop events/internal state with
the lowest importance/utility. In both black-box and white-box shedding
approaches, different features are used to predict these utilities. In this
work, we propose a novel black-box shedding approach that uses a new set of
features to drop events from the input event stream to maintain a given latency
bound. Our approach uses a probabilistic model to predict these event
utilities. Moreover, our approach uses Zobrist hashing and well-known machine
learning models, e.g., decision trees and random forests, to handle the
predicted event utilities. Through extensive evaluations on several synthetic
and two real-world datasets and a representative set of CEP queries, we show
that, in the majority of cases, our load shedding approach outperforms
state-of-the-art black-box load shedding approaches, w.r.t. QoR.