11 research outputs found

    Efficiently correlating complex events over live and archived data streams

    Correlating complex events over live and archived data streams, which we call Pattern Correlation Queries (PCQs), provides many benefits for domains that need real-time forecasting of events or identification of causal dependencies while handling data at high rates and in massive amounts, such as financial or medical settings. Existing work has focused either on complex event processing over a single type of stream source (i.e., either live or archived), or on simple stream correlation queries (e.g., live events triggering a database lookup). In this paper, we specifically focus on recency-based PCQs and provide clear, useful, and optimizable semantics for them. PCQs raise a number of challenges in optimizing data management and query processing, which we address in the setting of the DejaVu complex event processing system. More specifically, we propose three complementary optimizations: recent input buffering, query result caching, and join source ordering. Furthermore, we capture the relevant query processing tradeoffs in a cost model. An extensive performance study on synthetic and real-life data sets not only validates this cost model, but also shows that our optimizations are very effective, achieving more than two orders of magnitude throughput improvement and much better scalability compared to a conventional approach.
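The query-result-caching optimization mentioned in the abstract can be illustrated with a minimal sketch. The data, function names, and cache policy below are hypothetical stand-ins (they are not from the DejaVu system): an expensive lookup of archived pattern matches is memoized by correlation key, so repeated live events with the same key avoid re-scanning the archive.

```python
from functools import lru_cache

# Hypothetical stand-in for the archived stream: pattern matches
# already found per stock symbol.
ARCHIVE = {
    "AAPL": [("rise", 101.2), ("fall", 99.8)],
    "MSFT": [("rise", 310.0)],
}

@lru_cache(maxsize=1024)
def archived_patterns(symbol: str):
    # Expensive in a real system: scans the archive for pattern
    # matches involving `symbol`. Cached per correlation key.
    return tuple(ARCHIVE.get(symbol, []))

def correlate(live_event):
    # Join one live event with the (cached) archived matches; which
    # side drives this loop is the join-source-ordering decision.
    symbol, price = live_event
    return [(symbol, price, p) for p in archived_patterns(symbol)]

print(correlate(("AAPL", 100.5)))  # archive consulted once
print(correlate(("AAPL", 100.9)))  # served from the cache
```

The same structure applies whatever the archive access looks like; only the cache key (here, the symbol) must capture the correlation condition.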

    Modeling the execution semantics of stream processing engines with SECRET

    There are many academic and commercial stream processing engines (SPEs) today, each of them with its own execution semantics. This variation may lead to seemingly inexplicable differences in query results. In this paper, we present SECRET, a model of the behavior of SPEs. SECRET is a descriptive model that allows users to analyze the behavior of systems and understand the results of window-based queries (with time- and tuple-based windows) for a broad range of heterogeneous SPEs. The model is the result of extensive analysis and experimentation with several commercial and academic engines. In the paper, we describe the types of heterogeneity found in existing engines and show with experiments on real systems that our model can explain the key differences in windowing behavior.
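The kind of semantic variation SECRET explains can be seen even in the simplest case. The sketch below is illustrative only (it is not the SECRET model itself): the same stream of (timestamp, value) tuples grouped by a time-based tumbling window versus a tuple-based tumbling window yields different window contents, and hence different query results.

```python
# Same stream, two window semantics.
events = [(1, 10), (2, 20), (4, 30), (5, 40), (7, 50)]

def time_tumbling(events, size):
    """Group values into fixed time intervals [k*size, (k+1)*size)."""
    windows = {}
    for ts, v in events:
        windows.setdefault(ts // size, []).append(v)
    return [windows[k] for k in sorted(windows)]

def tuple_tumbling(events, count):
    """Group values into fixed-size batches by arrival order."""
    vals = [v for _, v in events]
    return [vals[i:i + count] for i in range(0, len(vals), count)]

print(time_tumbling(events, 3))   # [[10, 20], [30, 40], [50]]
print(tuple_tumbling(events, 3))  # [[10, 20, 30], [40, 50]]
```

An aggregate such as a windowed sum differs accordingly, which is exactly the sort of "inexplicable" result divergence between engines that a descriptive model must account for.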

    Pattern matching over sequences of rows in a relational database system

    In Complex Event Processing (CEP) applications such as supply chain management and financial data analysis, the capability to match patterns over data sequences is increasingly becoming an important need. This not only involves finding event patterns of interest on live data streams but also requires a similar functionality over archived sequences of streams for historical analysis, verification, and correlation. The goal of this thesis is to extend a relational database system with the capability to match patterns over contiguous sequences of rows stored in a database table. More specifically, we have implemented a major subset of the 2007 ANSI standard proposal on adding the MATCH_RECOGNIZE clause to standard SQL on top of the MySQL open-source database engine. We have done this in a way to leverage the existing MySQL architecture and processing model as much as possible, while at the same time carefully identifying the parts where brand new extensions were necessary. Thus, one of the main contributions of this thesis is that it clearly shows what it takes in general to add pattern matching capability to any relational database system.
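The core idea behind row pattern matching can be sketched outside SQL. The following is only an analogy, not the thesis's MySQL implementation: each row is classified against DEFINE-style conditions to produce a symbol, and a regular expression over the symbol sequence plays the role of the PATTERN clause (here, one or more falls followed by one or more rises, a "V" shape in a price series).

```python
import re

# Hypothetical price column of a table, in row order.
prices = [100, 98, 95, 97, 103, 102]

def classify(prev, cur):
    # Analogue of: DEFINE DOWN AS price < PREV(price),
    #              UP   AS price > PREV(price)
    return "D" if cur < prev else "U" if cur > prev else "S"

# One symbol per consecutive row pair.
symbols = "".join(classify(a, b) for a, b in zip(prices, prices[1:]))

# Analogue of: PATTERN (DOWN+ UP+)
m = re.search(r"D+U+", symbols)
print(symbols)              # "DDUUD"
print(m.group() if m else None)  # "DDUU"
```

A database implementation must additionally handle partitioning, measures, and efficient evaluation over stored rows, which is where the engine-specific work described in the thesis comes in.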

    Contents

    2.1 Visualization layer – event processing layer
    2.2 Event processing layer – data acquisition layer

    Design and Implementation of the MaxStream Federated Stream Processing Architecture

    Despite the availability of several commercial data stream processing engines (SPEs), it remains hard to develop and maintain streaming applications. A major difficulty is the lack of standards, and the wide (and changing) variety of application requirements. Consequently, existing SPEs vary widely in data and query models, APIs, functionality, and optimization capabilities. This has led to some organizations using multiple SPEs, based on their application needs. Furthermore, management of stored data and streaming data are still mostly separate concerns, although applications increasingly require integrated access to both. In the MaxStream project, our goal is to design and build a federated stream processing architecture that seamlessly integrates multiple autonomous and heterogeneous SPEs with traditional databases, and hence facilitates the incorporation of new functionality and requirements. In this paper, we describe the design and implementation of the MaxStream architecture, and demonstrate its feasibility and performance on two benchmarks: the Linear Road Stream Data Management Benchmark and the SAP Sales and Distribution Benchmark.