Search CORE

318 research outputs found

ZStream: A cost-based query processor for adaptively detecting composite events

Authors: Madden Samuel R.; Mei Yuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Composite (or Complex) event processing (CEP) systems search sequences of incoming events for occurrences of user-specified event patterns. Recently, they have gained more attention in a variety of areas due to their powerful and expressive query language and performance potential. Sequentiality (temporal ordering) is the primary way in which CEP systems relate events to each other. In this paper, we present a CEP system called ZStream to efficiently process such sequential patterns. Besides simple sequential patterns, ZStream is also able to detect other patterns, including conjunction, disjunction, negation and Kleene closure. Unlike most recently proposed CEP systems, which use non-deterministic finite automata (NFAs) to detect patterns, ZStream uses tree-based query plans for both the logical and physical representation of query patterns. By carefully designing the underlying infrastructure and algorithms, ZStream is able to unify the evaluation of sequence, conjunction, disjunction, negation, and Kleene closure as variants of the join operator. Under this framework, a single pattern in ZStream may have several equivalent physical tree plans, with different evaluation costs. We propose a cost model to estimate the computation costs of a plan. We show that our cost model can accurately capture the actual runtime behavior of a plan, and that choosing the optimal plan can result in a factor of four or more speedup versus an NFA-based approach. Based on this cost model and using a simple set of statistics about operator selectivity and data rates, ZStream is able to adaptively and seamlessly adjust the order in which it detects patterns on the fly. Finally, we describe a dynamic programming algorithm used in our cost model to efficiently search for an optimal query plan for a given pattern. National Natural Science Foundation (Grant number NETS-NOSS 0520032
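As a rough illustration of how a dynamic programming search over equivalent tree plans can work, the sketch below picks the cheapest binary join tree for a purely sequential pattern. The cost model here is an assumption made for the example (a join costs the product of its input cardinalities, and cardinalities are estimated from per-class rates and pairwise selectivities); it is not ZStream's actual cost model, and the names `best_plan`, `rates` and `sels` are hypothetical.

```python
from functools import lru_cache
from math import prod


def best_plan(rates, sels):
    """Return (cost, plan) for the cheapest binary join tree over a sequence pattern.

    rates[i]: assumed number of events of class i per time window.
    sels[i]:  assumed selectivity of the predicate between class i and class i+1.
    A plan is either a leaf index or a (left_plan, right_plan) tuple.
    """
    n = len(rates)
    assert n >= 1 and len(sels) == n - 1

    def card(i, j):
        # Estimated number of partial matches covering classes i..j (inclusive).
        return prod(rates[i:j + 1]) * prod(sels[i:j])

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0.0, i  # reading a leaf buffer is free in this toy model
        best = None
        for k in range(i, j):  # try every split point between i and j
            left_cost, left_plan = solve(i, k)
            right_cost, right_plan = solve(k + 1, j)
            # Assumed join cost: number of candidate pairs examined.
            cost = left_cost + right_cost + card(i, k) * card(k + 1, j)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
        return best

    return solve(0, n - 1)


if __name__ == "__main__":
    # Three event classes A, B, C with made-up rates and selectivities.
    print(best_plan([100, 10, 1], [0.1, 0.5]))
```

With these made-up statistics the right-deep plan `(0, (1, 2))` wins because joining the two rare classes first keeps the intermediate result small, which is the kind of ordering decision a cost-based optimizer is meant to capture.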

DSpace@MIT

A Comprehensive Scalable Framework for Cloud-Native Pattern Detection with Enhanced Expressiveness

Authors: Gounaris Anastasios; Mavroudopoulos Ioannis
Publication date: 18/01/2024

Detecting complex patterns in large volumes of event logs has diverse applications in various domains, such as business processes and fraud detection. Existing systems like ELK are commonly used to tackle this challenge, but their performance deteriorates for large patterns, and they suffer from limitations in terms of expressiveness and explanatory capabilities for their responses. In this work, we propose a solution that integrates a Complex Event Processing (CEP) engine into a broader query processor on top of a decoupled storage infrastructure containing inverted indices of log events. The results demonstrate that our system excels in scalability and robustness, particularly in handling complex queries. Notably, our proposed system delivers responses for large complex patterns within seconds, while ELK experiences timeouts after 10 minutes. It also significantly outperforms solutions relying on FlinkCEP and executing MATCH_RECOGNIZE SQL queries.
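To make the idea of querying patterns through inverted indices of log events more concrete, here is a minimal sketch. It assumes a toy layout (event type mapped to trace id mapped to the positions where the type occurs) and a simple "in order, gaps allowed" pattern semantics; both are assumptions for illustration, not the index design or query semantics of the system described above, and `build_index` and `find_sequence` are hypothetical names.

```python
from collections import defaultdict


def build_index(logs):
    """Build {event_type: {trace_id: [positions]}} from {trace_id: [event_type, ...]}."""
    index = defaultdict(lambda: defaultdict(list))
    for trace_id, events in logs.items():
        for pos, etype in enumerate(events):
            index[etype][trace_id].append(pos)  # positions end up in ascending order
    return index


def find_sequence(index, pattern):
    """Return sorted trace ids that contain the event types of `pattern` in order.

    `pattern` must be a non-empty list of event types; gaps between matches are allowed.
    """
    # Pruning step: only traces that contain every event type can possibly match.
    candidates = set.intersection(*(set(index.get(e, {})) for e in pattern))
    matches = []
    for trace_id in sorted(candidates):
        last = -1
        for etype in pattern:
            # Greedily take the earliest occurrence after the previous match.
            nxt = next((p for p in index[etype][trace_id] if p > last), None)
            if nxt is None:
                break
            last = nxt
        else:
            matches.append(trace_id)
    return matches


if __name__ == "__main__":
    logs = {
        "t1": ["login", "view", "pay"],
        "t2": ["pay", "login"],
        "t3": ["login", "pay", "view"],
    }
    print(find_sequence(build_index(logs), ["login", "pay"]))
```

The point of the index is the pruning step: traces that lack any of the pattern's event types are discarded without scanning their logs at all.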

arXiv.org e-Print Archive

Enhancing performance and expressibility of complex event processing using binary tree-based directed graph

Author: Behravesh Babak
Publication date: 01/02/2016

In various domains, applications are required to detect and react to complex situations. In response to the demand for matching incoming events against complex patterns, several event processing systems have been developed. However, only a few of them consider both the performance and the expressibility of event matching, as focusing only on performance can harm expressibility and vice versa. This research develops a fast adaptive event matching system (FAEM), a new event matching system that improves expressibility and performance measures (throughput and end-to-end latency). The system is designed and developed on a novel binary tree-based directed graph (BTDG) as a unified basis for event matching. It transforms a user-defined query into a set of system objects, including buffers, conditions on buffers, cursors, and join operators (non-Kleene and Kleene operators), and arranges these objects on a BTDG. Given the BTDG, the performance of non-Kleene operators is improved by a batch removal method, which removes events that fall outside the time window, and by an actual time window (ATW), which can improve the performance of event matching. To improve the performance of Kleene operators, this research introduces twin algorithms for the Kleene operator that are matched to the BTDG. These two Kleene algorithms group events, reduce the number of intermediate results, and apply a combination algorithm in the final stage. Transforming queries that contain join operators into a BTDG enhances the expressibility of the proposed CEP system.
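The batch removal of events that fall outside the time window can be pictured with a small sketch. It assumes a buffer kept in timestamp order so that all expired events form a prefix that can be dropped in one step; the `Buffer` class and its methods are hypothetical and are not taken from FAEM.

```python
from bisect import bisect_left


class Buffer:
    """Events kept in timestamp order (a hypothetical helper, not FAEM's structure)."""

    def __init__(self):
        self.timestamps = []
        self.events = []

    def append(self, ts, event):
        # Assumes events arrive in non-decreasing timestamp order.
        self.timestamps.append(ts)
        self.events.append(event)

    def evict_before(self, now, window):
        """Drop every event with timestamp < now - window in one slice; return the count."""
        cut = bisect_left(self.timestamps, now - window)
        del self.timestamps[:cut]
        del self.events[:cut]
        return cut


if __name__ == "__main__":
    buf = Buffer()
    for t in [1, 2, 5, 8, 9]:
        buf.append(t, f"e{t}")
    print(buf.evict_before(now=10, window=5), buf.events)
```

Finding the cut point by binary search and deleting the prefix once avoids testing each buffered event individually on every arrival.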

Universiti Teknologi Malaysia Institutional Repository

ViewDF: a Flexible Framework for Incremental View Maintenance in Stream Data Warehouses

Author: Yang Yuke
Publication venue: 'University of Waterloo'
Publication date: 01/01/2013

Because of the increasing data sizes and demands for low latency in modern data analysis, the traditional data warehousing technologies are greatly pushed beyond their limits. Several stream data warehouse (SDW) systems, which are warehouses that ingest append-only data feeds and support frequent refresh cycles, have been proposed including different methods to improve the responsiveness of the systems. Materialized views are critical in large-scale data warehouses due to their ability to speed up queries. Thus an SDW maintains layers of materialized views. Materialized view maintenance in SDW systems introduces new challenges. However, some of the existing SDW systems do not address the maintenance of views while others employ view maintenance techniques that are not efficient. This thesis presents ViewDF, a flexible framework for incremental maintenance of materialized views in SDW systems that generalizes existing techniques and enables new optimizations for views defined with operators that are common in stream analytics. We give a special view definition (ViewDF) to enhance the traditional way of creating views in SQL by being able to reference any partition of any table. We describe a prototype system based on this idea, which allows users to write ViewDFs directly and can automatically translate a broad class of queries into ViewDFs. Several optimizations are proposed and experiments show that our proposed system can improve view maintenance time by a factor of two or more in practical settings.

University of Waterloo's Institutional Repository

HIGH-PERFORMANCE COMPLEX EVENT PROCESSING FOR DECISION ANALYTICS

Author: Zhang Haopeng
Publication venue: ScholarWorks@UMass Amherst
Publication date: 06/07/2017

Complex Event Processing (CEP) systems are becoming increasingly popular in domains for decision analytics such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high-volume event streams, and often require such event streams to be processed in real-time. To this end, CEP queries are applied for filtering, correlation, aggregation, and transformation, to derive high-level, actionable information. Tasks for CEP systems fall into two categories: passive monitoring and proactive monitoring. For passive monitoring, users know their exact needs and express them in CEP queries, then CEP engines evaluate those queries against incoming data events; for proactive monitoring, users cannot tell exactly what they are looking for and need to work with CEP engines to figure out the query. In my thesis, there are contributions for both categories. For passive monitoring, the first contribution I make is to apply CEP queries over streams with imprecise timestamps, which was infeasible before this work. Existing CEP systems assumed that the occurrence time of each event is known precisely. However, I observe that event occurrence times are often unknown or imprecise due to lossy raw data, granularity mismatch or clock synchronization. Therefore, I propose a temporal model that assigns a time interval to each event to represent all of its possible occurrence times. Under the uncertain temporal model, I further propose two evaluation frameworks: a point-based framework, which converts events with time intervals into events with point timestamps before pattern matching, and an event-based framework, which matches patterns over events with time intervals directly. I also propose optimizations in these frameworks. My new approach achieves high efficiency for a wide range of workloads tested using both real traces and synthetic datasets.
While existing systems cannot process this type of stream, my algorithm achieves a throughput as high as tens of thousands of events per second in a MapReduce case study. This contribution makes CEP techniques applicable to more application scenarios. Another contribution for passive monitoring is that I identify expensive queries in CEP, analyze their runtime complexity, and propose effective optimizations to improve their performance significantly. Those expensive queries involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. I analyze the runtime complexity of each language component and identify two performance bottlenecks: Kleene closure under the most flexible event selection strategy and confidence computation in the case of imprecise timestamps. For the first bottleneck, I break query evaluation into two parts: pattern matching, which can be shared by many matches, and result construction. Optimizations for the shared pattern matching cut cost from exponential to polynomial time and even close-to-linear. To address the second bottleneck, I design a dynamic programming algorithm to improve performance. Microbenchmark results show state-of-the-art systems suffer poor performance, while my system can provide 2 to 10 orders of magnitude improvement. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of my proposed techniques: the throughput is over 1 million events per second. The last problem solved in this thesis is about proactive monitoring: explaining anomalies in CEP-based monitoring. CEP queries are used widely for monitoring purposes. When users observe abnormal status in the monitoring results, they annotate the abnormal period and a reference period.
Then the system generates explanations by analyzing stream events, and the explanations can be encoded into CEP queries for future monitoring on similar anomalies. An entropy-based distance function is designed to select features for explanation. The new distance function reduces up to 99.2% of the features needed to find the ground truth compared to state-of-the-art distance functions for time series. A cluster-based auto-labeling algorithm is also designed to leverage unlabeled data to filter noisy features. Compared with alternative techniques, the generated results improve explanation quality by up to 800%, reduce features by 93.8% for conciseness, and achieve prediction quality as high as that of other techniques. The implementation is also efficient: with 2000 concurrent monitoring queries, triggered explanation analysis returns explanations within a minute and affects the performance only slightly, delaying event processing by less than 1 second.
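The interval-based temporal model described in this abstract can be illustrated with a short sketch: each event carries a closed interval of possible occurrence times, and a sequence constraint "a before b" can then be possible, certain, or assigned a confidence. The confidence calculation below assumes integer time points that are uniformly and independently distributed within each interval; that assumption, and the names `Event`, `can_precede`, `must_precede` and `precede_confidence`, are made up for the example and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    lo: int  # earliest possible occurrence time (inclusive)
    hi: int  # latest possible occurrence time (inclusive)


def can_precede(a, b):
    """True if at least one assignment of time points puts a strictly before b."""
    return a.lo < b.hi


def must_precede(a, b):
    """True if every assignment of time points puts a strictly before b."""
    return a.hi < b.lo


def precede_confidence(a, b):
    """Fraction of (time_a, time_b) pairs with time_a < time_b.

    Assumes uniform, independent integer time points within each interval.
    """
    pairs = [(x, y) for x in range(a.lo, a.hi + 1) for y in range(b.lo, b.hi + 1)]
    return sum(x < y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    a = Event("A", 1, 3)
    b = Event("B", 2, 4)
    print(can_precede(a, b), must_precede(a, b), precede_confidence(a, b))
```

Enumerating every pair of time points is only workable for tiny intervals, which hints at why confidence computation shows up as a bottleneck when timestamps are imprecise.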

ScholarWorks@UMass Amherst

Analyzing audit trails in a distributed and hybrid intrusion detection platform

Author: Alves Pedro Miguel de Freitas
Publication date: 01/03/2016

Efforts have been made over the last decades in order to design and perfect Intrusion Detection Systems (IDS). In addition to the widespread use of Intrusion Prevention Systems (IPS) as perimeter defense devices in systems and networks, various IDS solutions are used together as elements of holistic approaches to cyber security incident detection and prevention, including Network-Intrusion Detection Systems (NIDS) and Host-Intrusion Detection Systems (HIDS). Nevertheless, specific IDS and IPS technologies face several effectiveness challenges in responding to the increasing scale and complexity of information systems and sophistication of attacks. The use of isolated IDS components, focused on one-dimensional approaches, strongly limits a common analysis based on evidence correlation. Today, most organizations’ cyber-security operations centers still rely on conventional SIEM (Security Information and Event Management) technology. However, SIEM platforms also have significant drawbacks in dealing with heterogeneous and specialized security event-sources, lacking the support for flexible and uniform multi-level analysis of security audit-trails involving distributed and heterogeneous systems. In this thesis, we propose an auditing solution that leverages different intrusion detection components and synergistically combines them in a Distributed and Hybrid IDS (DHIDS) platform, taking advantage of their benefits while overcoming the effectiveness drawbacks of each one. In this approach, security events are detected by multiple probes forming a pervasive, heterogeneous and distributed monitoring environment spread over the network, integrating NIDS, HIDS and specialized Honeypot probing systems. Events from those heterogeneous sources are converted to a canonical representation format, and then conveyed through a Publish-Subscribe middleware to a dedicated logging and auditing system, built on top of an elastic and scalable document-oriented storage system.
The aggregated events can then be queried and matched against suspicious attack signature patterns, by means of a proposed declarative query-language that provides event-correlation semantics

Repositório da Universidade Nova de Lisboa

Modélisation formelle des systèmes de détection d'intrusions (Formal Modeling of Intrusion Detection Systems)

Author: Nganyewou Tidjon Lionel
Publication venue: 'Universite de Sherbrooke'
Publication date: 01/01/2020

The cybersecurity ecosystem continuously evolves in the number, the diversity, and the complexity of cyber attacks. As a result, detection tools become ineffective against certain attacks. Generally, there are three types of Intrusion Detection System (IDS): anomaly-based detection, signature-based detection, and hybrid detection. Anomaly detection is based on a characterization of the usual behavior of the system, typically in a statistical manner. It can detect known or unknown attacks but also generates a large number of false positives. Signature-based detection detects known attacks by defining rules that describe the known behavior of an attacker. It requires a good knowledge of attacker behavior. Hybrid detection relies on several detection methods, including the previous ones. It has the advantage of being more precise during detection. Tools like Snort and Zeek offer low-level languages to express rules for detecting attacks. Because the number of potential attacks is large, these rule bases quickly become hard to manage and maintain. Moreover, expressing stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach based on algebraic state-transition diagrams (ASTDs) to identify complex attacks. ASTDs allow a graphical and modular representation of a specification, which facilitates the maintenance and understanding of rules. We extend the ASTD notation with new features to represent complex attacks. Next, we specify several attacks with the extended notation and run the resulting specifications on event streams using an interpreter to identify attacks. We also evaluate the performance of the interpreter against industrial tools such as Snort and Zeek. Then, we build a compiler to generate executable code from an ASTD specification, able to efficiently identify sequences of events.

Thèses en Ligne

Savoirs UdeS