Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors in IoT deployments generate data streams at
high velocity that carry information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinct gaps in query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present, and future events. Bridging these
gaps enables accessible analytics over data streams with properties from
different disciplines and helps span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads of evaluating semantic predicates and of
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
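To make the query model concrete, here is a minimal sketch of a knowledge-infused query spanning both temporal spaces: it replays a persistent (historic) stream before a live one and filters events through a semantic predicate backed by a toy knowledge base. All names (Event, KNOWLEDGE, is_a, query) are illustrative assumptions, not SCEPter's actual API.

```python
# Minimal sketch of a knowledge-infused CEP query spanning persistent and
# real-time streams. Hypothetical names throughout; not SCEPter's API.
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Event:
    ts: float      # occurrence timestamp
    sensor: str    # source sensor identifier
    value: float   # measured value (e.g., power draw in kW)

# Toy knowledge base standing in for the domain ontology that a
# semantic predicate would consult.
KNOWLEDGE = {"bldg1-hvac": "CoolingLoad", "bldg1-lights": "LightingLoad"}

def is_a(event: Event, concept: str) -> bool:
    """Semantic predicate: does the event's source belong to a concept?"""
    return KNOWLEDGE.get(event.sensor) == concept

def query(past: Iterable[Event], live: Iterable[Event],
          concept: str, threshold: float) -> Iterator[Event]:
    """End-to-end query: replay the persistent stream, then continue on
    the live stream, keeping high-value events of the given concept."""
    for ev in list(past) + list(live):
        if is_a(ev, concept) and ev.value > threshold:
            yield ev

past = [Event(1.0, "bldg1-hvac", 42.0)]
live = [Event(2.0, "bldg1-lights", 7.0), Event(3.0, "bldg1-hvac", 55.0)]
print([e.ts for e in query(past, live, "CoolingLoad", 40.0)])  # [1.0, 3.0]
```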
Programming Using Automata and Transducers
Automata, the simplest model of computation, have proven to be an effective tool for reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decision procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently?

First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders.

Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees.

Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decision procedures.
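To illustrate the core idea behind the symbolic models above, here is a minimal sketch of a symbolic finite automaton in Python, where transitions are guarded by predicates over an (effectively infinite) integer alphabet instead of individual letters. A true symbolic visibly pushdown automaton additionally maintains a stack for calls and returns; this sketch and all its names (Predicate, Transitions, run) are illustrative assumptions, not the thesis's formalism.

```python
# Sketch of a symbolic finite automaton: transitions carry predicates,
# so one machine can range over an infinite alphabet (here, all ints).
from typing import Callable, Dict, List, Set, Tuple

Predicate = Callable[[int], bool]
# transitions[state] = list of (guard, next_state), tried in order
Transitions = Dict[int, List[Tuple[Predicate, int]]]

def run(transitions: Transitions, start: int,
        accepting: Set[int], word: List[int]) -> bool:
    state = start
    for symbol in word:
        for guard, nxt in transitions.get(state, []):
            if guard(symbol):
                state = nxt
                break
        else:
            return False  # no guard matched this symbol: reject
    return state in accepting

# Accepts integer words in which every symbol is even and at least
# one symbol exceeds 100.
sfa: Transitions = {
    0: [(lambda c: c % 2 == 0 and c > 100, 1),
        (lambda c: c % 2 == 0, 0)],
    1: [(lambda c: c % 2 == 0, 1)],
}
print(run(sfa, 0, {1}, [2, 4, 200, 6]))  # True
print(run(sfa, 0, {1}, [2, 3, 200]))     # False (3 is odd)
```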
HIGH-PERFORMANCE COMPLEX EVENT PROCESSING FOR DECISION ANALYTICS
Complex Event Processing (CEP) systems are becoming increasingly popular in domains for decision analytics such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high-volume event streams, and often require such event streams to be processed in real time. To this end, CEP queries are applied for filtering, correlation, aggregation, and transformation, to derive high-level, actionable information. Tasks for CEP systems fall into two categories: passive monitoring and proactive monitoring. For passive monitoring, users know their exact needs and express them in CEP queries, and CEP engines evaluate those queries against incoming data events; for proactive monitoring, users cannot tell exactly what they are looking for and need to work with CEP engines to figure out the query. My thesis makes contributions in both categories.
For passive monitoring, the first contribution I make is to apply CEP queries over streams with imprecise timestamps, which was infeasible before this work. Existing CEP systems assumed that the occurrence time of each event is known precisely. However, I observe that event occurrence times are often unknown or imprecise due to lossy raw data, granularity mismatches, or imperfect clock synchronization. Therefore, I propose a temporal model that assigns a time interval to each event to represent all of its possible occurrence times. Under this uncertain temporal model, I further propose two evaluation frameworks: a point-based framework, which converts events with time intervals into events with point timestamps before pattern matching, and an event-based framework, which matches patterns over events with time intervals directly. I also propose optimizations in both frameworks. My new approach achieves high efficiency for a wide range of workloads tested using both real traces and synthetic datasets. While existing systems cannot process this type of stream at all, my algorithm sustains throughput as high as tens of thousands of events per second in a MapReduce case study. This contribution makes CEP techniques applicable to more application scenarios.
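As a hedged illustration of the interval temporal model, the sketch below assigns each event an interval [lo, hi] of possible occurrence times and checks whether a two-event sequence pattern can (or must) match under some assignment of points. The names (UEvent, can_follow, must_follow) are hypothetical; the thesis's frameworks handle general patterns and confidence values.

```python
# Sketch of the uncertain temporal model: each event carries an
# interval of possible occurrence times instead of a point timestamp.
from dataclasses import dataclass

@dataclass
class UEvent:
    kind: str
    lo: float  # earliest possible occurrence time
    hi: float  # latest possible occurrence time

def can_follow(a: UEvent, b: UEvent) -> bool:
    """SEQ(a, b) is possible iff some point in a precedes some point in b."""
    return a.lo < b.hi

def must_follow(a: UEvent, b: UEvent) -> bool:
    """SEQ(a, b) is certain iff every point in a precedes every point in b."""
    return a.hi < b.lo

a = UEvent("A", 1.0, 5.0)
b = UEvent("B", 3.0, 4.0)
print(can_follow(a, b), must_follow(a, b))  # True False
```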
Another contribution to passive monitoring is that I identify expensive queries in CEP, analyze their runtime complexity, and propose effective optimizations that improve their performance significantly. These expensive queries involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. I analyze the runtime complexity of each language component and identify two performance bottlenecks: Kleene closure under the most flexible event selection strategy, and confidence computation in the case of imprecise timestamps. For the first bottleneck, I break query evaluation into two parts: pattern matching, which can be shared by many matches, and result construction. Optimizations that share pattern matching cut the cost from exponential to polynomial, and in some cases close to linear, time. To address the second bottleneck, I design a dynamic programming algorithm that improves performance. Microbenchmark results show that state-of-the-art systems suffer poor performance on these queries, while my system provides improvements of 2 to 10 orders of magnitude. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of the proposed techniques: the throughput exceeds 1 million events per second.
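A small sketch of why sharing pattern matching helps for Kleene closure under the most flexible selection strategy (skip-till-any-match): over n qualifying events there are 2^n - 1 matches of A+, so enumerating results is inherently exponential, while the shared matching work grows only linearly, as the incremental count below suggests. This illustrates the complexity argument only, not the thesis's actual sharing data structures.

```python
# Counting Kleene-closure (A+) matches under skip-till-any-match:
# enumerating all matches is exponential, but the count itself can be
# maintained in one linear pass, hinting at what sharing can save.

def count_kleene_matches(events, pred) -> int:
    """Each qualifying event can extend every existing match or start a
    new single-event match, so: count = 2 * count + 1 per event."""
    count = 0
    for ev in events:
        if pred(ev):
            count = 2 * count + 1
    return count

stream = [3, 8, 1, 9, 12]  # four events satisfy the predicate below
print(count_kleene_matches(stream, lambda v: v > 2))  # 2**4 - 1 = 15
```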
The last problem solved in this thesis concerns proactive monitoring: explaining anomalies in CEP-based monitoring. CEP queries are widely used for monitoring purposes. When users observe abnormal status in the monitoring results, they annotate the abnormal period and a reference period. The system then generates explanations by analyzing stream events, and these explanations can be encoded into CEP queries to monitor for similar anomalies in the future. An entropy-based distance function is designed to select features for explanation. The new distance function reduces the features examined to find the ground truth by up to 99.2% compared to state-of-the-art distance functions for time series. A cluster-based auto-labeling algorithm is also designed to leverage unlabeled data to filter noisy features. Compared with alternative techniques, the generated results improve explanation quality by up to 800%, reduce the number of features by 93.8% for conciseness, and match other techniques on prediction quality. The implementation is also efficient: with 2000 concurrent monitoring queries, triggered explanation analysis returns explanations within a minute and affects performance only slightly, delaying event processing by less than 1 second.
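As a sketch of the entropy-based idea, the code below histograms a feature in the annotated abnormal and reference periods and scores the divergence between the two distributions, so that features which shift during the anomaly rank higher as explanation candidates. KL divergence with Laplace smoothing is used here as a stand-in; the thesis's actual distance function and auto-labeling step differ in their details.

```python
# Sketch: rank explanation features by how differently they are
# distributed in the abnormal period versus the reference period.
import math
from collections import Counter

def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Smoothed bin probabilities for values assumed to lie in [lo, hi]."""
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1)
                     for v in values)
    total = len(values)
    # Laplace smoothing keeps the divergence finite on empty bins.
    return [(counts.get(b, 0) + 1) / (total + bins) for b in range(bins)]

def kl_distance(abnormal, reference, bins=10) -> float:
    """KL divergence between the two periods' feature distributions."""
    p = histogram(abnormal, bins)
    q = histogram(reference, bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A feature that shifts during the anomaly scores high; a stable one
# scores zero, so it is pruned from the explanation.
print(kl_distance([0.8, 0.9, 0.85], [0.1, 0.2, 0.15]))  # large
print(kl_distance([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]))    # 0.0
```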