    The DEBS 2020 grand challenge

    The ACM DEBS 2020 Grand Challenge is the tenth in a series of challenges that seek to provide common ground and evaluation criteria for a competition aimed at both research and industrial event-based systems. The focus of the ACM DEBS 2020 Grand Challenge is Non-Intrusive Load Monitoring (NILM). The goal of the challenge is to detect when appliances contributing to an aggregated stream of voltage and current readings from a smart meter are switched on or off. NILM is used in many contexts, ranging from monitoring energy consumption to home automation. This paper describes the specifics of the data streams provided in the challenge, as well as the benchmarking platform that supports testing of the solutions submitted by the participants.
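A minimal sketch of the detection task, assuming the stream has already been reduced to one aggregate active-power value per sample (the challenge itself supplies raw voltage and current readings, so this reduction is our assumption). It flags a switching event where the mean power in a short window after a point differs from the mean in the window before it by more than a threshold, and reports the strongest point of each such step. The names `detect_switches`, `window`, and `threshold` are hypothetical, and this simple step detector is not the challenge's reference method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitchEvent:
    index: int    # sample index where the step is strongest
    delta: float  # change in mean power (W): positive = on, negative = off
    kind: str     # "on" or "off"


def detect_switches(power: List[float], window: int = 3,
                    threshold: float = 50.0) -> List[SwitchEvent]:
    # Difference between the mean of the next `window` samples and the
    # mean of the previous `window` samples, for every admissible index.
    deltas = []
    for i in range(window, len(power) - window + 1):
        before = sum(power[i - window:i]) / window
        after = sum(power[i:i + window]) / window
        deltas.append((i, after - before))

    events: List[SwitchEvent] = []
    run = []  # consecutive indices whose |delta| exceeds the threshold, same sign
    for i, d in deltas + [(-1, 0.0)]:  # sentinel with delta 0 closes the last run
        if abs(d) > threshold and (not run or (d > 0) == (run[0][1] > 0)):
            run.append((i, d))
            continue
        if run:
            # report the strongest point of the run as the switching instant
            idx, best = max(run, key=lambda p: abs(p[1]))
            events.append(SwitchEvent(idx, best, "on" if best > 0 else "off"))
        run = [(i, d)] if abs(d) > threshold else []
    return events
```

For a trace of five samples at 100 W, five at 300 W, and five at 100 W, the sketch reports an "on" event at index 5 and an "off" event at index 10. Real NILM signals are noisier and overlapping appliances complicate this considerably.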

    Distributed complex event recognition

    Complex Event Recognition (CER) has emerged as a prominent technology for detecting situations of interest, in the form of query patterns, over large streams of data in real time. Thus, query evaluation mechanisms that minimize latency are a shared desideratum. Nonetheless, evaluating CER queries is well known to be computationally expensive. Indeed, such evaluation requires maintaining a set of partial matches that grows super-linearly in the number of processed events. While most prominent CER solutions run in a centralized setting, this has proved inefficient for Big Data requirements, where the system must scale to cope with an increasing event arrival rate while maintaining stable throughput. To overcome these issues, we propose a novel distributed CER system that focuses on the efficient evaluation of a large class of complex event queries, including n-ary predicates, time windows, and the partition-by event correlation operator. This system uses a state-of-the-art automaton-based distributed algorithm that circumvents the super-linear partial match problem. Moreover, under heavy workloads, the system can scale out by increasing the number of processing units with little overhead. We also provide a proof of correctness of the algorithm. We experimentally compare our system against the state-of-the-art sequential CER engine that inspired our work and show that our system outperforms its predecessor on queries with complex predicates. Furthermore, we show that, under Big Data requirements, our system performs better overall.
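To illustrate why partial matches accumulate and how a partition-by operator enables distribution, here is a naive evaluator for a hypothetical query SEQ(A, B) PARTITION BY key WITHIN a time window, assuming skip-till-any-match semantics (every B pairs with every earlier A of the same key still inside the window). Every open A is a partial match that must be kept until it expires. Because matches never cross keys, hash-partitioning the stream by key lets independent workers evaluate their shards and obtain the same result. This is a simplified sketch, not the automaton-based distributed algorithm described above; `seq_ab` and `route` are illustrative names.

```python
from collections import defaultdict, namedtuple

Event = namedtuple("Event", "type key ts")


def seq_ab(events, window):
    """Naive SEQ(A, B) PARTITION BY key WITHIN `window` (skip-till-any-match).

    Events are assumed to arrive in timestamp order."""
    partial = defaultdict(list)  # key -> open partial matches (pending A events)
    matches = []
    for e in events:
        # expire partial matches of this key that fell out of the window
        partial[e.key] = [a for a in partial[e.key] if e.ts - a.ts <= window]
        if e.type == "A":
            partial[e.key].append(e)
        elif e.type == "B":
            # every pending A of the same key completes a match with this B
            matches.extend((a, e) for a in partial[e.key])
    return matches


def route(events, workers):
    """Hash-partition the stream by key; matches never span keys."""
    shards = [[] for _ in range(workers)]
    for e in events:
        shards[hash(e.key) % workers].append(e)
    return shards
```

With richer patterns (for example Kleene closure), the set of open partial matches can grow much faster than in this two-event sequence, which is the cost the abstract refers to.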

    A study on the Probabilistic Interval-based Event Calculus

    Complex Event Recognition is the subfield of Artificial Intelligence that aims to design and build systems which quickly process large and often heterogeneous data streams and promptly detect, based on definitions set by domain experts, the occurrence of non-trivial and interesting incidents. The purpose of such systems is to provide useful insights into complex and demanding situations that would otherwise be difficult to monitor, and to assist decision making. Uncertainty and noise are inherent in such data streams, so Probability Theory is needed to handle them. Probabilistic Complex Event Recognition can be performed in a timepoint-based or an interval-based manner. This thesis focuses on PIEC, a state-of-the-art probabilistic, interval-based Complex Event Recognition algorithm that uses probabilistic maximal intervals. We present the algorithm and examine it in detail. We study its correctness through a series of mathematical proofs of its soundness and completeness. Afterwards, we provide a thorough experimental evaluation and a comparison with point-based probabilistic Event Recognition methods. Our evaluation shows that PIEC consistently achieves better Recall, in certain cases at the expense of lower Precision. We then examine cases where PIEC performs significantly better and cases where it falls short, in order to identify and state its main strengths and weaknesses. Finally, we set out general directions for further research on the topic, parts of which are already in progress.
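A probabilistic maximal interval, as we understand the notion used by PIEC, is an interval whose average instantaneous event probability is at least a threshold T and that is not strictly contained in a longer interval with the same property. The brute-force sketch below only illustrates this definition on tiny inputs; PIEC itself computes such intervals far more efficiently, and the function name `maximal_intervals` is hypothetical.

```python
from typing import List, Tuple


def maximal_intervals(probs: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Brute-force probabilistic maximal intervals (inclusive index bounds).

    An interval qualifies if its mean probability is >= threshold; it is
    maximal if no other qualifying interval strictly contains it."""
    prefix = [0.0]  # prefix[k] = sum of probs[0:k]
    for p in probs:
        prefix.append(prefix[-1] + p)

    def mean(i: int, j: int) -> float:
        return (prefix[j + 1] - prefix[i]) / (j - i + 1)

    n = len(probs)
    good = [(i, j) for i in range(n) for j in range(i, n) if mean(i, j) >= threshold]
    return [(i, j) for (i, j) in good
            if not any(a <= i and j <= b and (a, b) != (i, j) for (a, b) in good)]
```

For instantaneous probabilities `[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0]` and T = 0.5, the sketch returns `[(0, 3), (1, 6)]`. The two intervals overlap, so maximal intervals need not be disjoint, which is one reason interval-based recognition can favour Recall over Precision.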

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data stream is the event stream, which consists of continuously arriving notifications about some real-world phenomenon. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it, together with the measurement time, whenever it differs substantially from the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online processing handles events solely in main memory as soon as they arrive, while offline processing handles event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) such as Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in their original arrival order. While this naturally ensures unified query semantics for online and offline processing, the cost of reading the entire stream from non-volatile storage quickly dominates the overall processing cost. Second, modern SPEs focus on scaling out computations across the nodes of a cluster but use only a fraction of the resources available on individual nodes. This thesis tackles these problems with three approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods use well-understood indexing techniques to reduce the total amount of data read from non-volatile storage.
We show that this significantly improves overall query runtime. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful SQL language feature that has so far received little attention. Second, we show how to maximize the resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototype shared-memory event processing system that uses both CPU and GPU. The system implements all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments show that, in terms of resource utilization and processing throughput, such a hardware-conscious system outperforms hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, unlike sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This maximizes resource utilization, especially on modern multi-core CPUs.
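To make pattern matching over temporal intervals more concrete, the sketch below, under simplifying assumptions of our own, first turns a stream of timestamped readings into half-open intervals (situations) during which a predicate holds, such as a temperature above a limit, and then classifies pairs of intervals by Allen's interval relations, the vocabulary commonly used to state such temporal patterns. The function names and the example data are hypothetical, and this is not TPStream's implementation.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in timestamps


def derive_situations(readings: List[Tuple[int, float]],
                      predicate: Callable[[float], bool]) -> List[Interval]:
    """Maximal intervals during which `predicate` holds; a situation ends
    at the first reading for which the predicate is false."""
    out: List[Interval] = []
    start = None
    for ts, value in readings:
        holds = predicate(value)
        if holds and start is None:
            start = ts
        elif not holds and start is not None:
            out.append((start, ts))
            start = None
    return out  # a situation still open at the end of the input is not emitted


def allen(a: Interval, b: Interval) -> str:
    """Allen's interval relation of a with respect to b."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if (s1, e1) == (s2, e2):
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    if e2 < s1:
        return "after"
    if e2 == s1:
        return "met-by"
    return "overlapped-by"


def match(xs: List[Interval], ys: List[Interval], relation: str):
    """All pairs (x, y) with allen(x, y) == relation (nested-loop join)."""
    return [(x, y) for x in xs for y in ys if allen(x, y) == relation]
```

For example, with temperature situations `[(1, 3), (4, 6)]` (readings above 30) and fan-on situations `[(2, 5)]`, `match(hot, fan, "overlaps")` returns `[((1, 3), (2, 5))]`. Because each interval pair is checked independently, such comparisons parallelize naturally, which is consistent with the parallelizability claim above.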

    Advancements and Challenges in Object-Centric Process Mining: A Systematic Literature Review

    Recent years have seen the emergence of object-centric process mining techniques. Born as a response to the limitations of traditional process mining in analyzing event data from prevalent information systems such as CRM and ERP, these techniques aim to tackle the deficiency, convergence, and divergence issues of traditional event logs. Despite this promise, their adoption in real-world process mining analyses remains limited. This paper presents a comprehensive literature review of object-centric process mining, providing insights into the current state of the discipline and its historical trajectory.
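The convergence, divergence, and deficiency issues mentioned above appear when an object-centric event log is flattened onto a single case notion. The sketch below uses a hypothetical toy log in which each event refers to objects of several types. Flattening copies an event into every case (object) it refers to (convergence), drops events that refer to no object of the chosen type (deficiency), and can leave one case with repeated activities that really concern different related objects (divergence). The log format and function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy object-centric log: each event refers to objects by type.
LOG = [
    {"id": "e1", "activity": "create order",
     "objects": {"order": ["o1"], "item": ["i1", "i2"]}},
    {"id": "e2", "activity": "pick item", "objects": {"order": ["o1"], "item": ["i1"]}},
    {"id": "e3", "activity": "pick item", "objects": {"order": ["o1"], "item": ["i2"]}},
    {"id": "e4", "activity": "send invoice", "objects": {"order": ["o1"]}},
]


def flatten(log, object_type):
    """One case per object of `object_type`; each event is copied into every
    case (object) it refers to, in log order."""
    cases = {}
    for ev in log:
        for obj in ev["objects"].get(object_type, []):
            cases.setdefault(obj, []).append(ev["activity"])
    return cases


def convergence(log, object_type):
    """Events duplicated into several cases by flattening."""
    return [ev["id"] for ev in log if len(ev["objects"].get(object_type, [])) > 1]


def deficiency(log, object_type):
    """Events lost by flattening because they refer to no object of the type."""
    return [ev["id"] for ev in log if not ev["objects"].get(object_type)]


def divergence(cases):
    """Activities repeated within a single flattened case."""
    repeated = {}
    for case, acts in cases.items():
        dup = sorted(a for a, n in Counter(acts).items() if n > 1)
        if dup:
            repeated[case] = dup
    return repeated
```

Flattening on `item` duplicates `e1` and loses `e4`; flattening on `order` keeps every event but yields a case with two indistinguishable "pick item" steps. Object-centric techniques avoid choosing a single case notion in the first place.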

    Towards Prioritized Event Matching in a Content-based Publish/Subscribe System

    QoS support is important for a large-scale content-based publish/subscribe (pub/sub) system to provide guaranteed service to clients with high QoS requirements. So far, great efforts have been dedicated to integrating QoS support into pub/sub systems. However, most work focuses on providing QoS support in routing, leaving event matching untouched. In this paper, we propose prioritized event matching, which aims to integrate QoS support into event matching. We first point out the lack of time metrics that reveal the detailed performance of matching algorithms, which leads us to define new time metrics. Through a series of experiments based on these metrics, we identify the foundation for prioritized event matching. Finally, we realize prioritized event matching in Pri-Rein, built on an existing matching algorithm, and derive three design guidelines from the lessons learned with Pri-Rein. Extensive experiments verify the effectiveness and efficiency of Pri-Rein and show that it achieves our design goal. We argue that the idea proposed in this paper can be generalized to matching algorithms used in cloud computing or complex event processing.
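A minimal sketch of the general idea of prioritized matching, assuming a simple subscription model of our own (per-attribute range predicates plus an integer priority). Subscriptions are ordered by priority once, so an incoming event is checked against high-priority subscriptions first and their notifications are produced earliest. It uses a plain linear scan rather than Rein's index structure; the class and field names are hypothetical, and this is not Pri-Rein.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Subscription:
    sub_id: str
    priority: int                           # higher value = stricter QoS requirement
    ranges: Dict[str, Tuple[float, float]]  # attribute -> inclusive [low, high]


class PrioritizedMatcher:
    def __init__(self, subs: List[Subscription]):
        # Sort once by descending priority; Python's sort is stable, so
        # subscriptions with equal priority keep their insertion order.
        self._subs = sorted(subs, key=lambda s: s.priority, reverse=True)

    @staticmethod
    def _satisfies(sub: Subscription, event: Dict[str, float]) -> bool:
        return all(attr in event and low <= event[attr] <= high
                   for attr, (low, high) in sub.ranges.items())

    def match(self, event: Dict[str, float]) -> Iterator[str]:
        """Yield ids of matching subscriptions, highest priority first, so a
        dispatcher can notify high-priority subscribers before the scan ends."""
        for sub in self._subs:
            if self._satisfies(sub, event):
                yield sub.sub_id
```

Because `match` is a generator, a dispatcher can forward the first (highest-priority) notifications while lower-priority subscriptions are still being evaluated, which is the latency benefit prioritization targets.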

    Complex Event Processing as a Service in Multi-Cloud Environments

    Advisors: Luiz Fernando Bittencourt, Miriam Akemi Manabe Capretz. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, has created a new Big Data world in which the volume and velocity of data generation have reached an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has often been associated with Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are either costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes a CEP system that can be offered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionality together with the advantages of the service model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that current CEP systems do not address. This research proposes solutions for three open problems in this context. First, to address the problem of understanding and reusing existing CEP management procedures, it introduces the Attributed Graph Rewriting for Complex Event Processing Management (AGeCEP) formalism, a technology- and language-agnostic representation of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, it introduces CEPSim, a simulator of cloud-based CEP systems. Third, it introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated through experiments showing that it can simulate existing systems accurately and with low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated that it achieves the goal of offering CEP functionality as a scalable and fault-tolerant service. Together, these results show that this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research.
Doctorate in Computer Science (Doutor em Ciência da Computação). Grant 140920/2012-9, CNP

    Challenges in using the actor model in software development: a systematic literature review

    The actor model is a model of distributed and concurrent computation in which small parts of the software communicate with each other asynchronously, and the functionality visible to the user is an emergent property of the cooperation of several parts. Today's software must withstand enormous numbers of users, and to do so it must be able to increase its capacity quickly in order to scale. Smaller software components are easier to add according to demand, so the actor model appears to meet this need. However, challenges can arise when using the actor model, and this study aims to identify and present them. The study is conducted as a systematic literature review of research related to the actor model. Data were collected from the selected studies, and the research questions were answered on that basis. The results list and categorize the software development problems to which the actor model was applied, as well as the various challenges encountered when using the actor model and their solutions. The study identified challenges that arise in using the actor model and created a new categorization for them. Analysis of the root causes showed that a large share of the actor model's challenges stem from the use of asynchronous messaging, and that programmers must constantly be careful about their own assumptions regarding message ordering. The solutions proposed for the challenges were categorized according to the location of the code that must be added to implement them.
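The asynchronous messaging at the root of many of these challenges can be illustrated with a minimal actor built on Python's `asyncio`: the actor's state is private, messages are queued in a mailbox and processed one at a time, and senders do not wait for processing. In this sketch each sender's messages are processed in the order that sender sent them, but the interleaving between two senders depends on scheduling, which is exactly the kind of ordering assumption a programmer should not rely on. The message format and the `CounterActor` class are hypothetical.

```python
import asyncio


class CounterActor:
    """Minimal actor: private state, a FIFO mailbox, one message at a time."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._count = 0
        self.log = []  # order in which "add" messages were processed

    def send(self, message):
        # Asynchronous send: enqueue and return without waiting for processing.
        self._mailbox.put_nowait(message)

    async def run(self):
        while True:
            kind, payload = await self._mailbox.get()
            if kind == "add":
                self._count += payload
                self.log.append(payload)
            elif kind == "get":
                payload.set_result(self._count)  # reply through a future
            elif kind == "stop":
                return


async def main():
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())

    async def sender(values):
        for v in values:
            actor.send(("add", v))
            await asyncio.sleep(0)  # yield so the other sender can interleave

    await asyncio.gather(sender([1, 2, 3]), sender([10, 20]))
    reply = asyncio.get_running_loop().create_future()
    actor.send(("get", reply))
    total = await reply
    actor.send(("stop", None))
    await runner
    return total, actor.log
```

Calling `asyncio.run(main())` returns a total of 36. The log always lists 1, 2, 3 in that order and 10 before 20, but how the two senders' messages interleave is a scheduling detail that code should not depend on.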