61 research outputs found

    StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge

    Full text link
    Today, massive amounts of streaming data from smart devices need to be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities such as online training and inference in order to be able to detect fuzzy patterns (e.g., outliers) and to improve pattern recognition accuracy during runtime using incremental model training. In this paper, we propose a distributed CEP system denoted as StreamLearner for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow for dynamically scaling up and out system resources for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify scalability and high event throughput of StreamLearner.Comment: Christian Mayer, Ruben Mayer, and Majd Abdo. 2017. StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS '17), 298-30

    Socially aware microcloud service overlay optimization in community networks

    Get PDF
    Community networks are a growing network cooperation effort by citizens to build and maintain Internet infrastructure in regions that are not available. Adding that, to bring cloud services to community networks (CNs), microclouds were started as an edge cloud computing model where members cooperate using resources. Therefore, enhancing routing for services in CNs is an attractive paradigm that benefits the infrastructure. The problem is the growing consumption of resources for disseminating messages in the CN environment. This is because the services that build their overlay networks are oblivious to the underlying workload patterns that arise from social cooperation in CNs. In this paper, we propose Select in Community Networks (SELECTinCN), which enhances the overlay creation for pub/sub systems over peer-to-peer (P2P) networks. Moreover, SELECTinCN includes social information based on cooperation within CNs by exploiting the social aspects of the community of practice. Our work organizes the peers in a ring topology and provides an adaptive P2P connection establishment algorithm, where each peer identifies the number of connections needed based on the social structure and user availability. This allows us to propagate messages using a reduced number of hops, thus providing an efficient heuristic to an NP-hard problem that maps the workload graph to the structured P2P overlays resulting in a number of messages close to the theoretical minimum. Experiments show that, by using social network information, SELECTinCN reduces the number of relay nodes by up to 89% using the community of practice information versus the state-of-the-art pub/sub notification systems given as baseline.Peer ReviewedPostprint (author's final draft

    State Management for Efficient Event Pattern Detection

    Get PDF
    Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst. Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt. In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt.Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency

    Accelerating Event Stream Processing in On- and Offline Systems

    Get PDF
    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores

    An Optimal Algorithm for Sliding Window Order Statistics

    Get PDF
    Assume there is a data stream of elements and a window of size m. Sliding window algorithms compute various statistic functions over the last m elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic function value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity O(1) while computing the sliding window median has time complexity O(log m). In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window k-th smallest element in O(log k) time and (2) prove that this time complexity is optimal

    Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

    Get PDF

    Trustworthiness in Mobile Cyber Physical Systems

    Get PDF
    Computing and communication capabilities are increasingly embedded in diverse objects and structures in the physical environment. They will link the ‘cyberworld’ of computing and communications with the physical world. These applications are called cyber physical systems (CPS). Obviously, the increased involvement of real-world entities leads to a greater demand for trustworthy systems. Hence, we use "system trustworthiness" here, which can guarantee continuous service in the presence of internal errors or external attacks. Mobile CPS (MCPS) is a prominent subcategory of CPS in which the physical component has no permanent location. Mobile Internet devices already provide ubiquitous platforms for building novel MCPS applications. The objective of this Special Issue is to contribute to research in modern/future trustworthy MCPS, including design, modeling, simulation, dependability, and so on. It is imperative to address the issues which are critical to their mobility, report significant advances in the underlying science, and discuss the challenges of development and implementation in various applications of MCPS

    Advancements and Challenges in Object-Centric Process Mining: A Systematic Literature Review

    Full text link
    Recent years have seen the emergence of object-centric process mining techniques. Born as a response to the limitations of traditional process mining in analyzing event data from prevalent information systems like CRM and ERP, these techniques aim to tackle the deficiency, convergence, and divergence issues seen in traditional event logs. Despite the promise, the adoption in real-world process mining analyses remains limited. This paper embarks on a comprehensive literature review of object-centric process mining, providing insights into the current status of the discipline and its historical trajectory

    Complex Event Recognition: a comparison between FlinkCEP and the Run-Time Event Calculus

    Get PDF
    Ο κλάδος της Αναγνώρισης Σύνθετων Γεγονότων πάνω σε ροές από δεδομένα έχει επιδείξει αξιοσημείωτη ανάπτυξη τα τελευταία χρόνια. Τα συστήματα αναγνώρισης σύνθετων γεγονότων περιεργάζονται ροές από δεδομένα με σκοπό τον εντοπισμό σύνθετων φαινομένων, που εκφράζουν σχέσεις ανάμεσα στα δεδομένα εισόδου. Ο αριθμός των συστημάτων που έχουν αναπτυχθεί τα τελευταία χρόνια έχει δημιουργήσει την ανάγκη για μελέτη και σύγκριση των δυνατοτήτων τους. Σε αυτήν την μελέτη επιλέγουμε δύο συστήματα από τις πιο επικρατούσες κατηγορίες. Διαλέγουμε το FlinkCEP από τα συστήματα βασισμένα σε αυτόματα και το RTEC από τα συστήματα που χρησιμοποιούν λογική. Παρουσιάζουμε μια θεωρητική σύγκριση της εκφραστικότητας των δύο συστημάτων, μαζί με μια πειραματική αξιολόγηση της αποδοτικότητας τους, χρησιμοποιώντας πραγματικά δεδομένα.The field of Complex Event Recognition (CER) on streams of data has shown remarkable growth the last few years. CER systems use streaming data in order to detect composite phenomena expressing relations between the input data. The amount of developed CER systems has created the need to examine and compare their capabilities. In this study we have chosen two systems, originating form the most dominant categories. From automata-based approaches we have selected FlinkCEP and from Logic-based systems we have selected RTEC. We present a theoretical comparison of the two systems’ expressiveness, along with an empirical evaluation of the efficiency, using real data
    corecore