19 research outputs found

    Accelerating Event Stream Processing in On- and Offline Systems

    Get PDF
    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores

    Poster: Infusing Trust in Indoor Tracking

    Get PDF
    An indoor tracking system is inherently an asynchronous and distributed system that contains various types (e.g., detection, selection, and fusion) of events. One of the key challenges with regards to indoor tracking is an efficient selection and arrangement of sensor devices in the environment. Selecting the "right" subset of these sensors for tracking an object as it traverses an indoor environment is the necessary precondition to achieving accurate indoor tracking. With the recent proliferation of mobile devices, specifically those with many onboard sensors, this challenge has increased in both complexity and scale. No longer can one assume that the sensor infrastructure is static, but rather indoor tracking systems must consider and properly plan for a wide variety of sensors, both static and mobile, to be present. In such a dynamic setup, sensors need to be properly selected using an opportunistic approach. This opportunistic tracking allows for a new dimension of indoor tracking that previously was often infeasible or unpractical due to logistic or financial constraints of most entities. In this paper, we are proposing a selection technique that uses trust as manifested by its a quality-of-service (QoS) feature, accuracy, in a sensor selection function. We first outline how classification of sensors is achieved in a dynamic manner and then how the accuracy can be discerned from this classification in an effort to properly identify the trust of a tracking sensor and then use this information to improve the sensor selection process. We conclude this paper with a discussion of results of this implementation on a prototype indoor tracking system in an effort to demonstrate the overall effectiveness of this selection technique

    State Management for Efficient Event Pattern Detection

    Get PDF
    Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst. Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt. In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt.Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency

    FireDeX: a Prioritized IoT Data Exchange Middleware for Emergency Response

    Get PDF
    International audienceReal-time event detection and targeted decision making for emerging mission-critical applications, e.g. smart fire fighting, requires systems that extract and process relevant data from connected IoT devices in the environment. In this paper, we propose FireDeX, a cross-layer middleware that facilitates timely and effective exchange of data for coordinating emergency response activities. FireDeX adopts a publish-subscribe data exchange paradigm with brokers at the network edge to manage prioritized delivery of mission-critical data from IoT sources to relevant subscribers. It incorporates parameters at the application, network, and middleware layers into a data exchange service that accurately estimates end-to-end performance metrics (e.g. delays, success rates). We design an extensible queueing theoretic model that abstracts these cross-layer interactions as a network of queues, thereby making it amenable for rapid analysis. We propose novel algorithms that utilize results of this analysis to tune data exchange configurations (event priorities and dropping policies) while meeting situational awareness requirements and resource constraints. FireDeX leverages Software-Defined Networking (SDN) methodologies to enforce these configurations in the IoT network infrastructure. We evaluate its performance through simulated experiments in a smart building fire response scenario. Our results demonstrate significant improvement to mission-critical data delivery under a variety of conditions. Our application-aware prioritization algorithm improves the value of exchanged information by 36% when compared with no prioritization; the addition of our network-aware drop rate policies improves this performance by 42% over priorities only and by 94% over no prioritization

    Mobility-awareness in complex event processing systems

    Get PDF
    The proliferation and vast deployment of mobile devices and sensors over the last couple of years enables a huge number of Mobile Situation Awareness (MSA) applications. These applications need to react in near real-time to situations in the environment of mobile objects like vehicles, pedestrians, or cargo. To this end, Complex Event Processing (CEP) is becoming increasingly important as it allows to scalably detect situations “on-the-fly” by continously processing distributed sensor data streams. Furthermore, recent trends in communication networks promise high real-time conformance to CEP systems by processing sensor data streams on distributed computing resources at the edge of the network, where low network latencies can be achieved. Yet, supporting MSA applications with a CEP middleware that utilizes distributed computing resources proves to be challenging due to the dynamics of mobile devices and sensors. In particular, situations need to be efficiently, scalably, and consistently detected with respect to ever-changing sensors in the environment of a mobile object. Moreover, the computing resources that provide low latencies change with the access points of mobile devices and sensors. The goal of this thesis is to provide concepts and algorithms to i) continuously detect situations that recently occurred close to a mobile object, ii) support bandwidth and computational efficient detections of such situations on distributed computing resources, and iii) support consistent, low latency, and high quality detections of such situations. To this end, we introduce the distributed Mobile CEP (MCEP) system which automatically adapts the processing of sensor data streams according to a mobile object’s location. MCEP provides an expressive, location-aware query model for situations that recently occurred at a location close to a mobile object. MCEP significantly reduces latency, bandwidth, and processing overhead by providing on-demand and opportunistic adaptation algorithms to dynamically assign event streams to queries of the MCEP system. Moreover, MCEP incorporates algorithms to adapt the deployment of MCEP queries in a network of computing resources. This way, MCEP supports latency-sensitive, large-scale deployments of MSA applications and ensures a low network utilization while mobile objects change their access points to the system. MCEP also provides methods to increase the scalability in terms of deployed MCEP queries by reusing event streams and computations for detecting common situations for several mobile objects

    A Survey of Challenges for Runtime Verification from Advanced Application Domains (Beyond Software)

    Get PDF
    Runtime verification is an area of formal methods that studies the dynamic analysis of execution traces against formal specifications. Typically, the two main activities in runtime verification efforts are the process of creating monitors from specifications, and the algorithms for the evaluation of traces against the generated monitors. Other activities involve the instrumentation of the system to generate the trace and the communication between the system under analysis and the monitor. Most of the applications in runtime verification have been focused on the dynamic analysis of software, even though there are many more potential applications to other computational devices and target systems. In this paper we present a collection of challenges for runtime verification extracted from concrete application domains, focusing on the difficulties that must be overcome to tackle these specific challenges. The computational models that characterize these domains require to devise new techniques beyond the current state of the art in runtime verification

    Automating Industrial Event Stream Analytics: Methods, Models, and Tools

    Get PDF
    Industrial event streams are an important cornerstone of Industrial Internet of Things (IIoT) applications. For instance, in the manufacturing domain, such streams are typically produced by distributed industrial assets at high frequency on the shop floor. To add business value and extract the full potential of the data (e.g. through predictive quality assessment or maintenance), industrial event stream analytics is an essential building block. One major challenge is the distribution of required technical and domain knowledge across several roles, which makes the realization of analytics projects time-consuming and error-prone. For instance, accessing industrial data sources requires a high level of technical skills due to a large heterogeneity of protocols and formats. To reduce the technical overhead of current approaches, several problems must be addressed. The goal is to enable so-called "citizen technologists" to evaluate event streams through a self-service approach. This requires new methods and models that cover the entire data analytics cycle. In this thesis, the research question is answered, how citizen technologists can be facilitated to independently perform industrial event stream analytics. The first step is to investigate how the technical complexity of modeling and connecting industrial data sources can be reduced. Subsequently, it is analyzed how the event streams can be automatically adapted (directly at the edge), to meet the requirements of data consumers and the infrastructure. Finally, this thesis examines how machine learning models for industrial event streams can be trained in an automated way to evaluate previously integrated data. The main research contributions of this work are: 1. A semantics-based adapter model to describe industrial data sources and to automatically generate adapter instances on edge nodes. 2. An extension for publish-subscribe systems that dynamically reduces event streams while considering requirements of downstream algorithms. 3. A novel AutoML approach to enable citizen data scientists to train and deploy supervised ML models for industrial event streams. The developed approaches are fully implemented in various high-quality software artifacts. These have been integrated into a large open-source project, which enables rapid adoption of the novel concepts into real-world environments. For the evaluation, two user studies to investigate the usability, as well as performance and accuracy tests of the individual components were performed

    Distributed graph partitioning for large-scale graph analytics

    Get PDF
    Through constant technical progress the amount of available data about almost anything is growing steadily. Since hardware has become very cheap in the last decades, it is also possible to store and process huge amounts of data. Moreover companies like Google have sprung up that generate a large part of their revenues by extracting valuable information from collected data. A common approach to compute large inputs efficiently is to use a distributed system and concurrent algorithms. Therefore it is necessary to distribute the data to be processed intelligently among the cores or machines. Thereby the distribution of the input has strong effects on the efficiency of algorithms processing it. Furthermore data which can be represented as graph is ubiquitous, Facebook and the world wide web are well known examples of it. In this case, graph partitioning algorithms are used to distribute the input data in the best possible way. Since graph partitioning is a NP-hard problem, heuristic methods have to be used to keep execution time within reasonable bounds. This thesis presents and analyses a divide-and-conquer based framework for graph partitioning which aims to approximate linear runtime with better partitioning results than linear time algorithms

    Koostööäriprotsesside läbiviimine plokiahelal: süsteem

    Get PDF
    Tänapäeval peavad organisatsioonid tegema omavahel koostööd, et kasutada ära üksteise täiendavaid võimekusi ning seeläbi pakkuda oma klientidele parimaid tooteid ja teenuseid. Selleks peavad organisatsioonid juhtima äriprotsesse, mis ületavad nende organisatsioonilisi piire. Selliseid protsesse nimetatakse koostööäriprotsessideks. Üks peamisi takistusi koostööäriprotsesside elluviimisel on osapooltevahelise usalduse puudumine. Plokiahel loob detsentraliseeritud pearaamatu, mida ei saa võltsida ning mis toetab nutikate lepingute täitmist. Nii on võimalik teha koostööd ebausaldusväärsete osapoolte vahel ilma kesksele asutusele tuginemata. Paraku on aga äriprotsesside läbiviimine selliseid madala taseme plokiahela elemente kasutades tülikas, veaohtlik ja erioskusi nõudev. Seevastu juba väljakujunenud äriprotsesside juhtimissüsteemid (Business Process Management System – BPMS) pakuvad käepäraseid abstraheeringuid protsessidele orienteeritud rakenduste kiireks arendamiseks. Käesolev doktoritöö käsitleb koostööäriprotsesside automatiseeritud läbiviimist plokiahela tehnoloogiat kasutades, kombineerides traditsioonliste BPMS- ide arendusvõimalused plokiahelast tuleneva suurendatud usaldusega. Samuti käsitleb antud doktoritöö küsimust, kuidas pakkuda tuge olukordades, milles uued osapooled võivad jooksvalt protsessiga liituda, mistõttu on vajalik tagada paindlikkus äriprotsessi marsruutimisloogika muutmise osas. Doktoritöö uurib tarkvaraarhitektuurilisi lähenemisviise ja modelleerimise kontseptsioone, pakkudes välja disainipõhimõtteid ja nõudeid, mida rakendatakse uudsel plokiahela baasil loodud äriprotsessi juhtimissüsteemil CATERPILLAR. CATERPILLAR-i süsteem toetab kahte lähenemist plokiahelal põhinevate protsesside rakendamiseks, läbiviimiseks ja seireks: kompileeritud ja tõlgendatatud. Samuti toetab see kahte kontrollitud paindlikkuse mehhanismi, mille abil saavad protsessis osalejad ühiselt otsustada, kuidas protsessi selle täitmise ajal uuendada ning anda ja eemaldada osaliste juurdepääsuõigusi.Nowadays, organizations are pressed to collaborate in order to take advantage of their complementary capabilities and to provide best-of-breed products and services to their customers. To do so, organizations need to manage business processes that span beyond their organizational boundaries. Such processes are called collaborative business processes. One of the main roadblocks to implementing collaborative business processes is the lack of trust between the participants. Blockchain provides a decentralized ledger that cannot be tamper with, that supports the execution of programs called smart contracts. These features allow executing collaborative processes between untrusted parties and without relying on a central authority. However, implementing collaborative business processes in blockchain can be cumbersome, error-prone and requires specialized skills. In contrast, established Business Process Management Systems (BPMSs) provide convenient abstractions for rapid development of process-oriented applications. This thesis addresses the problem of automating the execution of collaborative business processes on top of blockchain technology in a way that takes advantage of the trust-enhancing capabilities of this technology while offering the development convenience of traditional BPMSs. The thesis also addresses the question of how to support scenarios in which new parties may be onboarded at runtime, and in which parties need to have the flexibility to change the default routing logic of the business process. We explore architectural approaches and modelling concepts, formulating design principles and requirements that are implemented in a novel blockchain-based BPMS named CATERPILLAR. The CATERPILLAR system supports two methods to implement, execute and monitor blockchain-based processes: compiled and interpreted. It also supports two mechanisms for controlled flexibility; i.e., participants can collectively decide on updating the process during its execution as well as granting and revoking access to parties.https://www.ester.ee/record=b536494
    corecore