9 research outputs found

    When Things Matter: A Data-Centric View of the Internet of Things

    With recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While the IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges must be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from a data-centric perspective, including data stream processing, data storage models, complex event processing, and searching in the IoT. Open research issues in IoT data management are also discussed.

    eSPICE: Probabilistic Load Shedding from Input Event Streams in Complex Event Processing

    Complex event processing systems process input event streams on the fly. Since the input event rate can exceed the system's capacity and result in violation of a defined latency bound, load shedding is used to drop a portion of the input event streams. The crucial question is how many and which events to drop so that the defined latency bound is maintained while the degradation in the quality of results is minimized. In the stream processing domain, different load shedding strategies have been proposed, but they mainly depend on the importance of individual tuples (events). However, as complex event processing systems perform pattern detection, the importance of an event is also influenced by the other events in the same pattern. In this paper, we propose a load shedding framework called eSPICE for complex event processing systems. eSPICE builds a probabilistic model that learns the importance of events in a window, using an event's position in the window and its type as features. Further, we provide algorithms to decide when to start dropping events and how many events to drop. Moreover, we extensively evaluate the performance of eSPICE on two real-world datasets.
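    The core idea of eSPICE, learning a per-(event type, window position) importance and shedding the least important events first, can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation; `UtilityModel`, `observe`, and `shed` are hypothetical names, and the paper's actual probabilistic model and drop-amount algorithms are more involved.

```python
from collections import defaultdict

class UtilityModel:
    """Learns how often an (event type, window position) pair
    contributes to a detected pattern match. Hypothetical sketch
    of the eSPICE idea, not the authors' implementation."""

    def __init__(self):
        self.contrib = defaultdict(int)  # (type, pos) -> matches contributed to
        self.seen = defaultdict(int)     # (type, pos) -> occurrences observed

    def observe(self, event_type, position, contributed):
        self.seen[(event_type, position)] += 1
        if contributed:
            self.contrib[(event_type, position)] += 1

    def utility(self, event_type, position):
        seen = self.seen[(event_type, position)]
        return self.contrib[(event_type, position)] / seen if seen else 0.0

def shed(window, model, drop_count):
    """Drop the drop_count events with the lowest learned utility,
    keeping the remaining events in their original order."""
    ranked = sorted(enumerate(window),
                    key=lambda ip: model.utility(ip[1], ip[0]))
    to_drop = {i for i, _ in ranked[:drop_count]}
    return [e for i, e in enumerate(window) if i not in to_drop]
```

    When the input rate overshoots capacity, the overload controller would call `shed` with a `drop_count` chosen to bring the expected processing time back under the latency bound.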

    Window drop load shedding in complex event processing

    In Complex Event Processing (CEP), huge input event streams are processed to detect specific situations in real time. In parallel CEP, the event stream is split into different windows that can be processed in parallel by several operator instances. This paradigm reduces the load imposed on a single operator instance and allows for horizontal scalability. However, under high load, the operator instances may not be able to process the incoming events in time; the unprocessed events queue up, resulting in higher processing latency. In such situations it can be desirable to impose a latency bound on the system. One way to satisfy a given latency bound is to perform load shedding by dropping incoming windows. This problem is not trivial, as it is unclear when, which, and how many windows need to be dropped. To answer these questions, we try to find the most promising windows, i.e., windows that may yield a high number of complex events yet impose a low impact on processing time. However, this adds additional challenges, as the processing latency of a window depends on many unknown variables, such as the size of the window, the types of incoming events, the positions of these events relative to each other, and even the processing latency of other overlapping windows. Furthermore, even if the quality and processing latency of a window are determined, several open questions remain regarding the timing and frequency of load shedding so as not to violate the latency bound while still keeping enough windows active. In the scope of this thesis, we introduce a latency and quality model to estimate the processing latency a window induces.
    Based on this model, we propose an algorithm which decides the windows to drop in case of high system load to satisfy a given latency bound while minimizing the loss of quality.
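    A minimal sketch of the window selection step described above: given per-window estimates of processing cost and expected quality, drop the windows with the worst quality-per-cost ratio until the estimated load fits the latency bound. The function name and the triple representation are hypothetical; the thesis derives the cost and quality estimates from window size, event types, and overlap, which this sketch simply takes as given.

```python
def select_windows_to_drop(windows, latency_bound):
    """windows: list of (window_id, est_cost, est_quality) triples,
    with est_cost > 0 in the same units as latency_bound.
    Returns the ids of the windows to drop so that the summed
    cost of the remaining windows fits the latency bound."""
    total = sum(cost for _, cost, _ in windows)
    # Worst quality-per-cost first: cheap to drop, little quality lost.
    ranked = sorted(windows, key=lambda w: w[2] / w[1])
    dropped = []
    for wid, cost, _ in ranked:
        if total <= latency_bound:
            break
        dropped.append(wid)
        total -= cost
    return dropped
```

    Because the estimates are only approximate, a real implementation would re-run this selection periodically as observed latencies drift from the model's predictions.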

    Stackless Processing of Streamed Trees

    Processing tree-structured data in the streaming model is a challenge: capturing regular properties of streamed trees by means of a stack is costly in memory, but falling back to finite-state automata drastically limits the computational power. We propose an intermediate stackless model based on register automata equipped with a single counter, used to maintain the current depth in the tree. We explore the power of this model to validate and query streamed trees. Our main result is an effective characterization of the regular path queries (RPQs) that can be evaluated stacklessly, with and without registers. In particular, we confirm the conjectured characterization by Segoufin and Vianu (2002) of tree languages defined by DTDs that are recognizable without registers, in the special case of tree languages defined by means of an RPQ.
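    To illustrate the flavour of the stackless model, the following toy evaluates the RPQ //a//b over a streamed tree using only a depth counter and a single register (the depth of the outermost open 'a'), with no stack. This is an illustrative sketch in the spirit of the paper's model, not its formal construction; the tag names and event encoding are arbitrary choices.

```python
def matches_a_then_b(stream):
    """Stackless evaluation of the RPQ //a//b over a tree streamed
    as ('open', tag) / ('close', tag) events: return True iff some
    'b' element occurs strictly below some 'a' element."""
    depth = 0
    a_depth = None  # register: depth of the outermost currently-open 'a'
    for action, tag in stream:
        if action == 'open':
            depth += 1
            if tag == 'a' and a_depth is None:
                a_depth = depth
            elif tag == 'b' and a_depth is not None and depth > a_depth:
                return True
        else:  # 'close'
            if a_depth == depth:
                a_depth = None  # leaving the registered 'a' subtree
            depth -= 1
    return False
```

    The point of the example is that one counter plus one register suffices here, whereas a naive automaton would need a stack to remember which ancestors are 'a' elements.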

    State Management for Efficient Event Pattern Detection

    Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data.
    I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
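    The remote-data decoupling can be sketched as a prefetching cache: data anticipated to be needed is fetched before query evaluation asks for it and kept for a lifetime derived from a cost model. `PrefetchCache` and `fetch_fn` are hypothetical names, and the fixed TTL below merely stands in for the dissertation's cost model, which decides when to fetch which items and how long to cache them.

```python
import time

class PrefetchCache:
    """Sketch of decoupling remote-data fetches from query evaluation.
    Illustrative only; not the dissertation's API."""

    def __init__(self, fetch_fn, ttl_seconds):
        self.fetch_fn = fetch_fn   # performs the (slow) remote fetch
        self.ttl = ttl_seconds     # stand-in for a cost-model-derived lifetime
        self.store = {}            # key -> (value, expiry timestamp)

    def prefetch(self, key):
        # Fetch ahead of anticipated use so pattern matching is not blocked.
        self.store[key] = (self.fetch_fn(key), time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]        # hit: no transmission latency on the hot path
        self.prefetch(key)         # miss or expired: synchronous fallback fetch
        return self.store[key][0]
```

    Combined with lazy evaluation, lookups that would miss can be postponed until the prefetch completes, so the stream-processing thread is never stalled on the network.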