12 research outputs found

    A systematic literature review of skyline query processing over data stream

    Get PDF
    Recently, skyline query processing over data stream has gained a lot of attention especially from the database community owing to its own unique challenges. Skyline queries aims at pruning a search space of a potential large multi-dimensional set of objects by keeping only those objects that are not worse than any other. Although an abundance of skyline query processing techniques have been proposed, there is a lack of a Systematic Literature Review (SLR) on current research works pertinent to skyline query processing over data stream. In regard to this, this paper provides a comparative study on the state-of-the-art approaches over the period between 2000 and 2022 with the main aim to help readers understand the key issues which are essential to consider in relation to processing skyline queries over streaming data. Seven digital databases were reviewed in accordance with the Preferred Reporting Items for Systematic Reviews (PRISMA) procedures. After applying both the inclusion and exclusion criteria, 23 primary papers were further examined. The results show that the identified skyline approaches are driven by the need to expedite the skyline query processing mainly due to the fact that data streams are time varying (time sensitive), continuous, real time, volatile, and unrepeatable. Although, these skyline approaches are tailored made for data stream with a common aim, their solutions vary to suit with the various aspects being considered, which include the type of skyline query, type of streaming data, type of sliding window, query processing technique, indexing technique as well as the data stream environment employed. In this paper, a comprehensive taxonomy is developed along with the key aspects of each reported approach, while several open issues and challenges related to the topic being reviewed are highlighted as recommendation for future research direction

    Parallel Patterns for Adaptive Data Stream Processing

    Get PDF
    In recent years our ability to produce information has been growing steadily, driven by an ever increasing computing power, communication rates, hardware and software sensors diffusion. This data is often available in the form of continuous streams and the ability to gather and analyze it to extract insights and detect patterns is a valuable opportunity for many businesses and scientific applications. The topic of Data Stream Processing (DaSP) is a recent and highly active research area dealing with the processing of this streaming data. The development of DaSP applications poses several challenges, from efficient algorithms for the computation to programming and runtime systems to support their execution. In this thesis two main problems will be tackled: * need for high performance: high throughput and low latency are critical requirements for DaSP problems. Applications necessitate taking advantage of parallel hardware and distributed systems, such as multi/manycores or cluster of multicores, in an effective way; * dynamicity: due to their long running nature (24hr/7d), DaSP applications are affected by highly variable arrival rates and changes in their workload characteristics. Adaptivity is a fundamental feature in this context: applications must be able to autonomously scale the used resources to accommodate dynamic requirements and workload while maintaining the desired Quality of Service (QoS) in a cost-effective manner. In the current approaches to the development of DaSP applications are still missing efficient exploitation of intra-operator parallelism as well as adaptations strategies with well known properties of stability, QoS assurance and cost awareness. These are the gaps that this research work tries to fill, resorting to well know approaches such as Structured Parallel Programming and Control Theoretic models. The dissertation runs along these two directions. The first part deals with intra-operator parallelism. A DaSP application can be naturally expressed as a set of operators (i.e. intermediate computations) that cooperate to reach a common goal. If QoS requirements are not met by the current implementation, bottleneck operators must be internally parallelized. We will study recurrent computations in window based stateful operators and propose patterns for their parallel implementation. Windowed operators are the most representative class of stateful data stream operators. Here computations are applied on the most recent received data. Windows are dynamic data structures: they evolve over time in terms of content and, possibly, size. Therefore, with respect to traditional patterns, the DaSP domain requires proper specializations and enhanced features concerning data distribution and management policies for different windowing methods. A structured approach to the problem will reduce the effort and complexity of parallel programming. In addition, it simplifies the reasoning about the performance properties of a parallel solution (e.g. throughput and latency). The proposed patterns exhibit different properties in terms of applicability and profitability that will be discussed and experimentally evaluated. The second part of the thesis is devoted to the proposal and study of predictive strategies and reconfiguration mechanisms for autonomic DaSP operators. Reconfiguration activities can be implemented in a transparent way to the application programmer thanks to the exploitation of parallel paradigms with well known structures. Furthermore, adaptation strategies may take advantage of the QoS predictability of the used parallel solution. Autonomous operators will be driven by means of a Model Predictive Control approach, with the intent of giving QoS assurances in terms of throughput or latency in a resource-aware manner. An experimental section will show the effectiveness of the proposed approach in terms of execution costs reduction as well as the stability degree of a system reconfiguration. The experiments will target shared and distributed memory architectures

    Towards an Efficient, Scalable Stream Query Operator Framework for Representing and Analyzing Continuous Fields

    Get PDF
    Advancements in sensor technology have made it less expensive to deploy massive numbers of sensors to observe continuous geographic phenomena at high sample rates and stream live sensor observations. This fact has raised new challenges since sensor streams have pushed the limits of traditional geo-sensor data management technology. Data Stream Engines (DSEs) provide facilities for near real-time processing of streams, however, algorithms supporting representing and analyzing Spatio-Temporal (ST) phenomena are limited. This dissertation investigates near real-time representation and analysis of continuous ST phenomena, observed by large numbers of mobile, asynchronously sampling sensors, using a DSE and proposes two novel stream query operator frameworks. First, the ST Interpolation Stream Query Operator Framework (STI-SQO framework) continuously transforms sensor streams into rasters using a novel set of stream query operators that perform ST-IDW interpolation. A key component of the STI-SQO framework is the 3D, main memory-based, ST Grid Index that enables high performance ST insertion and deletion of massive numbers of sensor observations through Isotropic Time Cell and Time Block-based partitioning. The ST Grid Index facilitates fast ST search for samples using ST shell-based neighborhood search templates, namely the Cylindrical Shell Template and Nested Shell Template. Furthermore, the framework contains the stream-based ST-IDW algorithms ST Shell and ST ak-Shell for high performance, parallel grid cell interpolation. Secondly, the proposed ST Predicate Stream Query Operator Framework (STP-SQO framework) efficiently evaluates value predicates over ST streams of ST continuous phenomena. The framework contains several stream-based predicate evaluation algorithms, including Region-Growing, Tile-based, and Phenomenon-Aware algorithms, that target predicate evaluation to regions with seed points and minimize the number of raster cells that are interpolated when evaluating value predicates. The performance of the proposed frameworks was assessed with regard to prediction accuracy of output results and runtime. The STI-SQO framework achieved a processing throughput of 250,000 observations in 2.5 s with a Normalized Root Mean Square Error under 0.19 using a 500×500 grid. The STP-SQO framework processed over 250,000 observations in under 0.25 s for predicate results covering less than 40% of the observation area, and the Scan Line Region Growing algorithm was consistently the fastest algorithm tested

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)

    State Management for Efficient Event Pattern Detection

    Get PDF
    Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst. Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt. In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt.Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency

    Advances in knowledge discovery and data mining Part II

    Get PDF
    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II</p
    corecore