5 research outputs found

    Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes

    Full text link
    The DEBS Grand Challenge 2018 is set in the context of maritime route prediction. Vessel routes are modeled as streams of Automatic Identification System (AIS) data points selected from real-world tracking data. The challenge requires to correctly estimate the destination ports and arrival times of vessel trips, as early as possible. Our proposed solution partitions the training vessel routes by reported destination port and uses a nearest neighbor search to find the training routes that are closer to the query AIS point. Particular improvements have been included as well, such as a way to avoid changing the predicted ports frequently within one query route and automating the parameters tuning by the use of a genetic algorithm. This leads to significant improvements on the final score

    Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic

    Full text link
    In this paper, we present our approach for solving the DEBS Grand Challenge 2018. The challenge asks to provide a prediction for (i) a destination and the (ii) arrival time of ships in a streaming-fashion using Geo-spatial data in the maritime context. Novel aspects of our approach include the use of ensemble learning based on Random Forest, Gradient Boosting Decision Trees (GBDT), XGBoost Trees and Extremely Randomized Trees (ERT) in order to provide a prediction for a destination while for the arrival time, we propose the use of Feed-forward Neural Networks. In our evaluation, we were able to achieve an accuracy of 97% for the port destination classification problem and 90% (in mins) for the ETA prediction

    The DEBS 2020 grand challenge

    Get PDF
    The ACM DEBS 2020 Grand Challenge is the tenth in a series of challenges which seek to provide a common ground and evaluation criteria for a competition aimed at both research and industrial event-based systems. The focus of the ACM DEBS 2020 Grand Challenge is on Non-Intrusive Load Monitoring (NILM). The goal of the challenge is to detect when appliances contributing to an aggregated stream of voltage and current readings from a smart meter are switched on or off. NILM is leveraged in many contexts, ranging from monitoring of energy consumption to home automation. This paper describes the specifics of the data streams provided in the challenge, as well as the benchmarking platform that supports the testing of the solutions submitted by the participants

    Accelerating Event Stream Processing in On- and Offline Systems

    Get PDF
    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores
    corecore