5 research outputs found

    Cache-efficient sweeping-based interval joins for extended Allen relation predicates

    We develop a family of efficient plane-sweeping interval join algorithms for evaluating a wide range of interval predicates such as Allen’s relationships and parameterized relationships. Our technique is based on a framework whose components can be flexibly combined to support the required interval relation. In temporal databases, our algorithms can exploit a well-known and flexible access method, the Timeline Index, thus expanding the set of operations it supports even further. Additionally, by employing a compact data structure, the gapless hash map, we utilize the CPU cache efficiently. In an experimental evaluation, we show that our approach is several times faster and scales better than state-of-the-art techniques, while being much better suited for real-time event processing.

    On efficient temporal subgraph query processing

    Graph based management of temporal data

    In recent decades, there has been a significant increase in the use of smart devices and sensors, which has led to the generation of high-volume temporal data. Temporal modeling and querying of this data have become essential for effective retrieval. However, custom temporal models have the problem of generalizability, whereas extended temporal models require users to adapt to new query languages. In this thesis, we propose a method to improve the modeling and retrieval of temporal data using an existing graph database system (i.e., Neo4j) without extending it with additional operators. Our work focuses on temporal data represented as intervals (events with a start and an end time). We propose a novel way of storing temporal intervals as Cartesian points, where the start time and the end time are stored as the x and y coordinates. We present how queries based on Allen’s interval relationships can be represented using our model on a Cartesian coordinate system by visualizing these queries. Temporal queries based on Allen’s interval relationships are then used to validate our model and compare it with the traditional way of storing temporal intervals (i.e., as attributes of nodes). Our experimental results on a soccer graph database with around 4000 games show that the spatial representation of temporal intervals can provide significant performance gains (up to a 3.5-fold speedup) compared to a traditional model.
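The point representation described in this abstract can be illustrated with a minimal sketch. It is not the thesis's Neo4j implementation: the `Interval` type, the predicate functions, and the sample data are all hypothetical and serve only to show how an Allen relation against a fixed query interval becomes a region test on (start, end) points.

```python
from typing import Callable, List, NamedTuple


class Interval(NamedTuple):
    """An interval viewed as a 2D point: x = start time, y = end time."""
    start: int
    end: int


def before(p: Interval, q: Interval) -> bool:
    # Allen "before": p ends before q starts.
    # Region in the plane: all points with y < q.start.
    return p.end < q.start


def during(p: Interval, q: Interval) -> bool:
    # Allen "during": p lies strictly inside q.
    # Region in the plane: x > q.start and y < q.end.
    return q.start < p.start and p.end < q.end


def overlaps(p: Interval, q: Interval) -> bool:
    # Allen "overlaps": p starts first and ends inside q.
    # Region in the plane: x < q.start and q.start < y < q.end.
    return p.start < q.start < p.end < q.end


def query(points: List[Interval],
          relation: Callable[[Interval, Interval], bool],
          q: Interval) -> List[Interval]:
    """Return the stored points that fall in the region the relation defines for q."""
    return [p for p in points if relation(p, q)]


# Hypothetical sample data (not taken from the soccer dataset).
games = [Interval(1, 2), Interval(4, 6), Interval(5, 12), Interval(2, 9)]
q = Interval(3, 10)

during_q = query(games, during, q)
before_q = query(games, before, q)
overlaps_q = query(games, overlaps, q)
```

In a spatially indexed store, each of these predicates would correspond to a rectangular range query over the points rather than a linear scan as in this sketch.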

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data stream is the event stream, which consists of continuously arriving notifications about some real-world phenomenon. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it, together with the measurement time, whenever it changes substantially from the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that, regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.

    ISEQL, an Interval-based Surveillance Event Query Language
