4,384 research outputs found
Expressive Stream Reasoning with Laser
An increasing number of use cases require a timely extraction of non-trivial
knowledge from semantically annotated data streams, especially on the Web and
for the Internet of Things (IoT). Often, this extraction requires expressive
reasoning, which is challenging to compute on large streams. We propose Laser,
a new reasoner that supports a pragmatic, non-trivial fragment of the logic
LARS which extends Answer Set Programming (ASP) for streams. At its core, Laser
implements a novel evaluation procedure which annotates formulae to avoid the
re-computation of duplicates at multiple time points. This procedure, combined
with a judicious implementation of the LARS operators, is responsible for
significantly better runtimes than those of other state-of-the-art systems
like C-SPARQL and CQELS, or an implementation of LARS which runs on the ASP
solver Clingo. This enables the application of expressive logic-based reasoning
to large streams and opens the door to a wider range of stream reasoning use
cases. Comment: 19 pages, 5 figures. Extended version of accepted paper at ISWC 201
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis.
Processing count queries over event streams at multiple time granularities
Management and analysis of streaming data have become crucial, with applications in the web, sensor data, network traffic data, and the stock market. Data streams consist mostly of numeric data, but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once in a fixed order as a continuous stream. Events appearing in the event stream have time stamps associated with them in a certain time granularity, such as second, minute, or hour. One type of frequently asked query over event streams is the count query, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily; however, users may ask queries over different time granularities as well. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method efficiently estimates the count of events with high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on different real data sets, and the results obtained are presented to show its effectiveness.
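To make the notion of a count query at multiple time granularities concrete, here is a minimal sketch of the exact baseline, assuming the whole list of event time stamps is available. It is not the approximation method proposed in the paper, which targets the case where only a window of the stream can be seen. The names `GRANULARITY_FORMATS` and `count_events` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical truncation formats, one per supported granularity.
GRANULARITY_FORMATS = {
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
}

def count_events(timestamps, granularity):
    """Count event occurrences per time bucket at the given granularity."""
    fmt = GRANULARITY_FORMATS[granularity]
    return Counter(ts.strftime(fmt) for ts in timestamps)

# Illustrative event stream: time stamps at which a stock price increased.
events = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 14, 0),
    datetime(2024, 1, 2, 9, 15),
]
hourly = count_events(events, "hour")
daily = count_events(events, "day")
```

The same events yield different answers depending on the granularity chosen, which is why a broker's query has to name the time frame (hour, day, or both).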
Continuous Nearest Neighbor Queries over Sliding Windows
Abstract—This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms—Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.
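The skyline idea can be illustrated for the simplest case: a single nearest neighbor over a count-based window. The sketch below is a simplification under those assumptions, not the paper's algorithm. A point is kept only while no later arrival is at least as close to the query, so the surviving candidates form a skyline in the distance-time space and the future changes of the result are already queued. The class name `SlidingWindowNN` and its methods are hypothetical.

```python
import math
from collections import deque

class SlidingWindowNN:
    """1-NN over the W most recent arrivals (count-based window)."""

    def __init__(self, query, window_size):
        self.query = query
        self.window_size = window_size
        self.arrivals = 0
        # Entries are (arrival_index, distance, point); distances strictly
        # increase from the front (oldest, nearest) to the back (newest).
        self.skyline = deque()

    def insert(self, point):
        """Process one arrival and return the current nearest neighbor."""
        dist = math.dist(self.query, point)
        # Older points that are no closer than the new one are dominated:
        # they expire earlier and can never become the nearest neighbor.
        while self.skyline and self.skyline[-1][1] >= dist:
            self.skyline.pop()
        self.skyline.append((self.arrivals, dist, point))
        self.arrivals += 1
        # Drop candidates that have slid out of the window.
        while self.skyline[0][0] < self.arrivals - self.window_size:
            self.skyline.popleft()
        return self.skyline[0][2]

nn = SlidingWindowNN(query=(0.0, 0.0), window_size=3)
stream = [(5.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
results = [nn.insert(p) for p in stream]
```

When the current nearest neighbor expires, the next candidate in the queue takes over without rescanning the window. A time-based window would compare time stamps instead of arrival indices in the expiry loop.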
Analysing Temporal Relations – Beyond Windows, Frames and Predicates
This article proposes an approach that relies on the standard
operators of relational algebra (including grouping and aggregation)
for processing complex events without requiring
window specifications. In this way, the approach can process
complex event queries of the kind encountered in applications
such as emergency management in metro networks.
This article presents Temporal Stream Algebra (TSA), which
combines the operators of relational algebra with an analysis
of temporal relations at compile time. This analysis determines
which relational algebra queries can be evaluated
against data streams, i.e., the analysis is able to distinguish
valid from invalid stream queries. Furthermore, the analysis
derives functions similar to the pass, propagation, and keep
invariants in Tucker et al.'s "Exploiting Punctuation Semantics
in Continuous Data Streams". These functions enable
the incremental evaluation of TSA queries, the propagation
of punctuations, and garbage collection. The evaluation of
TSA queries combines bulk-wise and out-of-order processing,
which makes it tolerant to workload bursts as they typically
occur in emergency management. The approach has been
conceived for efficiently processing complex event queries on
top of a relational database system. It has been deployed
and tested on MonetDB.
Ranking Large Temporal Data
Ranking temporal data has not been studied until recently, even though
ranking is an important operator (being promoted as a first-class citizen) in
database systems. However, only the instant top-k queries on temporal data have
been studied, where objects with the k highest scores at a query time instance t
are to be retrieved. The instant top-k definition clearly comes with
limitations (sensitive to outliers, difficult to choose a meaningful query time
t). A more flexible and general ranking operation is to rank objects based on
the aggregation of their scores in a query interval, which we dub the aggregate
top-k query on temporal data. For example, return the top-10 weather stations
having the highest average temperature from 10/01/2010 to 10/07/2010; find the
top-20 stocks having the largest total transaction volumes from 02/05/2011 to
02/07/2011. This work presents a comprehensive study of this problem by
designing both exact and approximate methods (with approximation quality
guarantees). We also provide theoretical analysis on the construction cost, the
index size, the update and the query costs of each approach. Extensive
experiments on large real datasets clearly demonstrate the efficiency, the
effectiveness, and the scalability of our methods compared to the baseline
methods. Comment: VLDB201
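As an illustration of the aggregate top-k query defined above, the following is a minimal brute-force sketch that scans every reading and ranks objects by their average score in the query interval. It is only a naive baseline under assumed data structures, not one of the exact or approximate indexed methods the paper designs. The function name `aggregate_top_k` and the sample readings are hypothetical.

```python
import heapq

def aggregate_top_k(series, start, end, k):
    """Return the k objects with the highest average score in [start, end].

    `series` maps an object id to a list of (time, score) readings.
    Objects with no reading inside the interval are ignored.
    """
    averages = {}
    for obj, readings in series.items():
        in_range = [score for t, score in readings if start <= t <= end]
        if in_range:
            averages[obj] = sum(in_range) / len(in_range)
    return heapq.nlargest(k, averages.items(), key=lambda item: item[1])

# Illustrative readings: station id -> (day, temperature).
stations = {
    "A": [(1, 10.0), (2, 20.0)],
    "B": [(1, 30.0), (3, 10.0)],
    "C": [(5, 100.0)],
}
top2 = aggregate_top_k(stations, start=1, end=3, k=2)
```

Station C has the single highest reading but falls outside the interval, so it does not appear in the result. Averaging over an interval also dampens outliers, which is the limitation of the instant top-k definition that the abstract points out.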