Sliding windows over uncertain data streams
Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is <1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them have studied, in depth, the implications of existential uncertainty on sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding window to define the novel concept of uncertain sliding windows and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can efficiently operate while providing a high-quality approximation in query answering. In addition, we show that sort-based similarity join algorithms can perform better than index-based techniques (on 17 real datasets) when the number of possible values per tuple is low, as in many real-world applications. © 2014, Springer-Verlag London
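The two kinds of uncertainty described in this abstract can be illustrated with a minimal sketch. The `UncertainTuple` type and the `select` helper are hypothetical names introduced here for illustration only; they are not taken from the paper. The sketch assumes a tuple is represented as a mapping from possible values to probabilities, and shows how a selection over a value-uncertain tuple can leave a tuple whose probabilities sum to less than 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UncertainTuple:
    """Hypothetical representation: possible value -> probability."""
    alternatives: Dict[float, float]

    def existence_probability(self) -> float:
        # Probability that the tuple exists at all.
        return sum(self.alternatives.values())

    def is_existentially_uncertain(self, eps: float = 1e-9) -> bool:
        # Existential uncertainty: the probabilities sum to less than 1.
        return self.existence_probability() < 1.0 - eps


def select(t: UncertainTuple, predicate: Callable[[float], bool]) -> UncertainTuple:
    # A relational selection keeps only the alternatives that satisfy
    # the predicate; the dropped probability mass is not redistributed.
    kept = {v: p for v, p in t.alternatives.items() if predicate(v)}
    return UncertainTuple(kept)


# A value-uncertain sensor reading whose alternatives sum to 1.
reading = UncertainTuple({20.0: 0.5, 25.0: 0.25, 30.0: 0.25})
# After the selection, only part of the probability mass remains.
filtered = select(reading, lambda v: v >= 25.0)
```

Here `reading` exists with certainty, while `filtered` keeps only half of the probability mass. A count-based window over such tuples therefore cannot know for certain how many tuples it holds.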
PHUIMUS: A Potential High Utility Itemsets Mining Algorithm Based on Stream Data with Uncertainty
High utility itemset (HUI) mining, which finds profitable itemsets by considering both quantity and profit, has recently been a hot topic. So far, HUI mining over uncertain datasets and HUI mining over data streams have been studied separately. However, to the best of our knowledge, HUI mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), that is, itemsets with high utilities and high existential probabilities, over uncertain data streams based on sliding windows. To realize the algorithm, a potential utility list over uncertain data streams (PUS-list) is designed to mine PHUIs without rescanning the analyzed uncertain data stream. A transaction weighted probability and utility tree over uncertain data streams (TWPUS-tree) is also designed to decrease the number of candidate itemsets generated by the PHUIMUS algorithm. Substantial experiments are conducted in terms of run-time, number of discovered PHUIs, memory consumption, and scalability on real-life and synthetic databases. The results show that our proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.
DRSP: Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and join of time series data streams have gained a lot of
relevance in today's world of large streaming data. This process finds
wide scale application in the areas of location tracking, sensor networks,
object positioning and monitoring to name a few. However, as the size of the
data stream increases, the cost involved to retain all the data in order to aid
the process of similarity matching also increases. We develop a novel framework
to address the following objectives. Firstly, dimension reduction is
performed in the preprocessing stage, where large stream data is segmented and
reduced into a compact representation such that it retains all the crucial
information by a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with the storage of large time-series data
streams. Secondly, it incorporates an effective similarity matching technique to
analyze whether the new data objects are symmetric to the existing data stream. And
finally, the pruning technique filters out the pseudo data object pairs
and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and
the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competent in comparison with earlier
works. Comment: 20 pages, 8 figures, 6 tables
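The abstract names Multi-level Segment Means (MSM) but does not spell out its construction. As a rough sketch only, the code below assumes that each level splits the window into twice as many equal-length segments as the previous level and stores one mean per segment. The function names and this level layout are assumptions made for illustration, not the paper's definition.

```python
from typing import List


def segment_means(series: List[float], num_segments: int) -> List[float]:
    """Mean of each of `num_segments` equal-length, contiguous segments."""
    if num_segments <= 0 or len(series) % num_segments != 0:
        raise ValueError("series length must be a multiple of num_segments")
    size = len(series) // num_segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]


def multi_level_segment_means(series: List[float], levels: int) -> List[List[float]]:
    # Assumed layout: level 0 has 1 segment, level 1 has 2, level 2 has 4, ...
    return [segment_means(series, 2 ** level) for level in range(levels)]


window = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]
summary = multi_level_segment_means(window, levels=3)
# summary == [[3.5], [4.0, 3.0], [2.0, 6.0, 2.0, 4.0]]
```

Coarse levels are cheap to compare and can rule out dissimilar pairs early, while finer levels refine the comparison only for the pairs that survive.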
An efficient closed frequent itemset miner for the MOA stream mining system
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift. Postprint (published version)
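For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The brute-force sketch below illustrates that definition on a small static batch. It is not IncMine and does not use the MOA API; the function names are hypothetical, and the exhaustive enumeration is only practical for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Enumerate every itemset and keep those with support >= min_support."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    # Closed: no proper superset has exactly the same support.
    # Any such superset would itself be frequent, so checking the
    # frequent itemsets alone is sufficient.
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


batch = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
closed = closed_itemsets(frequent_itemsets(batch, min_support=2))
# closed == {frozenset({"a"}): 4, frozenset({"a", "b"}): 2, frozenset({"a", "c"}): 2}
```

The itemsets {b} and {c} are frequent but not closed, because {a, b} and {a, c} have the same support. Closed itemsets thus give a lossless but smaller summary of the frequent ones.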
S-Store: Streaming Meets Transaction Processing
Stream processing addresses the needs of real-time applications. Transaction
processing addresses the coordination and safety of short atomic computations.
Heretofore, these two modes of operation existed in separate, stove-piped
systems. In this work, we attempt to fuse the two computational paradigms in a
single system called S-Store. In this way, S-Store can simultaneously
accommodate OLTP and streaming applications. We present a simple transaction
model for streams that integrates seamlessly with a traditional OLTP system. We
chose to build S-Store as an extension of H-Store, an open-source, in-memory,
distributed OLTP database system. By implementing S-Store in this way, we can
make use of the transaction processing facilities that H-Store already
supports, and we can concentrate on the additional implementation features that
are needed to support streaming. Similar implementations could be done using
other main-memory OLTP platforms. We show that we can actually achieve higher
throughput for streaming workloads in S-Store than an equivalent deployment in
H-Store alone. We also show how this can be achieved within H-Store with the
addition of a modest amount of new functionality. Furthermore, we compare
S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm,
and show how S-Store matches and sometimes exceeds their performance while
providing stronger transactional guarantees.
Analysing Temporal Relations – Beyond Windows, Frames and Predicates
This article proposes an approach that relies on the standard
operators of relational algebra (including grouping and aggregation)
for processing complex events without requiring
window specifications. In this way the approach can process
complex event queries of the kind encountered in applications
such as emergency management in metro networks.
This article presents Temporal Stream Algebra (TSA), which
combines the operators of relational algebra with an analysis
of temporal relations at compile time. This analysis determines
which relational algebra queries can be evaluated
against data streams, i.e., the analysis is able to distinguish
valid from invalid stream queries. Furthermore, the analysis
derives functions similar to the pass, propagation and keep
invariants in Tucker et al.'s "Exploiting Punctuation Semantics
in Continuous Data Streams". These functions enable
the incremental evaluation of TSA queries, the propagation
of punctuations, and garbage collection. The evaluation of
TSA queries combines bulk-wise and out-of-order processing,
which makes it tolerant to workload bursts as they typically
occur in emergency management. The approach has been
conceived for efficiently processing complex event queries on
top of a relational database system. It has been deployed
and tested on MonetDB.
Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data
Skyline queries typically search a Pareto-optimal set from a given data set
to solve the corresponding multiobjective optimization problem. As the number
of criteria increases, the skyline presumes excessive data items, which yield a
meaningless result. To address this curse of dimensionality, we proposed a
k-dominant skyline in which the number of skyline members was reduced by
relaxing the restriction on the number of dimensions, considering the
uncertainty of data. Specifically, each data item was associated with a
probability of appearance, which represented the probability of becoming a
member of the k-dominant skyline. As data items appear continuously in data
streams, the corresponding k-dominant skyline may vary with time. Therefore, an
effective and rapid mechanism of updating the k-dominant skyline becomes
crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI)
and All Indexing (AI), for k-dominant skyline in distributed edge-computing
environments, where irrelevant data items can be effectively excluded from the
computation to reduce the processing duration. Furthermore, the proposed schemes
were validated with extensive experimental simulations. The experimental
results demonstrated that the proposed MI and AI schemes reduced the
computation time by approximately 13% and 56%, respectively, compared with the
existing method. Comment: 13 pages, 8 figures, 12 tables, to appear in IEEE Transactions on
Emerging Topics in Computing
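The relaxed dominance test behind a k-dominant skyline can be sketched as follows. The sketch uses the usual definition, in which a point k-dominates another when it is no worse in at least k dimensions and strictly better in at least one of them, and it assumes that smaller values are better. It deliberately ignores the appearance probabilities and the MI and AI indexing schemes from the abstract; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """True if p is no worse than q in at least k dimensions and
    strictly better in at least one (smaller is better, by assumption)."""
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Tuple[float, ...]], k: int) -> List[Tuple[float, ...]]:
    # Naive quadratic scan: keep the points that no other point k-dominates.
    # A point never k-dominates itself, since nothing is strictly better.
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


points = [(1, 5, 9), (2, 2, 2), (9, 1, 5), (3, 3, 3)]
full_skyline = k_dominant_skyline(points, k=3)     # conventional skyline
relaxed_skyline = k_dominant_skyline(points, k=2)  # 2-dominant skyline
# full_skyline == [(1, 5, 9), (2, 2, 2), (9, 1, 5)]
# relaxed_skyline == [(2, 2, 2)]
```

With k equal to the number of dimensions the test reduces to ordinary dominance. Lowering k makes domination easier, so the result set shrinks, which is the effect the abstract relies on to counter the curse of dimensionality.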
Finding event correlations in federated wireless sensor networks
Due to copyright restrictions, access to the full text of this article is only available via subscription. Event correlation engines help us find events of interest inside raw sensor data streams and, simultaneously, help reduce the data volume. This paper discusses some of the challenges faced in finding event correlations over federated wireless sensor networks (WSNs), including high data volumes, uncertain or missing data, application-specific dependencies, and widely varying data ranges and sampling frequencies. Analysis over real geo-tracking data of moving objects confirms some of these challenges. Federation at the data layer above the WSNs is presented as a feasible alternative. TÜBİTAK; IBM Shared University Research program; European Commission