14,591 research outputs found
StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge
Today, massive amounts of streaming data from smart devices need to be
analyzed automatically to realize the Internet of Things. The Complex Event
Processing (CEP) paradigm promises low-latency pattern detection on event
streams. However, CEP systems need to be extended with Machine Learning (ML)
capabilities such as online training and inference in order to be able to
detect fuzzy patterns (e.g., outliers) and to improve pattern recognition
accuracy during runtime using incremental model training. In this paper, we
propose a distributed CEP system denoted as StreamLearner for ML-enabled
complex event detection. The proposed programming model and data-parallel
system architecture enable a wide range of real-world applications and allow
for dynamically scaling up and out system resources for low-latency,
high-throughput event processing. We show that the DEBS Grand Challenge 2017
case study (i.e., anomaly detection in smart factories) integrates seamlessly
into the StreamLearner API. Our experiments verify scalability and high event
throughput of StreamLearner.Comment: Christian Mayer, Ruben Mayer, and Majd Abdo. 2017. StreamLearner:
Distributed Incremental Machine Learning on Event Streams: Grand Challenge.
In Proceedings of the 11th ACM International Conference on Distributed and
Event-based Systems (DEBS '17), 298-30
SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented building upon the density-based selforganizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
- …