44,228 research outputs found
On Graph Stream Clustering with Side Information
Graph clustering becomes an important problem due to emerging applications
involving the web, social networks and bio-informatics. Recently, many such
applications generate data in the form of streams. Clustering massive, dynamic
graph streams is significantly challenging because of the complex structures of
graphs and computational difficulties of continuous data. Meanwhile, a large
volume of side information is associated with graphs, which can be of various
types. The examples include the properties of users in social network
activities, the meta attributes associated with web click graph streams and the
location information in mobile communication networks. Such attributes contain
extremely useful information and has the potential to improve the clustering
process, but are neglected by most recent graph stream mining techniques. In
this paper, we define a unified distance measure on both link structures and
side attributes for clustering. In addition, we propose a novel optimization
framework DMO, which can dynamically optimize the distance metric and make it
adapt to the newly received stream data. We further introduce a carefully
designed statistics SGS(C) which consume constant storage spaces with the
progression of streams. We demonstrate that the statistics maintained are
sufficient for the clustering process as well as the distance optimization and
can be scalable to massive graphs with side attributes. We will present
experiment results to show the advantages of the approach in graph stream
clustering with both links and side information over the baselines.Comment: Full version of SIAM SDM 2013 pape
On Counting Triangles through Edge Sampling in Large Dynamic Graphs
Traditional frameworks for dynamic graphs have relied on processing only the
stream of edges added into or deleted from an evolving graph, but not any
additional related information such as the degrees or neighbor lists of nodes
incident to the edges. In this paper, we propose a new edge sampling framework
for big-graph analytics in dynamic graphs which enhances the traditional model
by enabling the use of additional related information. To demonstrate the
advantages of this framework, we present a new sampling algorithm, called Edge
Sample and Discard (ESD). It generates an unbiased estimate of the total number
of triangles, which can be continuously updated in response to both edge
additions and deletions. We provide a comparative analysis of the performance
of ESD against two current state-of-the-art algorithms in terms of accuracy and
complexity. The results of the experiments performed on real graphs show that,
with the help of the neighborhood information of the sampled edges, the
accuracy achieved by our algorithm is substantially better. We also
characterize the impact of properties of the graph on the performance of our
algorithm by testing on several Barabasi-Albert graphs.Comment: A short version of this article appeared in Proceedings of the 2017
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2017
StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge
Today, massive amounts of streaming data from smart devices need to be
analyzed automatically to realize the Internet of Things. The Complex Event
Processing (CEP) paradigm promises low-latency pattern detection on event
streams. However, CEP systems need to be extended with Machine Learning (ML)
capabilities such as online training and inference in order to be able to
detect fuzzy patterns (e.g., outliers) and to improve pattern recognition
accuracy during runtime using incremental model training. In this paper, we
propose a distributed CEP system denoted as StreamLearner for ML-enabled
complex event detection. The proposed programming model and data-parallel
system architecture enable a wide range of real-world applications and allow
for dynamically scaling up and out system resources for low-latency,
high-throughput event processing. We show that the DEBS Grand Challenge 2017
case study (i.e., anomaly detection in smart factories) integrates seamlessly
into the StreamLearner API. Our experiments verify scalability and high event
throughput of StreamLearner.Comment: Christian Mayer, Ruben Mayer, and Majd Abdo. 2017. StreamLearner:
Distributed Incremental Machine Learning on Event Streams: Grand Challenge.
In Proceedings of the 11th ACM International Conference on Distributed and
Event-based Systems (DEBS '17), 298-30
- …