Parallel Hierarchical Affinity Propagation with MapReduce
The accelerated evolution and explosion of the Internet and social media are
generating voluminous quantities of data (on zettabyte scales). Extracting
actionable intelligence from such vast data volumes requires scalable,
performance-conscious analytics algorithms. To directly address this need, we
propose a novel MapReduce
implementation of the exemplar-based clustering algorithm known as Affinity
Propagation. Our parallelization strategy extends to the multilevel
Hierarchical Affinity Propagation algorithm and enables tiered aggregation of
unstructured data with minimal free parameters, in principle requiring only a
similarity measure between data points. We detail the linear run-time
complexity of our approach, overcoming the limiting quadratic complexity of the
original algorithm. Experimental validation of our clustering methodology on a
variety of synthetic and real data sets (e.g. images and point data)
demonstrates our competitiveness against other state-of-the-art MapReduce
clustering techniques.
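For reference, the message-passing core of Affinity Propagation on a single machine can be sketched as below. This is the standard dense formulation with quadratic cost, not the MapReduce or hierarchical variant the abstract proposes. The function name `affinity_propagation`, the `damping` and `iterations` parameters, the median preference, and the toy points are all assumptions made for illustration.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iterations=200):
    """Dense Affinity Propagation sketch (hypothetical helper).

    S is an n x n float similarity matrix whose diagonal holds the
    per-point "preference". Returns, for each point, the index of the
    exemplar it is assigned to.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    return (A + R).argmax(axis=1)

# Toy data (assumed): two well-separated groups of 2-D points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
diff = points[:, None, :] - points[None, :, :]
S = -(diff ** 2).sum(axis=2)        # similarity: negative squared distance
np.fill_diagonal(S, np.median(S))   # preference: median similarity (a common default)
labels = affinity_propagation(S)
print(labels)
```

The only inputs are the similarity matrix and its diagonal preference, which matches the abstract's point that the method needs little beyond a similarity measure between data points.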
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide F1 scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real-world
applications such as network classification and anomaly detection.
Comment: 10 pages, 5 figures, 4 tables
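The idea of treating truncated random walks as sentences can be illustrated with a minimal sketch. It covers only the corpus-generation step, under stated assumptions: the graph is an adjacency dictionary, and the function name `truncated_random_walks` and its parameters `walks_per_vertex`, `walk_length`, and `seed` are hypothetical rather than taken from the authors' code.

```python
import random

def truncated_random_walks(adjacency, walks_per_vertex=10, walk_length=40, seed=0):
    """Generate fixed-length uniform random walks from every vertex.

    adjacency maps each vertex to a list of its neighbours. Each walk is
    returned as a list of string tokens, so the result looks like a corpus
    of sentences whose "words" are vertex identifiers.
    """
    rng = random.Random(seed)
    vertices = list(adjacency)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a fresh random order each pass
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            corpus.append([str(v) for v in walk])
    return corpus

# Toy undirected graph (assumed): a triangle 0-1-2 with a tail 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = truncated_random_walks(graph, walks_per_vertex=2, walk_length=5)
for sentence in corpus:
    print(" ".join(sentence))
```

The resulting token lists could then be passed to a word-embedding model of the reader's choice, such as a skip-gram implementation, to obtain one vector per vertex. Because each walk depends only on local neighbourhoods, walks can be generated independently, which is consistent with the abstract's remark that the method is trivially parallelizable.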