66,361 research outputs found
Parallel Hierarchical Affinity Propagation with MapReduce
The accelerated evolution and explosion of the Internet and social media is
generating voluminous quantities of data (on zettabyte scales). Paramount
amongst the desires to manipulate and extract actionable intelligence from vast
big data volumes is the need for scalable, performance-conscious analytics
algorithms. To directly address this need, we propose a novel MapReduce
implementation of the exemplar-based clustering algorithm known as Affinity
Propagation. Our parallelization strategy extends to the multilevel
Hierarchical Affinity Propagation algorithm and enables tiered aggregation of
unstructured data with minimal free parameters, in principle requiring only a
similarity measure between data points. We detail the linear run-time
complexity of our approach, overcoming the limiting quadratic complexity of the
original algorithm. Experimental validation of our clustering methodology on a
variety of synthetic and real data sets (e.g. images and point data)
demonstrates our competitiveness against other state-of-the-art MapReduce
clustering techniques
A Probabilistic Embedding Clustering Method for Urban Structure Detection
Urban structure detection is a basic task in urban geography. Clustering is a
core technology to detect the patterns of urban spatial structure, urban
functional region, and so on. In big data era, diverse urban sensing datasets
recording information like human behaviour and human social activity, suffer
from complexity in high dimension and high noise. And unfortunately, the
state-of-the-art clustering methods does not handle the problem with high
dimension and high noise issues concurrently. In this paper, a probabilistic
embedding clustering method is proposed. Firstly, we come up with a
Probabilistic Embedding Model (PEM) to find latent features from high
dimensional urban sensing data by learning via probabilistic model. By latent
features, we could catch essential features hidden in high dimensional data
known as patterns; with the probabilistic model, we can also reduce uncertainty
caused by high noise. Secondly, through tuning the parameters, our model could
discover two kinds of urban structure, the homophily and structural
equivalence, which means communities with intensive interaction or in the same
roles in urban structure. We evaluated the performance of our model by
conducting experiments on real-world data and experiments with real data in
Shanghai (China) proved that our method could discover two kinds of urban
structure, the homophily and structural equivalence, which means clustering
community with intensive interaction or under the same roles in urban space.Comment: 6 pages, 7 figures, ICSDM201
- …