Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science.
Evolution of Ego-networks in Social Media with Link Recommendations
Ego-networks are fundamental structures in social graphs, yet the process of
their evolution is still widely unexplored. In an online context, a key
question is how link recommender systems may skew the growth of these networks,
possibly restraining diversity. To shed light on this matter, we analyze the
complete temporal evolution of 170M ego-networks extracted from Flickr and
Tumblr, comparing links that are created spontaneously with those that have
been algorithmically recommended. We find that the evolution of ego-networks is
bursty, community-driven, and characterized by subsequent phases of explosive
diameter increase, slight shrinking, and stabilization. Recommendations favor
popular and well-connected nodes, limiting the diameter expansion. With a
matching experiment aimed at detecting causal relationships from observational
data, we find that the bias introduced by the recommendations fosters global
diversity in the process of neighbor selection. Last, with two link prediction
experiments, we show how insights from our analysis can be used to improve the
effectiveness of social recommender systems.
Comment: Proceedings of the 10th ACM International Conference on Web Search
and Data Mining (WSDM 2017), Cambridge, UK. 10 pages, 16 figures, 1 table
Scalable Recommendation with Poisson Factorization
We develop a Bayesian Poisson matrix factorization model for forming
recommendations from sparse user behavior data. These data are large user/item
matrices where each user has provided feedback on only a small subset of items,
either explicitly (e.g., through star ratings) or implicitly (e.g., through
views or purchases). In contrast to traditional matrix factorization
approaches, Poisson factorization implicitly models each user's limited
attention to consume items. Moreover, because of the mathematical form of the
Poisson likelihood, the model needs only to explicitly consider the observed
entries in the matrix, leading to both scalable computation and good predictive
performance. We develop a variational inference algorithm for approximate
posterior inference that scales up to massive data sets. This is an efficient
algorithm that iterates over the observed entries and adjusts an approximate
posterior over the user/item representations. We apply our method to large
real-world user data containing users rating movies, users listening to songs,
and users reading scientific papers. In all these settings, Bayesian Poisson
factorization outperforms state-of-the-art matrix factorization methods.
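
The claim that the Poisson likelihood only needs the observed entries can be illustrated with a small sketch. It assumes a plain Poisson model in which the count for user u and item i has rate equal to the dot product of two nonnegative vectors, and it uses fixed point values for those vectors rather than the variational posterior the abstract describes. The function names, the dictionary layout for the counts, and the toy numbers are all hypothetical. The idea shown: every zero entry contributes only its negative rate to the log-likelihood, and the sum of all rates factorizes into a dot product of column sums, so no loop over the full matrix is needed.

```python
from math import lgamma, log


def rate(theta_u, beta_i):
    """Poisson rate for one user/item pair: dot product of the two vectors."""
    return sum(t * b for t, b in zip(theta_u, beta_i))


def loglik_sparse(observed, theta, beta):
    """Log-likelihood that touches only the nonzero counts.

    observed: dict {(user, item): count} holding only nonzero counts
    (a hypothetical layout chosen for this sketch).
    """
    ll = 0.0
    # Nonzero entries contribute y * log(rate) - log(y!).
    for (u, i), y in observed.items():
        ll += y * log(rate(theta[u], beta[i])) - lgamma(y + 1)
    # The sum of rates over ALL pairs equals (sum of user vectors) dot
    # (sum of item vectors), so it costs O(users + items), not O(users * items).
    k = len(theta[0])
    theta_sum = [sum(row[j] for row in theta) for j in range(k)]
    beta_sum = [sum(row[j] for row in beta) for j in range(k)]
    return ll - rate(theta_sum, beta_sum)


def loglik_dense(observed, theta, beta):
    """Reference version that visits every cell of the matrix."""
    ll = 0.0
    for u in range(len(theta)):
        for i in range(len(beta)):
            y = observed.get((u, i), 0)
            r = rate(theta[u], beta[i])
            ll += y * log(r) - r - lgamma(y + 1)
    return ll


# Toy data: 3 users, 4 items, 2 latent components, 3 nonzero counts.
theta = [[0.5, 0.1], [0.2, 0.9], [0.05, 0.3]]
beta = [[1.0, 0.2], [0.1, 0.8], [0.4, 0.4], [0.3, 0.05]]
observed = {(0, 0): 2, (1, 1): 1, (2, 2): 3}

sparse_value = loglik_sparse(observed, theta, beta)
dense_value = loglik_dense(observed, theta, beta)
print(abs(sparse_value - dense_value) < 1e-9)
```

Both functions compute the same quantity; the sparse one scales with the number of nonzero counts plus the number of users and items, which is the property the abstract credits for scalability.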