76 research outputs found
Consistent Estimation of Mixed Memberships with Successive Projections
This paper considers the parameter estimation problem in Mixed Membership
Stochastic Block Model (MMSB), which is a quite general instance of random
graph model allowing for overlapping community structure. We present the new
algorithm successive projection overlapping clustering (SPOC) which combines
the ideas of spectral clustering and geometric approach for separable
non-negative matrix factorization. The proposed algorithm is provably
consistent under MMSB with general conditions on the parameters of the model.
SPOC is also shown to perform well experimentally in comparison to other
algorithms
A deep matrix factorization method for learning attribute representations
Semi-Non-negative Matrix Factorization is a technique that learns a
low-dimensional representation of a dataset that lends itself to a clustering
interpretation. It is possible that the mapping between this new representation
and our original data matrix contains rather complex hierarchical information
with implicit lower-level hidden attributes, that classical one level
clustering methodologies can not interpret. In this work we propose a novel
model, Deep Semi-NMF, that is able to learn such hidden representations that
allow themselves to an interpretation of clustering according to different,
unknown attributes of a given dataset. We also present a semi-supervised
version of the algorithm, named Deep WSF, that allows the use of (partial)
prior information for each of the known attributes of a dataset, that allows
the model to be used on datasets with mixed attribute knowledge. Finally, we
show that our models are able to learn low-dimensional representations that are
better suited for clustering, but also classification, outperforming
Semi-Non-negative Matrix Factorization, but also other state-of-the-art
methodologies variants.Comment: Submitted to TPAMI (16-Mar-2015
Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach
The increasing availability of temporal network data is calling for more
research on extracting and characterizing mesoscopic structures in temporal
networks and on relating such structure to specific functions or properties of
the system. An outstanding challenge is the extension of the results achieved
for static networks to time-varying networks, where the topological structure
of the system and the temporal activity patterns of its components are
intertwined. Here we investigate the use of a latent factor decomposition
technique, non-negative tensor factorization, to extract the community-activity
structure of temporal networks. The method is intrinsically temporal and allows
to simultaneously identify communities and to track their activity over time.
We represent the time-varying adjacency matrix of a temporal network as a
three-way tensor and approximate this tensor as a sum of terms that can be
interpreted as communities of nodes with an associated activity time series. We
summarize known computational techniques for tensor decomposition and discuss
some quality metrics that can be used to tune the complexity of the factorized
representation. We subsequently apply tensor factorization to a temporal
network for which a ground truth is available for both the community structure
and the temporal activity patterns. The data we use describe the social
interactions of students in a school, the associations between students and
school classes, and the spatio-temporal trajectories of students over time. We
show that non-negative tensor factorization is capable of recovering the class
structure with high accuracy. In particular, the extracted tensor components
can be validated either as known school classes, or in terms of correlated
activity patterns, i.e., of spatial and temporal coincidences that are
determined by the known school activity schedule
Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications
Nonnegative matrix factorization (NMF) has become a workhorse for signal and
data analytics, triggered by its model parsimony and interpretability. Perhaps
a bit surprisingly, the understanding to its model identifiability---the major
reason behind the interpretability in many applications such as topic mining
and hyperspectral imaging---had been rather limited until recent years.
Beginning from the 2010s, the identifiability research of NMF has progressed
considerably: Many interesting and important results have been discovered by
the signal processing (SP) and machine learning (ML) communities. NMF
identifiability has a great impact on many aspects in practice, such as
ill-posed formulation avoidance and performance-guaranteed algorithm design. On
the other hand, there is no tutorial paper that introduces NMF from an
identifiability viewpoint. In this paper, we aim at filling this gap by
offering a comprehensive and deep tutorial on model identifiability of NMF as
well as the connections to algorithms and applications. This tutorial will help
researchers and graduate students grasp the essence and insights of NMF,
thereby avoiding typical `pitfalls' that are often times due to unidentifiable
NMF formulations. This paper will also help practitioners pick/design suitable
factorization tools for their own problems.Comment: accepted version, IEEE Signal Processing Magazine; supplementary
materials added. Some minor revisions implemente
- …