Modularity functions maximization with nonnegative relaxation facilitates community detection in networks
We show here that the problem of maximizing a family of quantitative
functions, encompassing both the modularity (Q-measure) and modularity density
(D-measure), for community detection can be uniformly understood as a
combinatorial optimization involving the trace of a matrix called the modularity
Laplacian. Instead of using the traditional spectral relaxation, we add a
nonnegativity constraint to this graph clustering problem and design
efficient algorithms to optimize the new objective. With the explicit
nonnegative constraint, our solutions are very close to the ideal community
indicator matrix and can directly assign nodes into communities. The
near-orthogonal columns of the solution can be interpreted as the posterior
probability of the corresponding node belonging to each community. Therefore, the
proposed method can be exploited to identify the fuzzy or overlapping
communities and thus facilitates the understanding of the intrinsic structure
of networks. Experimental results show that our new algorithm consistently,
sometimes significantly, outperforms the traditional spectral relaxation
approaches.
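The trace formulation mentioned in this abstract can be illustrated with the standard modularity matrix. The sketch below is not the authors' algorithm: it assumes the usual Newman definition B = A - d d^T / (2m) and a hard 0/1 community indicator matrix H, and only evaluates Q = tr(H^T B H) / (2m) for a given assignment. The paper's "modularity Laplacian" may be defined differently, and the function name `modularity_trace` is hypothetical.

```python
import numpy as np

def modularity_trace(A, labels):
    """Evaluate modularity Q as a matrix trace for a hard community assignment.

    Assumes the standard modularity matrix B = A - d d^T / (2m); this is an
    illustrative choice, not necessarily the matrix used in the paper.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    d = A.sum(axis=1)                 # node degrees
    two_m = d.sum()                   # twice the number of edges
    B = A - np.outer(d, d) / two_m    # modularity matrix
    communities = np.unique(labels)
    # H[i, c] = 1 if node i belongs to community c, else 0
    H = (labels[:, None] == communities[None, :]).astype(float)
    return np.trace(H.T @ B @ H) / two_m

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_split = modularity_trace(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity_trace(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

On this toy graph the two-triangle split gives Q = 5/14, while putting all nodes in one community gives Q = 0. A nonnegative relaxation of the kind the abstract describes would replace the 0/1 matrix H with a nonnegative real matrix.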
Using Underapproximations for Sparse Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) consists in approximately factorizing a
nonnegative data matrix as the product of two low-rank nonnegative matrices. It
has been successfully applied as a data analysis technique in numerous domains,
e.g., text mining, image processing, microarray data analysis, collaborative
filtering, etc.
We introduce a novel approach to solve NMF problems, based on the use of an
underapproximation technique, and show its effectiveness to obtain sparse
solutions. This approach, based on Lagrangian relaxation, allows the resolution
of NMF problems in a recursive fashion. We also prove that the
underapproximation problem is NP-hard for any fixed factorization rank, using a
reduction of the maximum edge biclique problem in bipartite graphs.
We test two variants of our underapproximation approach on several standard
image datasets and show that they provide sparse part-based representations
with low reconstruction error. Our results are comparable and sometimes
superior to those obtained by two standard Sparse Nonnegative Matrix
Factorization techniques.
Comment: Version 2 removed the section about convex reformulations, which was
not central to the development of our main results; added material to the
introduction; added a review of previous related work (section 2.3);
completely rewrote the last part (section 4) to provide extensive numerical
results supporting our claims. Accepted in J. of Pattern Recognition.
Exact Clustering of Weighted Graphs via Semidefinite Programming
As a model problem for clustering, we consider the densest k-disjoint-clique
problem of partitioning a weighted complete graph into k disjoint subgraphs
such that the sum of the densities of these subgraphs is maximized. We
establish that such subgraphs can be recovered from the solution of a
particular semidefinite relaxation with high probability if the input graph is
sampled from a distribution of clusterable graphs. Specifically, the
semidefinite relaxation is exact if the graph consists of k large disjoint
subgraphs, corresponding to clusters, with weight concentrated within these
subgraphs, plus a moderate number of outliers. Further, we establish that if
noise is weakly obscuring these clusters, i.e., the between-cluster edges are
assigned very small weights, then we can recover significantly smaller
clusters. For example, we show that in approximately sparse graphs, where the
between-cluster weights tend to zero as the size n of the graph tends to
infinity, we can recover clusters of size polylogarithmic in n. Empirical
evidence from numerical simulations is also provided to support these
theoretical phase transitions to perfect recovery of the cluster structure.
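The objective described in this abstract can be made concrete with a small evaluation routine. The sketch below assumes that the density of a subgraph with node set C is the sum of the entries of the weight submatrix W[C, C] divided by |C| (that is, v^T W v / v^T v for the indicator vector v of C); the abstract itself does not spell out the definition. It only scores a given partition and does not implement the semidefinite relaxation. The name `sum_of_densities` is hypothetical.

```python
import numpy as np

def sum_of_densities(W, clusters):
    """Score a partition by the sum of its subgraph densities.

    Density of a cluster C is taken here as sum(W[C, C]) / |C|,
    an assumed definition used for illustration only.
    """
    W = np.asarray(W, dtype=float)
    total = 0.0
    for nodes in clusters:
        idx = np.asarray(nodes)
        sub = W[np.ix_(idx, idx)]     # weights inside the cluster
        total += sub.sum() / len(idx)
    return total

# Toy weight matrix: two planted clusters {0, 1} and {2, 3} with
# within-cluster weight 1.0 and between-cluster weight 0.1.
W = np.full((4, 4), 0.1)
W[:2, :2] = 1.0
W[2:, 2:] = 1.0

planted = sum_of_densities(W, [[0, 1], [2, 3]])
mixed = sum_of_densities(W, [[0, 2], [1, 3]])
```

On this toy instance the planted partition scores higher than the mixed one, which is the behaviour the densest k-disjoint-clique objective is meant to reward.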
Using underapproximations for sparse nonnegative matrix factorization
Nonnegative Matrix Factorization (NMF) has gathered a lot of attention in the last decade and has been successfully applied in numerous applications. It consists in the factorization of a nonnegative matrix by the product of two low-rank nonnegative matrices: M ≈ VW. In this paper, we attempt to solve NMF problems in a recursive way. In order to do that, we introduce a new variant called Nonnegative Matrix Underapproximation (NMU) by adding the upper bound constraint VW ≤ M. Besides enabling a recursive procedure for NMF, these inequalities make NMU particularly well suited to achieve a sparse representation, improving the part-based decomposition. Although NMU is NP-hard (which we prove using its equivalence with the maximum edge biclique problem in bipartite graphs), we present two approaches to solve it: a method based on convex reformulations and a method based on Lagrangian relaxation. Finally, we provide some encouraging numerical results for image processing applications.
Keywords: nonnegative matrix factorization, underapproximation, maximum edge biclique problem, sparsity, image processing
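To see why the constraint VW ≤ M permits a recursive procedure, note that subtracting a rank-one underapproximation from M leaves a nonnegative residual, which can itself be underapproximated. The sketch below illustrates this with a deliberately simple greedy heuristic; it is not the convex-reformulation or Lagrangian-relaxation method of the paper. The function names and the column-selection rule are assumptions made for illustration.

```python
import numpy as np

def greedy_rank_one_under(R):
    """Return nonnegative u, v with u v^T <= R (entrywise), via a greedy heuristic.

    u is the column of R with the largest norm; v[j] is the largest scale
    such that v[j] * u stays below column j of R on the support of u.
    """
    j_star = int(np.argmax(np.linalg.norm(R, axis=0)))
    u = R[:, j_star].copy()
    support = u > 0
    if not support.any():
        return u, np.zeros(R.shape[1])
    v = np.min(R[support, :] / u[support, None], axis=0)
    return u, v

def recursive_underapprox(M, rank):
    """Peel off `rank` rank-one underapproximations from M, one at a time."""
    R = np.array(M, dtype=float)
    us, vs = [], []
    for _ in range(rank):
        u, v = greedy_rank_one_under(R)
        us.append(u)
        vs.append(v)
        # The residual stays nonnegative; the clip only removes rounding noise.
        R = np.maximum(R - np.outer(u, v), 0.0)
    return np.column_stack(us), np.vstack(vs), R

rng = np.random.default_rng(0)
M = rng.random((6, 5))
V, W, R = recursive_underapprox(M, rank=3)
```

Here V is 6 × 3, W is 3 × 5, and VW ≤ M holds entrywise up to rounding. Even this crude heuristic shows the sparsity effect mentioned in the abstract: v[j] is forced to zero whenever column j of the residual has a zero where u is positive.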
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided.
Efficient Semidefinite Spectral Clustering via Lagrange Duality
We propose an efficient approach to semidefinite spectral clustering (SSC),
which addresses the Frobenius normalization with the positive semidefinite
(p.s.d.) constraint for spectral clustering. Compared with the original
Frobenius norm approximation based algorithm, the proposed algorithm can more
accurately find the closest doubly stochastic approximation to the affinity
matrix by considering the p.s.d. constraint. In this paper, SSC is formulated
as a semidefinite programming (SDP) problem. To address the high
computational complexity of SDP, we present a dual algorithm based on the
Lagrange dual formulation. Two versions of the proposed algorithm are
proffered: one with less memory usage and the other with faster convergence
rate. The proposed algorithm has much lower time complexity than that of the
standard interior-point based SDP solvers. Experimental results on both UCI
data sets and real-world image data sets demonstrate that 1) compared with the
state-of-the-art spectral clustering methods, the proposed algorithm achieves
better clustering performance; and 2) our algorithm is much more efficient and
can solve larger-scale SSC problems than standard interior-point SDP
solvers.
Comment: 13 pages
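For readers unfamiliar with doubly stochastic normalization of an affinity matrix, the following sketch shows the idea using Sinkhorn-Knopp row and column scaling. This is a substitute technique chosen for brevity: it is not the Frobenius-norm approximation with a p.s.d. constraint described in the abstract, and it involves no semidefinite programming. The toy affinity matrix and the function name `sinkhorn_knopp` are assumptions.

```python
import numpy as np

def sinkhorn_knopp(K, n_iter=500, tol=1e-10):
    """Scale a positive matrix until its rows and columns all sum to one.

    Alternates row and column normalization (Sinkhorn-Knopp). Shown only to
    illustrate what a doubly stochastic affinity looks like; it does not
    solve the Frobenius-norm SDP formulation.
    """
    P = np.array(K, dtype=float)
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # make every row sum to 1
        P /= P.sum(axis=0, keepdims=True)   # make every column sum to 1
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return P

# Small symmetric affinity matrix with strictly positive entries.
K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
P = sinkhorn_knopp(K)
row_sums = P.sum(axis=1)
col_sums = P.sum(axis=0)
```

The resulting P has unit row and column sums up to the tolerance. In a spectral clustering pipeline, a normalized matrix of this kind would take the place of the raw affinity matrix before the eigenvector step.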