95 research outputs found
Nonnegative factorization and the maximum edge biclique problem
Nonnegative matrix factorization (NMF) is a data analysis technique based on the approximation of a nonnegative matrix with a product of two nonnegative factors, which allows compression and interpretation of nonnegative data. In this paper, we study the case of rank-one factorization and show that when the matrix to be factored is not required to be nonnegative, the corresponding problem (R1NF) becomes NP-hard. This sheds new light on the complexity of NMF, since any algorithm for fixed-rank NMF must be able to solve such rank-one subproblems, at least implicitly. Our proof relies on a reduction of the maximum edge biclique problem to R1NF. We also link stationary points of R1NF to feasible solutions of the biclique problem, which allows us to design a new type of biclique finding algorithm based on the application of a block-coordinate descent scheme to R1NF. We show that this algorithm, whose algorithmic complexity per iteration is proportional to the number of edges in the graph, is guaranteed to converge to a biclique and that it performs competitively with existing methods on random graphs and text mining datasets.
Keywords: nonnegative matrix factorization, rank-one factorization, maximum edge biclique problem, algorithmic complexity, biclique finding algorithm
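The block-coordinate descent idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the matrix to factor has entry 1 for each edge and -d for each non-edge of the bipartite graph, with a hypothetical penalty `d` defaulting to the larger matrix dimension, and it starts from the indicator of one row (`start_row`, also hypothetical). Each step solves the rank-one nonnegative least-squares subproblem in closed form: v = max(0, Mᵀu)/‖u‖², then u = max(0, Mv)/‖v‖². For simplicity the sketch uses dense loops, so one iteration costs time proportional to the matrix size rather than to the number of edges.

```python
def r1nf_biclique(adj, start_row=0, d=None, iters=50):
    """Sketch: alternate closed-form updates of u and v for
    min ||M - u v^T||_F^2 with u, v >= 0, where M[i][j] is 1 for an
    edge and -d for a non-edge (assumed construction)."""
    m, n = len(adj), len(adj[0])
    d = float(max(m, n)) if d is None else float(d)
    M = [[1.0 if adj[i][j] else -d for j in range(n)] for i in range(m)]

    # Assumed initialization: indicator vector of a single row.
    u = [1.0 if i == start_row else 0.0 for i in range(m)]
    v = [0.0] * n

    for _ in range(iters):
        uu = sum(x * x for x in u)
        if uu == 0.0:
            break
        # Optimal v for fixed u: projected least squares, column by column.
        new_v = [max(0.0, sum(M[i][j] * u[i] for i in range(m))) / uu
                 for j in range(n)]
        vv = sum(x * x for x in new_v)
        if vv == 0.0:
            v = new_v
            break
        # Optimal u for fixed v: projected least squares, row by row.
        new_u = [max(0.0, sum(M[i][j] * new_v[j] for j in range(n))) / vv
                 for i in range(m)]
        if new_u == u and new_v == v:
            break  # fixed point reached
        u, v = new_u, new_v

    rows = [i for i, x in enumerate(u) if x > 0]
    cols = [j for j, x in enumerate(v) if x > 0]
    return rows, cols


adj = [[1, 1, 0],
       [1, 1, 0],
       [0, 1, 1]]
print(r1nf_biclique(adj, start_row=0))
```

With a large penalty and this initialization, the supports of u and v at the fixed point form a biclique containing the starting row, provided that row has at least one edge. Different starting rows can lead to different bicliques, so in practice one would try several and keep the one with the most edges.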
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided.
Algorithms approaching the threshold for semi-random planted clique
We design new polynomial-time algorithms for recovering planted cliques in
the semi-random graph model introduced by Feige and Kilian (2001). The
previous best algorithms for this model succeed if the planted clique has size
at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, McKenzie, Trevisan,
2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for
planted-clique sizes approaching $n^{1/2}$, the information-theoretic
threshold in the semi-random model (Steinhardt, 2017) and a conjectured
computational threshold even in the easier fully-random model. This result
comes close to resolving open questions by Feige and Steinhardt.
Our algorithms are based on higher constant degree sum-of-squares relaxation
and rely on a new conceptual connection that translates certificates of upper
bounds on biclique numbers in unbalanced bipartite Erdős-Rényi
random graphs into algorithms for semi-random planted clique. The use of a
higher-constant degree sum-of-squares is essential in our setting: we prove a
lower bound on the basic SDP for certifying bicliques that shows that the basic
SDP cannot succeed for planted cliques of size $k = o(n^{2/3})$. We also provide
some evidence that the information-computation trade-off of our current
algorithms may be inherent by proving an average-case lower bound for
unbalanced bicliques in the low-degree-polynomials model.
Comment: 51 pages, the arXiv landing page contains a shortened abstract
On maximal chain subgraphs and covers of bipartite graphs
In this paper, we address three related problems. The first is the enumeration of all the maximal edge-induced chain subgraphs of a bipartite graph, for which we provide a polynomial delay algorithm. We give bounds on the number of maximal chain subgraphs for a bipartite graph and use them to establish the input-sensitive complexity of the enumeration problem.
The second problem is finding the minimum number of chain subgraphs needed to cover all the edges of a bipartite graph. For this we provide an exact exponential algorithm with a nontrivial complexity. Finally, we approach the problem of enumerating all minimal chain subgraph covers of a bipartite graph and show that it can be solved in quasi-polynomial time.
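The abstract does not define a chain graph. Under the standard definition (an assumption here, not taken from the listing), a bipartite graph is a chain graph when the neighborhoods of the vertices on one side are totally ordered by inclusion. A minimal sketch of that check, with the hypothetical helper `is_chain_graph` taking one neighbor set per left vertex:

```python
def is_chain_graph(neighborhoods):
    """Return True if the given neighbor sets (one per left vertex)
    form a chain under inclusion."""
    # Sort by size: in a nested family, each set must be contained
    # in the next one once the sets are ordered by cardinality.
    ordered = sorted(neighborhoods, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


print(is_chain_graph([{1}, {1, 2}, {1, 2, 3}]))
print(is_chain_graph([{1, 2}, {2, 3}]))
```

The first call has nested neighborhoods and passes the check. The second does not, because neither set contains the other.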