95 research outputs found

    Nonnegative factorization and the maximum edge biclique problem

    Nonnegative matrix factorization (NMF) is a data analysis technique based on the approximation of a nonnegative matrix with a product of two nonnegative factors, which allows compression and interpretation of nonnegative data. In this paper, we study the case of rank-one factorization and show that when the matrix to be factored is not required to be nonnegative, the corresponding problem (R1NF) becomes NP-hard. This sheds new light on the complexity of NMF, since any algorithm for fixed-rank NMF must be able to solve such rank-one subproblems, at least implicitly. Our proof relies on a reduction of the maximum edge biclique problem to R1NF. We also link stationary points of R1NF to feasible solutions of the biclique problem, which allows us to design a new type of biclique finding algorithm based on the application of a block-coordinate descent scheme to R1NF. We show that this algorithm, whose algorithmic complexity per iteration is proportional to the number of edges in the graph, is guaranteed to converge to a biclique and that it performs competitively with existing methods on random graphs and text mining datasets.
    Keywords: nonnegative matrix factorization, rank-one factorization, maximum edge biclique problem, algorithmic complexity, biclique finding algorithm
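
    The block-coordinate descent idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions: the matrix M has 1 on edges and -d on non-edges, the objective is the squared Frobenius error of M - u v^T with u, v >= 0, the penalty defaults to d = max(n_left, n_right), and v is initialized from the neighborhood of a hypothetical `seed` vertex. The function name `biclique_bcd` is illustrative. This dense version costs O(n_left * n_right) per iteration and does not reproduce the edge-proportional cost claimed in the abstract.

```python
def biclique_bcd(edges, n_left, n_right, seed=0, d=None, iters=50):
    """Alternating exact updates for min ||M - u v^T||_F^2 subject to u, v >= 0.

    Assumed setup (not taken from the abstract): M[i][j] = 1 if (i, j) is an
    edge and -d otherwise; v starts as the indicator of the seed's neighbors.
    Returns the supports of u and v as (left_vertices, right_vertices).
    """
    edge_set = set(edges)
    if d is None:
        d = max(n_left, n_right)  # assumed penalty weight for non-edges
    M = [[1.0 if (i, j) in edge_set else -float(d) for j in range(n_right)]
         for i in range(n_left)]

    u = [0.0] * n_left
    v = [1.0 if (seed, j) in edge_set else 0.0 for j in range(n_right)]

    for _ in range(iters):
        vv = sum(x * x for x in v)
        if vv == 0.0:
            break
        # With v fixed, each u_i has a closed-form nonnegative least-squares optimum.
        u = [max(0.0, sum(M[i][j] * v[j] for j in range(n_right))) / vv
             for i in range(n_left)]
        uu = sum(x * x for x in u)
        if uu == 0.0:
            break
        # Symmetric closed-form update for v with u fixed.
        new_v = [max(0.0, sum(M[i][j] * u[i] for i in range(n_left))) / uu
                 for j in range(n_right)]
        if new_v == v:
            break  # fixed point reached
        v = new_v

    left = [i for i, x in enumerate(u) if x > 0.0]
    right = [j for j, x in enumerate(v) if x > 0.0]
    return left, right


# Toy bipartite graph: a 2x2 biclique plus one isolated edge.
toy_edges = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
print(biclique_bcd(toy_edges, 3, 3, seed=0))  # prints ([0, 1], [0, 1])
```

    With a large enough d, a vertex stays in the support only if it is adjacent to every vertex in the opposite support, which is why the supports of u and v form a biclique on this toy input.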

    Guaranteed clustering and biclustering via semidefinite programming

    Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k-disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation for the biclustering problem with similar recovery guarantees. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k-disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.

    Algorithms approaching the threshold for semi-random planted clique

    We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian (2001). The previous best algorithms for this model succeed if the planted clique has size at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, Mckenzie, Trevisan, 2019 and Charikar, Steinhardt, Valiant, 2017). Our algorithms work for planted-clique sizes approaching $n^{1/2}$, which is the information-theoretic threshold in the semi-random model (Steinhardt, 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige and Steinhardt. Our algorithms are based on higher constant-degree sum-of-squares relaxations and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős-Rényi random graphs into algorithms for semi-random planted clique. The use of a higher constant-degree sum-of-squares relaxation is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques that shows that the basic SDP cannot succeed for planted cliques of size $k = o(n^{2/3})$. We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.
    Comment: 51 pages; the arXiv landing page contains a shortened abstract

    On maximal chain subgraphs and covers of bipartite graphs

    In this paper, we address three related problems. The first is the enumeration of all the maximal edge-induced chain subgraphs of a bipartite graph, for which we provide a polynomial delay algorithm. We give bounds on the number of maximal chain subgraphs for a bipartite graph and use them to establish the input-sensitive complexity of the enumeration problem. The second problem is that of finding the minimum number of chain subgraphs needed to cover all the edges of a bipartite graph. For this we provide an exact exponential algorithm with a nontrivial complexity. Finally, we approach the problem of enumerating all minimal chain subgraph covers of a bipartite graph and show that it can be solved in quasi-polynomial time.