33 research outputs found

    On the complexity of the clustering minimum Biclique completion problem

    Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

    The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented, rather than relying on approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and the analysis of a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm to enumerate dense substructure in multipartite graphs, constructed using data from three or more heterogeneous sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm. 
In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility.

    Binary matrix factorisation and completion via integer programming

    Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n × m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n × k and k × m respectively, which minimise the squared Frobenius distance between X and the Boolean product of A and B. We present one compact and two exponential-size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential-size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorisation. For one of the exponential-size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real-world datasets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorisations.
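
    The k-BMF objective described in the abstract above can be illustrated with a minimal Python sketch. The function names `boolean_product` and `bmf_error` are hypothetical, and representing missing entries as `None` is an assumption made here for illustration; none of this is taken from the paper. For binary matrices the squared Frobenius distance is simply the number of known entries on which X and the Boolean product disagree.

```python
from typing import List, Optional

BinaryMatrix = List[List[int]]
PartialMatrix = List[List[Optional[int]]]  # None marks a missing entry (assumption)


def boolean_product(A: BinaryMatrix, B: BinaryMatrix) -> BinaryMatrix:
    """Boolean product: entry (i, j) is 1 iff A[i][l] = B[l][j] = 1 for some l."""
    k = len(B)
    m = len(B[0]) if B else 0
    return [
        [int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
        for row in A
    ]


def bmf_error(X: PartialMatrix, A: BinaryMatrix, B: BinaryMatrix) -> int:
    """Squared Frobenius distance between X and the Boolean product of A and B.

    Only known entries of X are counted; for 0/1 data this equals the
    number of known positions where the two matrices differ.
    """
    Z = boolean_product(A, B)
    return sum(
        1
        for i, row in enumerate(X)
        for j, x in enumerate(row)
        if x is not None and x != Z[i][j]
    )
```

    For instance, a 3 × 3 matrix with rows (1,1,0), (1,1,1), (0,1,1) is reproduced exactly by a rank-2 Boolean factorisation whose second row of A selects both rows of B, whereas the best all-ones rank-1 factorisation covers its two zeros erroneously.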

    Binary Matrix Factorisation via Column Generation

    Identifying discrete patterns in binary data is an important dimensionality reduction tool in machine learning and data mining. In this paper, we consider the problem of low-rank binary matrix factorisation (BMF) under Boolean arithmetic. Due to the NP-hardness of this problem, most previous attempts rely on heuristic techniques. We formulate the problem as a mixed integer linear program and use the large-scale optimisation technique of column generation to solve it without the need for heuristic pattern mining. Our approach focuses on accuracy and on the provision of optimality guarantees. Experimental results on real-world datasets demonstrate that our proposed method is effective at producing highly accurate factorisations and improves on the previously available best known results for 15 out of 24 problem instances.

    The minimum completion problem for a bipartite graph

    In this article we show that the clustering minimum biclique completion problem is NP-complete in the class of P4-free bipartite graphs. We also propose a dynamic programming algorithm for that problem restricted to 2K2-free bipartite graphs. We consider a graph problem in which a bipartite graph with a distinguished part is given and the task is to add the smallest number of additional edges to the graph so that the vertex set of the distinguished part of the resulting graph can be partitioned into a given number of nonempty sets, each containing only vertices with identical neighbourhoods. The paper establishes that the problem is NP-hard in the class of P4-free bipartite graphs and proposes an algorithm that solves the problem in the class of 2K2-free bipartite graphs.
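
    To make the problem statement above concrete, the following is a brute-force Python sketch for tiny instances. It is not the dynamic programming algorithm from the article; it enumerates every assignment of the distinguished vertices to k clusters and therefore takes exponential time. The names `completion_cost` and `min_biclique_completion`, and the dictionary representation of neighbourhoods, are assumptions introduced here. For a fixed partition, the cheapest completion gives every vertex in a cluster the union of the cluster's neighbourhoods, since edges may only be added.

```python
from itertools import product
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Neighbourhoods = Dict[Hashable, Set[Hashable]]


def completion_cost(neighbours: Neighbourhoods,
                    clusters: Sequence[Sequence[Hashable]]) -> int:
    """Edges to add so that all vertices in each cluster share one neighbourhood."""
    cost = 0
    for cluster in clusters:
        # The common neighbourhood must contain every original one, so
        # the smallest choice is their union.
        union = set().union(*(neighbours[v] for v in cluster))
        cost += sum(len(union) - len(neighbours[v]) for v in cluster)
    return cost


def min_biclique_completion(
    neighbours: Neighbourhoods, k: int
) -> Optional[Tuple[int, List[List[Hashable]]]]:
    """Try all partitions of the distinguished side into k nonempty clusters.

    Returns (cost, clusters) for a cheapest partition, or None if no
    partition into k nonempty clusters exists.
    """
    vertices = sorted(neighbours)
    best = None
    for labels in product(range(k), repeat=len(vertices)):
        if len(set(labels)) < k:
            continue  # some cluster would be empty
        clusters = [[v for v, c in zip(vertices, labels) if c == i]
                    for i in range(k)]
        cost = completion_cost(neighbours, clusters)
        if best is None or cost < best[0]:
            best = (cost, clusters)
    return best
```

    With neighbourhoods a: {1, 2}, b: {1, 2}, c: {3}, d: {2, 3} and k = 2, the sketch groups a with b and c with d, adding the single edge between c and 2.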

    Crew Planning at Netherlands Railways: Improving Fairness, Attractiveness, and Efficiency

    The development and improvement of decision support for crew planning at Netherlands Railways (NS

    Proceedings of the 8th Cologne-Twente Workshop on Graphs and Combinatorial Optimization

    The Cologne-Twente Workshop (CTW) on Graphs and Combinatorial Optimization started off as a series of workshops organized bi-annually by either Köln University or Twente University. As its importance grew over time, it re-centered its geographical focus by including northern Italy (CTW04 in Menaggio, on Lake Como, and CTW08 in Gargnano, on Lake Garda). This year, CTW (in its eighth edition) will be staged in France for the first time: more precisely in the heart of Paris, at the Conservatoire National d’Arts et Métiers (CNAM), between 2nd and 4th June 2009, by a mixed organizing committee with members from LIX, Ecole Polytechnique and CEDRIC, CNAM.

    Binary matrix factorisations under Boolean arithmetic

    For a binary matrix X, the Boolean rank br(X) is the smallest integer for which X can be factorised into the Boolean matrix product of two binary matrices A and B with inner dimension br(X). The isolation number i(X) of X is the maximum number of 1s no two of which are in the same row, column or 2 x 2 submatrix of all 1s. In Part I of this thesis, we continue Anna Lubiw's study of firm matrices. X is said to be firm if i(X)=br(X) and this equality holds for all its submatrices. We show that the stronger concept of superfirmness of X is equivalent to having no odd holes in the rectangle cover graph of X, the graph in which br(X) and i(X) translate to the clique cover number and the independence number, respectively. A binary matrix is minimally non-firm if it is not firm but all of its proper submatrices are. We introduce a matrix operation that leads to generalised binary matrices and, under some conditions, preserves firmness and superfirmness. Then we use this matrix operation to derive several infinite families of minimally non-firm matrices. To the best of our knowledge, minimally non-firm matrices have not been studied before and our constructions provide the first infinite families of them. In Part II of this thesis, we explore rank-k binary matrix factorisation (k-BMF). In k-BMF, we are given an m x n binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension m x k and k x n respectively, which minimise the distance between X and the Boolean matrix product of A and B in the squared Frobenius norm. We present one compact and two exponential-size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential-size IPs have a stronger equivalent LP relaxation. 
We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times a zero is erroneously covered in a rank-k factorisation. For one of the exponential-size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real-world datasets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorisations.
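
    The isolation number defined in this abstract can be computed for very small matrices by exhaustive search, as in the Python sketch below. The function names `is_isolated` and `isolation_number` are hypothetical and the search is exponential, so this is an illustration of the definition rather than a method from the thesis. Two 1-entries at (i, j) and (p, q) in different rows and columns lie in a common 2 x 2 all-ones submatrix exactly when X[i][q] and X[p][j] are both 1.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def is_isolated(X: List[List[int]], cells: Sequence[Cell]) -> bool:
    """Check that `cells` are 1-entries, no two sharing a row, a column,
    or a 2 x 2 submatrix of all 1s."""
    if any(X[i][j] != 1 for i, j in cells):
        return False
    for (i, j), (p, q) in combinations(cells, 2):
        if i == p or j == q:
            return False
        # Rows {i, p} and columns {j, q} form an all-ones 2 x 2 submatrix
        # iff the two "cross" entries are also 1.
        if X[i][q] == 1 and X[p][j] == 1:
            return False
    return True


def isolation_number(X: List[List[int]]) -> int:
    """Largest size of an isolated set of 1s, found by brute force."""
    ones = [(i, j) for i, row in enumerate(X) for j, x in enumerate(row) if x == 1]
    for size in range(len(ones), 0, -1):
        if any(is_isolated(X, subset) for subset in combinations(ones, size)):
            return size
    return 0
```

    Because each all-ones rectangle in a cover of X can contain at most one entry of an isolated set, the value returned is a lower bound on the Boolean rank br(X); firm matrices are those for which the bound is tight on every submatrix.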

    On star and biclique edge-colorings

    A biclique of G is a maximal set of vertices that induces a complete bipartite subgraph Kp,q of G with at least one edge, and a star of a graph G is a maximal set of vertices that induces a complete bipartite graph K1,q. A biclique (resp. star) edge-coloring is a coloring of the edges of a graph with no monochromatic bicliques (resp. stars). We prove that the problem of determining whether a graph G has a biclique (resp. star) edge-coloring using two colors is NP-hard. Furthermore, we describe polynomial time algorithms for the problem in restricted classes: K3-free graphs, chordal bipartite graphs, powers of paths, and powers of cycles.

    Integrality and cutting planes in semidefinite programming approaches for combinatorial optimization

    Many real-life decision problems are discrete in nature. To solve such problems as mathematical optimization problems, integrality constraints are commonly incorporated in the model to reflect the choice of finitely many alternatives. At the same time, it is known that semidefinite programming is very suitable for obtaining strong relaxations of combinatorial optimization problems. In this dissertation, we study the interplay between semidefinite programming and integrality, where a special focus is put on the use of cutting-plane methods. Although the notions of integrality and cutting planes are well-studied in linear programming, integer semidefinite programs (ISDPs) have been considered only recently. We show that many combinatorial optimization problems can be modeled as ISDPs. Several theoretical concepts, such as the Chvátal-Gomory closure, total dual integrality and integer Lagrangian duality, are studied for the case of integer semidefinite programming. On the practical side, we introduce an improved branch-and-cut approach for ISDPs and a cutting-plane augmented Lagrangian method for solving semidefinite programs with a large number of cutting planes. Throughout the thesis, we apply our results to a wide range of combinatorial optimization problems, among them the quadratic cycle cover problem, the quadratic traveling salesman problem and the graph partition problem. Our approaches lead to novel, strong and efficient solution strategies for these problems, with the potential to be extended to other problem classes.