70 research outputs found

    Towards effective exact methods for the Maximum Balanced Biclique Problem in bipartite graphs

    Get PDF
    The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP in bipartite graphs. First, an Upper Bound Propagation (UBP) procedure to pre-compute an upper bound involving each vertex is introduced. Then we extend a simple Branch-and-Bound (B&B) algorithm by integrating the pre-computed upper bounds. Based on UBP, we also study a new integer linear programming model of MBBP which is more compact than an existing formulation (Dawande, Keskinocak, Swaminathan, & Tayur, 2001). We introduce new valid inequalities induced from the upper bounds to tighten these mathematical formulations for MBBP. Experiments with random bipartite graphs demonstrate the efficiency of the extended B&B algorithm and the valid inequalities generated on demand. Further tests with 30 real-life instances show that, for at least three very large graphs, the new approaches improve the computational time with four orders of magnitude compared to the original B&B

    Quantum Algorithm for Maximum Biclique Problem

    Full text link
    Identifying a biclique with the maximum number of edges bears considerable implications for numerous fields of application, such as detecting anomalies in E-commerce transactions, discerning protein-protein interactions in biology, and refining the efficacy of social network recommendation algorithms. However, the inherent NP-hardness of this problem significantly complicates the matter. The prohibitive time complexity of existing algorithms is the primary bottleneck constraining the application scenarios. Aiming to address this challenge, we present an unprecedented exploration of a quantum computing approach. Efficient quantum algorithms, as a crucial future direction for handling NP-hard problems, are presently under intensive investigation, of which the potential has already been proven in practical arenas such as cybersecurity. However, in the field of quantum algorithms for graph databases, little work has been done due to the challenges presented by the quantum representation of complex graph topologies. In this study, we delve into the intricacies of encoding a bipartite graph on a quantum computer. Given a bipartite graph with n vertices, we propose a ground-breaking algorithm qMBS with time complexity O^*(2^(n/2)), illustrating a quadratic speed-up in terms of complexity compared to the state-of-the-art. Furthermore, we detail two variants tailored for the maximum vertex biclique problem and the maximum balanced biclique problem. To corroborate the practical performance and efficacy of our proposed algorithms, we have conducted proof-of-principle experiments utilizing IBM quantum simulators, of which the results provide a substantial validation of our approach to the extent possible to date

    Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

    Get PDF
    The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented, rather than relying on approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and the analysis of a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm to enumerate dense substructure in multipartite graphs, constructed using data from three or more heterogeneous sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm. In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility

    Cohesive subgraph identification in large graphs

    Get PDF
    Graph data is ubiquitous in real world applications, as the relationship among entities in the applications can be naturally captured by the graph model. Finding cohesive subgraphs is a fundamental problem in graph mining with diverse applications. Given the important roles of cohesive subgraphs, this thesis focuses on cohesive subgraph identification in large graphs. Firstly, we study the size-bounded community search problem that aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm SC-BRB by developing nontrivial reducing techniques, upper bounding techniques, and branching techniques. Secondly, we formulate the notion of similar-biclique in bipartite graphs which is a special kind of biclique where all vertices from a designated side are similar to each other, and aim to enumerate all maximal similar-bicliques. We propose a backtracking algorithm MSBE to directly enumerate maximal similar-bicliques, and power it by vertex reduction and optimization techniques. In addition, we design a novel index structure to speed up a time-critical operation of MSBE, as well as to speed up vertex reduction. Efficient index construction algorithms are developed. Thirdly, we consider balanced cliques in signed graphs --- a clique is balanced if its vertex set can be partitioned into CL and CR such that all negative edges are between CL and CR --- and study the problem of maximum balanced clique computation. We propose techniques to transform the maximum balanced clique problem over G to a series of maximum dichromatic clique problems over small subgraphs of G. The transformation not only removes edge signs but also sparsifies the edge set

    Optimization-Based Network Analysis with Applications in Clustering and Data Mining

    Get PDF
    In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from optimization point of view. In particular, the s-club model and the biclique model are considered due to various application areas. An analytical review of the optimization problems is followed by theoretical results and algorithmic solution methods developed in this research. The maximum s-club problem has applications in graph-based data mining and robust network design where high reachability is often considered a critical property. Massive size of real-life instances makes it necessary to devise a scalable solution method for practical purposes. Moreover, lack of heredity property in s-clubs imposes challenges in the design of optimization algorithms. Motivated by these properties, a sufficient condition for checking maximality, by inclusion, of a given s-club is proposed. The sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem. Inspired by wide usability of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem. Specifically, the maximum edge biclique, the maximum vertex biclique and the maximum balanced biclique problems are considered. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds are insightful in understanding the evolution and growth rate of bicliques in large-scale graphs. To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems, in general graphs, is proposed. The procedure shrinks the underlying network, by confirming and removing edges that cannot be in the optimal solution, thus enabling the exact solution methods to solve large-scale sparse instances to optimality. Also, a combinatorial branch-and-bound algorithm is developed that best suits to solve dense instances where scale-reduction method might be less effective. Proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems

    Graph Sparsification, Spectral Sketches, and Faster Resistance Computation, via Short Cycle Decompositions

    Get PDF
    We develop a framework for graph sparsification and sketching, based on a new tool, short cycle decomposition -- a decomposition of an unweighted graph into an edge-disjoint collection of short cycles, plus few extra edges. A simple observation gives that every graph G on n vertices with m edges can be decomposed in O(mn)O(mn) time into cycles of length at most 2log⁥n2\log n, and at most 2n2n extra edges. We give an m1+o(1)m^{1+o(1)} time algorithm for constructing a short cycle decomposition, with cycles of length no(1)n^{o(1)}, and n1+o(1)n^{1+o(1)} extra edges. These decompositions enable us to make progress on several open questions: * We give an algorithm to find (1±ϔ)(1\pm\epsilon)-approximations to effective resistances of all edges in time m1+o(1)ϔ−1.5m^{1+o(1)}\epsilon^{-1.5}, improving over the previous best of O~(min⁥{mϔ−2,n2ϔ−1})\tilde{O}(\min\{m\epsilon^{-2},n^2 \epsilon^{-1}\}). This gives an algorithm to approximate the determinant of a Laplacian up to (1±ϔ)(1\pm\epsilon) in m1+o(1)+n15/8+o(1)ϔ−7/4m^{1 + o(1)} + n^{15/8+o(1)}\epsilon^{-7/4} time. * We show existence and efficient algorithms for constructing graphical spectral sketches -- a distribution over sparse graphs H such that for a fixed vector xx, we have w.h.p. xâ€ČLHx=(1±ϔ)xâ€ČLGxx'L_Hx=(1\pm\epsilon)x'L_Gx and xâ€ČLH+x=(1±ϔ)xâ€ČLG+xx'L_H^+x=(1\pm\epsilon)x'L_G^+x. This implies the existence of resistance-sparsifiers with about nϔ−1n\epsilon^{-1} edges that preserve the effective resistances between every pair of vertices up to (1±ϔ).(1\pm\epsilon). * By combining short cycle decompositions with known tools in graph sparsification, we show the existence of nearly-linear sized degree-preserving spectral sparsifiers, as well as significantly sparser approximations of directed graphs. The latter is critical to recent breakthroughs on faster algorithms for solving linear systems in directed Laplacians. Improved algorithms for constructing short cycle decompositions will lead to improvements for each of the above results.Comment: 80 page

    Bi-(N-) cluster editing and its biomedical applications

    Get PDF
    The extremely fast advances in wet-lab techniques lead to an exponential growth of heterogeneous and unstructured biological data, posing a great challenge to data integration in nowadays system biology. The traditional clustering approach, although widely used to divide the data into groups sharing common features, is less powerful in the analysis of heterogeneous data from n different sources (n _ 2). The co-clustering approach has been widely used for combined analyses of multiple networks to address the challenge of heterogeneity. In this thesis, novel methods for the co-clustering of large scale heterogeneous data sets are presented in the software package n-CluE: one exact algorithm and two heuristic algorithms based on the model of bi-/n-cluster editing by modeling the input as n-partite graphs and solving the clustering problem with various strategies. In the first part of the thesis, the complexity and the fixed-parameter tractability of the extended bicluster editing model with relaxed constraints are investigated, namely the ?-bicluster editing model and its NP-hardness is proven. Based on the results of this analysis, three strategies within the n-CluE software package are then established and discussed, together with the evaluations on performances and the systematic comparisons against other algorithms of the same type in solving bi-/n-cluster editing problem. To demonstrate the practical impact, three real-world analyses using n-CluE are performed, including (a) prediction of novel genotype-phenotype associations by clustering the data from Genome-Wide Association Studies; (b) comparison between n-CluE and eight other biclustering tools on GEO Omnibus microarray data sets; (c) drug repositioning predictions by co-clustering on drug, gene and disease networks. The outstanding performance of n-CluE in the real-world applications shows its strength and flexibility in integrating heterogeneous data and extracting biological relevant information in bioinformatic analyses.Die enormen Fortschritte im Bereich Labortechnik haben in jĂŒngster Zeit zu einer exponentiell wachsenden Menge an heterogenen und unstrukturierten Daten gefĂŒhrt. Dies stellt eine große Herausforderung fĂŒr systembiologische Forschung dar, innerhalb derer diese Datenmengen durch Datenintegration und Datamining zusammengefasst und in Kombination analysiert werden. Traditionelles Clustering ist eine vielseitig eingesetzte Methode, um EntitĂ€ten innerhalb grosser Datenmengen bezĂŒglich ihrer Ähnlichkeit bestimmter Attribute zu gruppieren (“clustern„). Beim Clustern von heterogenen Daten aus n (n > 2) unterschiedlichen Quellen zeigen traditionelle Clusteringmethoden jedoch SchwĂ€chen. In solchen FĂ€llen bieten Co-clusteringmethoden dadurch Vorteile, dass sie DatensĂ€tze gleichzeitig partitionieren können. In dieser Dissertation stelle ich neue Clusteringmethoden vor, die in der Software n-CluE zusammengefĂŒhrt sind. Diese neuen Methoden wurden aus dem bi-/n-cluster editing heraus entwickelt und lösen durch Transformation der EingangsdatensĂ€tze in n-partite Graphen mit verschiedenen Strategien das zugrundeliegende Clusteringproblem. Diese Dissertation ist in zwei verschiedene Teile gegliedert. Der erste Teil befasst sich eingehend mit der KomplexitĂ€tanalyse verschiedener erweiterter bicluster editing Modelle, die sog. ?-bicluster editing Modelle und es wird der Beweis der NP-Schwere erbracht. Basierend auf diesen theoretischen Gesichtspunkten prĂ€sentiere ich im zweiten Teil drei unterschiedliche Algorithmen, einen exakten Algorithmus und zwei Heuristiken und demonstriere ihre LeistungsfĂ€higkeit und Robustheit im Vergleich mit anderen algorithmischen Herangehensweisen. Die StĂ€rken von n-CluE werden anhand von drei realen Anwendungsbeispielen untermauert: (a) Die Vorhersage neuartiger Genotyp-PhĂ€notyp-Assoziationen durch Biclustering-Analyse von Daten aus genomweiten Assoziationsstudien (GWAS);(b) Der Vergleich zwischen n-CluE und acht weiteren Softwarepaketen anhand von Bicluster-Analysen von Microarraydaten aus den Gene Expression Omnibus (GEO); (c) Die Vorhersage von Medikamenten-Repositionierung durch integrierte Analyse von Medikamenten-, Gen- und Krankeitsnetzwerken. Die Resultate zeigen eindrucksvoll die StĂ€rken der n-CluE Software. Das Ergebnis ist eine leistungsstarke, robuste und flexibel erweiterbare Implementierung des Biclustering-Theorems zur Integration grosser heterogener Datenmengen fĂŒr das Extrahieren biologisch relevanter Ergebnisse im Rahmen von bioinformatischen Studien

    Proceedings of the 8th Cologne-Twente Workshop on Graphs and Combinatorial Optimization

    No full text
    International audienceThe Cologne-Twente Workshop (CTW) on Graphs and Combinatorial Optimization started off as a series of workshops organized bi-annually by either Köln University or Twente University. As its importance grew over time, it re-centered its geographical focus by including northern Italy (CTW04 in Menaggio, on the lake Como and CTW08 in Gargnano, on the Garda lake). This year, CTW (in its eighth edition) will be staged in France for the first time: more precisely in the heart of Paris, at the Conservatoire National d’Arts et MĂ©tiers (CNAM), between 2nd and 4th June 2009, by a mixed organizing committee with members from LIX, Ecole Polytechnique and CEDRIC, CNAM
    • 

    corecore