70 research outputs found
Towards effective exact methods for the Maximum Balanced Biclique Problem in bipartite graphs
The Maximum Balanced Biclique Problem (MBBP) is a prominent model with numerous applications. Yet, the problem is NP-hard and thus computationally challenging. We propose novel ideas for designing effective exact algorithms for MBBP in bipartite graphs. First, an Upper Bound Propagation (UBP) procedure to pre-compute an upper bound involving each vertex is introduced. Then we extend a simple Branch-and-Bound (B&B) algorithm by integrating the pre-computed upper bounds. Based on UBP, we also study a new integer linear programming model of MBBP which is more compact than an existing formulation (Dawande, Keskinocak, Swaminathan, & Tayur, 2001). We introduce new valid inequalities induced from the upper bounds to tighten these mathematical formulations for MBBP. Experiments with random bipartite graphs demonstrate the efficiency of the extended B&B algorithm and the valid inequalities generated on demand. Further tests with 30 real-life instances show that, for at least three very large graphs, the new approaches reduce the computation time by four orders of magnitude compared with the original B&B.
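As a rough illustration of how an upper bound can prune a branch-and-bound search for MBBP, the sketch below grows a set of left vertices and tracks their common right neighbourhood. It is a minimal sketch, not the UBP procedure or the ILP model from the abstract: the bound used is the trivial one (chosen plus remaining left vertices, capped by the common neighbourhood), and the function name and input layout (a dict from each left vertex to its set of right neighbours) are assumptions made for this example.

```python
from typing import Dict, Hashable, Set


def max_balanced_biclique_size(adj: Dict[Hashable, Set[Hashable]]) -> int:
    """Return the largest k such that the bipartite graph contains K_{k,k}.

    adj maps every left vertex to the set of its right neighbours
    (hypothetical input format chosen for this sketch).
    """
    # Try high-degree left vertices first so good solutions appear early.
    left = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    right = set().union(*adj.values())
    best = 0

    def branch(size_a: int, common: Set[Hashable], idx: int) -> None:
        nonlocal best
        # size_a chosen left vertices share len(common) right neighbours.
        best = max(best, min(size_a, len(common)))
        for i in range(idx, len(left)):
            # Trivial upper bound: we can add at most len(left) - i more
            # left vertices and can never use more than len(common) right ones.
            if min(size_a + len(left) - i, len(common)) <= best:
                return
            new_common = common & adj[left[i]]
            # Only descend if the shrunken neighbourhood can still beat best.
            if len(new_common) > best:
                branch(size_a + 1, new_common, i + 1)

    branch(0, right, 0)
    return best


if __name__ == "__main__":
    graph = {"a": {1, 2}, "b": {1, 2}, "c": {2, 3}}
    print(max_balanced_biclique_size(graph))  # 2: {a, b} x {1, 2}
```

A tighter per-vertex bound, such as the one the abstract attributes to UBP, would replace the trivial bound inside the loop; the search skeleton would stay the same.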
Quantum Algorithm for Maximum Biclique Problem
Identifying a biclique with the maximum number of edges bears considerable
implications for numerous fields of application, such as detecting anomalies in
E-commerce transactions, discerning protein-protein interactions in biology,
and refining the efficacy of social network recommendation algorithms. However,
the inherent NP-hardness of this problem significantly complicates the matter.
The prohibitive time complexity of existing algorithms is the primary
bottleneck constraining the application scenarios. Aiming to address this
challenge, we present an unprecedented exploration of a quantum computing
approach. Efficient quantum algorithms, as a crucial future direction for
handling NP-hard problems, are presently under intensive investigation, of
which the potential has already been proven in practical arenas such as
cybersecurity. However, in the field of quantum algorithms for graph databases,
little work has been done due to the challenges presented by the quantum
representation of complex graph topologies. In this study, we delve into the
intricacies of encoding a bipartite graph on a quantum computer. Given a
bipartite graph with n vertices, we propose a ground-breaking algorithm qMBS
with time complexity O^*(2^(n/2)), illustrating a quadratic speed-up in terms
of complexity compared to the state-of-the-art. Furthermore, we detail two
variants tailored for the maximum vertex biclique problem and the maximum
balanced biclique problem. To corroborate the practical performance and
efficacy of our proposed algorithms, we have conducted proof-of-principle
experiments using IBM quantum simulators, whose results validate our
approach to the extent currently possible.
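The stated complexity can be read as a square root of an exponential bound. The identity below is plain arithmetic; the assumption that the classical point of comparison is a bound of the form O^*(2^n) is ours and is not spelled out in the abstract.

```latex
\[
  O^{*}\!\left(2^{n/2}\right) \;=\; O^{*}\!\left(\sqrt{2^{\,n}}\right),
  \qquad\text{so}\qquad
  \left(2^{n/2}\right)^{2} = 2^{n}.
\]
```

In this reading, "quadratic speed-up" means the exponent is halved, not that the problem becomes polynomial.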
Multipartite Graph Algorithms for the Analysis of Heterogeneous Data
The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented, rather than relying on approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and the analysis of a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm to enumerate dense substructure in multipartite graphs, constructed using data from three or more heterogeneous sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm. 
In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility.
Cohesive subgraph identification in large graphs
Graph data is ubiquitous in real world applications, as the relationship among entities in the applications can be naturally captured by the graph model. Finding cohesive subgraphs is a fundamental problem in graph mining with diverse applications. Given the important roles of cohesive subgraphs, this thesis focuses on cohesive subgraph identification in large graphs.
Firstly, we study the size-bounded community search problem that aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm SC-BRB by developing nontrivial reducing techniques, upper bounding techniques, and branching techniques.
Secondly, we formulate the notion of similar-biclique in bipartite graphs which is a special kind of biclique where all vertices from a designated side are similar to each other, and aim to enumerate all maximal similar-bicliques. We propose a backtracking algorithm MSBE to directly enumerate maximal similar-bicliques, and power it by vertex reduction and optimization techniques. In addition, we design a novel index structure to speed up a time-critical operation of MSBE, as well as to speed up vertex reduction. Efficient index construction algorithms are developed.
Thirdly, we consider balanced cliques in signed graphs --- a clique is balanced if its vertex set can be partitioned into CL and CR such that all negative edges are between CL and CR --- and study the problem of maximum balanced clique computation. We propose techniques to transform the maximum balanced clique problem over G to a series of maximum dichromatic clique problems over small subgraphs of G. The transformation not only removes edge signs but also sparsifies the edge set
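The balance condition on a signed clique can be checked directly. The sketch below is a minimal illustration, assuming the usual structural-balance reading (edges inside CL or CR are positive, edges between them are negative) and a hypothetical input format in which `sign` maps `frozenset({u, v})` to +1 or -1. It only tests one given vertex set and is not the maximum balanced clique algorithm described in the thesis.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, Optional, Set, Tuple


def balanced_partition(
    vertices: Iterable[Hashable],
    sign: Dict[FrozenSet[Hashable], int],
) -> Optional[Tuple[Set[Hashable], Set[Hashable]]]:
    """Return (CL, CR) if `vertices` form a balanced clique, else None."""
    vertices = list(vertices)
    if not vertices:
        return set(), set()

    root = vertices[0]
    left, right = {root}, set()
    # In a clique the sides are forced by the signs of the edges at root:
    # positive neighbours join root's side, negative ones go opposite.
    for v in vertices[1:]:
        s = sign.get(frozenset((root, v)))
        if s is None:  # missing edge, so not a clique
            return None
        (left if s > 0 else right).add(v)

    # Verify every pair against the forced partition.
    for u, v in combinations(vertices, 2):
        s = sign.get(frozenset((u, v)))
        if s is None:
            return None
        same_side = (u in left) == (v in left)
        if same_side != (s > 0):
            return None
    return left, right


if __name__ == "__main__":
    signs = {
        frozenset(("a", "b")): +1,
        frozenset(("a", "c")): -1,
        frozenset(("b", "c")): -1,
    }
    print(balanced_partition(["a", "b", "c"], signs))
```

Because the partition is determined by a single vertex, the check costs one pass over the pairs of the candidate clique.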
Optimization-Based Network Analysis with Applications in Clustering and Data Mining
In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from an optimization point of view. In particular, the s-club model and the biclique model are considered because of their many application areas. An analytical review of the optimization problems is followed by theoretical results and algorithmic solution methods developed in this research.
The maximum s-club problem has applications in graph-based data mining and robust network design where high reachability is often considered a critical property. The massive size of real-life instances makes it necessary to devise a scalable solution method for practical purposes. Moreover, the lack of the heredity property in s-clubs poses challenges in the design of optimization algorithms. Motivated by these properties, a sufficient condition for checking maximality, by inclusion, of a given s-club is proposed. The sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem.
Inspired by the wide use of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem. Specifically, the maximum edge biclique, the maximum vertex biclique and the maximum balanced biclique problems are considered. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds are insightful in understanding the evolution and growth rate of bicliques in large-scale graphs. To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems, in general graphs, is proposed. The procedure shrinks the underlying network by identifying and removing edges that cannot be in the optimal solution, thus enabling the exact solution methods to solve large-scale sparse instances to optimality. Also, a combinatorial branch-and-bound algorithm is developed that is best suited to dense instances, where the scale-reduction method may be less effective. The proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems.
Graph Sparsification, Spectral Sketches, and Faster Resistance Computation, via Short Cycle Decompositions
We develop a framework for graph sparsification and sketching, based on a new
tool, short cycle decomposition -- a decomposition of an unweighted graph into
an edge-disjoint collection of short cycles, plus few extra edges. A simple
observation gives that every graph G on n vertices with m edges can be
decomposed in time into cycles of length at most , and at most
extra edges. We give an time algorithm for constructing a
short cycle decomposition, with cycles of length , and
extra edges. These decompositions enable us to make progress on several open
questions:
* We give an algorithm to find -approximations to effective
resistances of all edges in time , improving over
the previous best of .
This gives an algorithm to approximate the determinant of a Laplacian up to
in time.
* We show existence and efficient algorithms for constructing graphical
spectral sketches -- a distribution over sparse graphs H such that for a fixed
vector , we have w.h.p. and
. This implies the existence of
resistance-sparsifiers with about edges that preserve the
effective resistances between every pair of vertices up to
* By combining short cycle decompositions with known tools in graph
sparsification, we show the existence of nearly-linear sized degree-preserving
spectral sparsifiers, as well as significantly sparser approximations of
directed graphs. The latter is critical to recent breakthroughs on faster
algorithms for solving linear systems in directed Laplacians.
Improved algorithms for constructing short cycle decompositions will lead to
improvements for each of the above results.
Bi-(N-) cluster editing and its biomedical applications
The extremely fast advances in wet-lab techniques lead to an exponential growth of heterogeneous and unstructured biological data, posing a great challenge to data integration in present-day systems biology. The traditional clustering approach, although widely used to divide the data into groups sharing common features, is less powerful in the analysis of heterogeneous data from n different sources (n > 2). The co-clustering approach has been widely used for combined analyses of multiple networks to address the challenge of heterogeneity. In this thesis, novel methods for the co-clustering of large-scale heterogeneous data sets are presented in the software package n-CluE: one exact algorithm and two heuristic algorithms based on the model of bi-/n-cluster editing, which model the input as n-partite graphs and solve the clustering problem with various strategies. In the first part of the thesis, the complexity and the fixed-parameter tractability of the extended bicluster editing model with relaxed constraints, namely the ?-bicluster editing model, are investigated, and its NP-hardness is proven. Based on the results of this analysis, three strategies within the n-CluE software package are then established and discussed, together with performance evaluations and systematic comparisons against other algorithms of the same type for solving the bi-/n-cluster editing problem. To demonstrate the practical impact, three real-world analyses using n-CluE are performed, including (a) prediction of novel genotype-phenotype associations by clustering the data from Genome-Wide Association Studies; (b) comparison between n-CluE and eight other biclustering tools on Gene Expression Omnibus (GEO) microarray data sets; (c) drug repositioning predictions by co-clustering on drug, gene and disease networks.
The outstanding performance of n-CluE in the real-world applications shows its strength and flexibility in integrating heterogeneous data and extracting biologically relevant information in bioinformatic analyses.
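To make the bicluster editing objective concrete, the sketch below counts how many edge insertions and deletions are needed to turn a bipartite graph into the disjoint bicliques implied by a given cluster assignment. It is a minimal sketch of the cost function only, assuming the standard unweighted bicluster editing model. It is not the n-CluE implementation, and the function name, argument names and input layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def bicluster_editing_cost(
    left_cluster: Dict[Hashable, int],
    right_cluster: Dict[Hashable, int],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> int:
    """Count edits needed so that each cluster becomes a complete biclique
    and no edge runs between different clusters.

    left_cluster / right_cluster map each vertex to a cluster id;
    edges are (left_vertex, right_vertex) pairs. Edges touching vertices
    that are absent from the two maps are ignored.
    """
    edge_set = set(edges)
    cost = 0
    for u, cu in left_cluster.items():
        for v, cv in right_cluster.items():
            should_be_edge = cu == cv
            is_edge = (u, v) in edge_set
            if should_be_edge != is_edge:
                cost += 1  # one insertion (missing) or deletion (extra)
    return cost


if __name__ == "__main__":
    genes = {"g1": 0, "g2": 0, "g3": 1}
    phenotypes = {"p1": 0, "p2": 1}
    observed = [("g1", "p1"), ("g2", "p1"), ("g3", "p2"), ("g2", "p2")]
    print(bicluster_editing_cost(genes, phenotypes, observed))  # 1
```

An exact or heuristic solver would search over cluster assignments to minimise this count; the example evaluates one fixed assignment, in which only the edge (g2, p2) has to be deleted.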
Proceedings of the 8th Cologne-Twente Workshop on Graphs and Combinatorial Optimization
The Cologne-Twente Workshop (CTW) on Graphs and Combinatorial Optimization started off as a series of workshops organized bi-annually by either Köln University or Twente University. As its importance grew over time, it re-centered its geographical focus by including northern Italy (CTW04 in Menaggio, on Lake Como, and CTW08 in Gargnano, on Lake Garda). This year, CTW (in its eighth edition) will be staged in France for the first time: more precisely in the heart of Paris, at the Conservatoire National d'Arts et Métiers (CNAM), between 2nd and 4th June 2009, by a mixed organizing committee with members from LIX, Ecole Polytechnique and CEDRIC, CNAM.