    Semidefinite programming and eigenvalue bounds for the graph partition problem

    The graph partition problem is the problem of partitioning the vertex set of a graph into a fixed number of sets of given sizes such that the sum of weights of edges joining different sets is optimized. In this paper we simplify a known matrix-lifting semidefinite programming relaxation of the graph partition problem for several classes of graphs and also show how to aggregate additional triangle and independent set constraints for graphs with symmetry. We present an eigenvalue bound for the graph partition problem of a strongly regular graph, extending a similar result for the equipartition problem. We also derive a linear programming bound for the graph partition problem on certain Johnson and Kneser graphs. Using what we call the Laplacian algebra of a graph, we derive an eigenvalue bound for the graph partition problem that is the first known closed-form bound applicable to any graph, thereby extending a well-known result in spectral graph theory. Finally, we strengthen a known semidefinite programming relaxation of a specific quadratic assignment problem and the above-mentioned matrix-lifting semidefinite programming relaxation by adding two constraints that correspond to assigning two vertices of the graph to different parts of the partition. This strengthening performs well on highly symmetric graphs when other relaxations provide weak or trivial bounds.

    Orbitopal Fixing

    The topic of this paper is integer programming models in which a subset of 0/1-variables encodes a partitioning of a set of objects into disjoint subsets. Such models can be surprisingly hard to solve by branch-and-cut algorithms if the order of the subsets of the partition is irrelevant, since this kind of symmetry unnecessarily blows up the search tree. We present a general tool, called orbitopal fixing, for enhancing the capabilities of branch-and-cut algorithms in solving such symmetric integer programming models. We devise a linear-time algorithm that, applied at each node of the search tree, removes redundant parts of the tree produced by the above-mentioned symmetry. The method relies on certain polyhedra, called orbitopes, which were introduced by Kaibel and Pfetsch (Math. Programm. A, 114 (2008), 1-36). It does not, however, explicitly add inequalities to the model. Instead, it uses certain fixing rules for variables. We demonstrate the computational power of orbitopal fixing using a graph partitioning problem as an example. Comment: 22 pages, revised and extended version of a previous version that has appeared under the same title in Proc. IPCO 200

    Partitioning sparse rectangular matrices for parallel processing

    Spectral partitioning with multiple eigenvectors

    The graph partitioning problem is to divide the vertices of a graph into disjoint clusters to minimize the total cost of the edges cut by the clusters. A spectral partitioning heuristic uses the graph's eigenvectors to construct a geometric representation of the graph (e.g., linear orderings), which is subsequently partitioned. Our main result shows that when all the eigenvectors are used, graph partitioning reduces to a new vector partitioning problem. This result implies that as many eigenvectors as are practically possible should be used to construct a solution. This philosophy is in contrast to that of the widely used spectral bipartitioning (SB) heuristic (which uses only a single eigenvector) and several previous multi-way partitioning heuristics [8, 11, 17, 27, 38] (which use k eigenvectors to construct k-way partitionings). Our result motivates a simple ordering heuristic that is a multiple-eigenvector extension of SB. This heuristic not only significantly outperforms recursive SB, but can also yield excellent multi-way VLSI circuit partitionings as compared to [1, 11]. Our experiments suggest that the vector partitioning perspective opens the door to new and effective partitioning heuristics. The present paper updates and improves a preliminary version of this work [5].
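
    As a rough illustration of the single-eigenvector spectral bipartitioning (SB) idea mentioned in this abstract, the sketch below orders the vertices by an approximate second Laplacian eigenvector and splits the ordering in half. It is not the authors' multiple-eigenvector heuristic. The function names (`laplacian`, `fiedler_vector`, `spectral_bisection`), the power-iteration eigen-solver, and the median split are all assumptions chosen to keep the example dependency-free.

```python
import math

def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected weighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def fiedler_vector(L, iters=2000):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Assumption: plain power iteration on M = c*I - L, restricted to vectors
    orthogonal to the all-ones vector, stands in for a real eigen-solver.
    """
    n = len(L)
    # By Gershgorin's theorem, c exceeds the largest eigenvalue of L.
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0
    x = [math.sin(i + 1.0) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant eigenvector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

def spectral_bisection(n, edges):
    """Sort vertices by their eigenvector entry and cut the ordering in half."""
    x = fiedler_vector(laplacian(n, edges))
    order = sorted(range(n), key=lambda i: x[i])
    half = n // 2
    return set(order[:half]), set(order[half:])

# Hypothetical test graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
part_a, part_b = spectral_bisection(6, edges)
```

    On this small graph the split separates the two triangles, cutting only the bridging edge. For large sparse graphs a sparse eigen-solver would replace the dense power iteration.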

    Multiple-Genome Annotation of Genome Fragments Using Hidden Markov Model Profiles

    To learn more about microbes and overcome the limitations of standard culture-based methods, microbial communities are being studied in an uncultured state. In such metagenomic studies, genetic material is sampled from the environment and sequenced using the whole-genome shotgun sequencing technique. This results in thousands of DNA fragments that need to be identified, so that the composition and inner workings of the microbial community can begin to be understood. Those fragments are then assembled into longer portions of sequences. However, the high diversity present in an environment and the often low level of genome coverage achieved by the sequencing technology result in a low number of assembled fragments (contigs) and many unassembled fragments (singletons). The identification of contigs and singletons is usually done using BLAST, which finds sequences similar to the contigs and singletons in a database. An expert may then manually read these results and determine whether the function and taxonomic origins of each fragment can be determined. In this report, an automated system called Anacle is developed to annotate, following a taxonomy, the unassembled fragments before the assembly process. Knowledge of what proteins can be found in each taxon is built into Anacle by clustering all known proteins of that taxon. The annotation performances from using Markov clustering (MCL) and Self-Organizing Maps (SOM) are investigated and compared. The resulting protein clusters can each be represented by a Hidden Markov Model (HMM) profile. Thus a “skeleton” of the taxon is generated with the profile HMMs providing a summary of the taxon’s genetic content. The experiments show that (1) MCL is superior to SOMs in annotation and in running time performance, (2) Anacle achieves good performance in taxonomic annotation, and (3) Anacle has the ability to generalize since it can correctly annotate fragments from genomes not present in the training dataset. These results indicate that Anacle can be very useful to metagenomics projects.

    A Hypergraph Framework for Optimal Model-Based Decomposition of Design Problems

    Decomposition of large engineering system models is desirable since increased model size reduces reliability and speed of numerical solution algorithms. The article presents a methodology for optimal model-based decomposition (OMBD) of design problems, whether or not initially cast as optimization problems. The overall model is represented by a hypergraph and is optimally partitioned into weakly connected subgraphs that satisfy decomposition constraints. Spectral graph-partitioning methods together with iterative improvement techniques are proposed for hypergraph partitioning. A known spectral K-partitioning formulation, which accounts for partition sizes and edge weights, is extended to graphs that also have vertex weights. The OMBD formulation is robust enough to account for computational demands and resources and strength of interdependencies between the computational modules contained in the model. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/44780/1/10589_2004_Article_136837.pd

    Graph Clustering by Flow Simulation

    Eigenvalue, Quadratic Programming and Semidefinite Programming Bounds for Graph Partitioning Problems

    Graph partitioning problems are hard combinatorial optimization problems. We are interested in both lower bounds and upper bounds. We introduce several methods including basic eigenvalue and projected eigenvalue techniques, convex quadratic programming techniques, and semidefinite programming (SDP). In particular, we show that the SDP relaxation is equivalent to and arises from the Lagrangian relaxation for a particular quadratically constrained quadratic model. Moreover, the bounds obtained by the eigenvalue techniques are good and cheap to compute.
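
    To make the notion of an eigenvalue lower bound concrete, the sketch below checks one classical bound of this kind, which is not necessarily any of the specific bounds developed in the paper. For a bisection of n vertices (n even) into two equal parts, encode the parts as a vector x with entries +1 and -1 that sum to zero. Then x^T L x equals four times the cut weight and is at least lambda_2 * n, so the cut weight is at least n * lambda_2 / 4, where lambda_2 is the second-smallest Laplacian eigenvalue. The example uses the unweighted cycle C_6, whose lambda_2 has the closed form 2 - 2 cos(2 pi / n). All function names are hypothetical.

```python
import itertools
import math

def cut_weight(edges, part):
    """Total weight of edges with exactly one endpoint in `part`."""
    return sum(w for u, v, w in edges if (u in part) != (v in part))

def min_bisection_brute_force(n, edges):
    """Exact minimum bisection by enumeration; only viable for tiny n."""
    best = math.inf
    for part in itertools.combinations(range(n), n // 2):
        best = min(best, cut_weight(edges, set(part)))
    return best

def cycle_lambda2(n):
    """Second-smallest Laplacian eigenvalue of the unweighted cycle C_n."""
    return 2.0 - 2.0 * math.cos(2.0 * math.pi / n)

n = 6
cycle_edges = [(i, (i + 1) % n, 1.0) for i in range(n)]

# Spectral lower bound n * lambda_2 / 4 versus the true optimum.
lower_bound = n * cycle_lambda2(n) / 4.0
optimum = min_bisection_brute_force(n, cycle_edges)
```

    For C_6 the bound evaluates to about 1.5 while the true minimum bisection cuts 2 edges, so the bound is valid but not tight here. The brute-force search serves only as a check and does not scale beyond very small graphs.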

    Graph clustering by flow simulation

    This thesis concerns the clustering of graphs by means of flow simulation, a problem that in its general form belongs to the field of cluster analysis. In this branch of science, one designs and studies methods that, given certain data, generate a division into groups, with the aim of finding a division that is natural. That is, different data elements in the same group should ideally resemble each other closely, and data elements from different groups should ideally differ greatly from each other. Sometimes such groups are absent altogether; in that case there is little pattern to be recognized in the data. The idea is that the presence of natural groups makes it possible to categorize the data. An example is the clustering of data (on symptoms or bodily characteristics) of patients suffering from the same disease. If clear groups exist in those data, this can lead to additional insight into the disease. Cluster analysis can thus be used for exploratory research. Further examples come from chemistry, taxonomy, psychiatry, archaeology, market research, and many other disciplines. Taxonomy, the study of the classification of organisms, has a rich history beginning with Aristotle and culminating in the works of Linnaeus. In fact, cluster analysis can be seen as the result of an increasingly systematic and abstract study of the diverse methods designed in different application areas, in which method is separated both from data and application area and from the manner of computation. In cluster analysis, roughly two directions can be distinguished, according to the type of data to be classified. The data elements in the example above are described by vectors (lists of scores or measurements), and the difference between two elements is determined by the difference of the vectors.
    This dissertation concerns cluster analysis applied to data of the type 'graph'. Examples come from pattern recognition, computer-aided design, databases equipped with hyperlinks, and the World Wide Web. In all these cases there are 'points' that are either connected or not. A system of points together with their connections is called a graph. A good clustering of a graph divides the points into groups such that few connections run between (points from) different groups and there are many connections within each group.
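
    The clustering criterion stated at the end of this abstract (few connections between groups, many within each group) can be checked with a simple count. The sketch below is only an illustration of that criterion, not the flow-simulation method of the thesis; the function name `edge_counts` and the example graph are assumptions.

```python
def edge_counts(edges, groups):
    """Return (within, between): edges inside one group vs. joining two groups."""
    # Map every vertex to the index of the group that contains it.
    label = {v: k for k, group in enumerate(groups) for v in group}
    within = sum(1 for u, v in edges if label[u] == label[v])
    return within, len(edges) - within

# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

good = edge_counts(edges, [{0, 1, 2}, {3, 4, 5}])    # groups follow the triangles
poor = edge_counts(edges, [{0, 3}, {1, 4}, {2, 5}])  # groups ignore the structure
```

    Here `good` keeps 6 edges within groups and leaves 1 between them, whereas `poor` places all 7 edges between groups, so the first grouping is the better clustering in the sense described above.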