108 research outputs found

    Analysis of an iterated greedy heuristic for vertex clique covering

    The aim of the vertex clique covering problem (CCP) is to cover the vertices of a graph with as few cliques as possible. We analyse the iterated greedy (IG) algorithm for CCP, which was previously shown to provide strong empirical results for real-world networks. It is demonstrated how the techniques of analysis for randomised search heuristics can be applied to IG, and several practically relevant results are obtained. First, we show that for triangle-free graphs, IG solves CCP optimally in expected polynomial time. Secondly, we show that IG finds the optimum for CCP in a specific case of sparse random graphs in expected polynomial time with high probability. For the Barabási-Albert model of scale-free networks, which is a canonical model explaining the growth of social, biological or computer networks, we show that IG obtains an asymptotically optimal approximation in polynomial time in expectation. Last but not least, we propose a slightly modified variant of IG, which guarantees expected polynomial-time convergence to the optimum for graphs with non-overlapping triangles.
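
    As a rough illustration of the idea behind iterated greedy for clique covering, the sketch below runs a first-fit greedy pass over a vertex ordering and then repeatedly re-runs it on orderings built from shuffled blocks of the current cover. The abstract does not spell out the exact variant analysed, so the block-shuffling rule, the function names `greedy_clique_cover` and `iterated_greedy`, and the adjacency format (a dict mapping each vertex to a set of neighbours) are all assumptions made for this example.

```python
import random


def greedy_clique_cover(adj, order):
    """First-fit pass: put each vertex into the first clique it is fully adjacent to."""
    cliques = []
    for v in order:
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            # No existing clique can absorb v, so open a new one.
            cliques.append([v])
    return cliques


def iterated_greedy(adj, iterations=100, seed=0):
    """Illustrative iterated greedy loop (assumed variant, not the paper's exact one)."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    best = greedy_clique_cover(adj, order)
    for _ in range(iterations):
        # Shuffle whole cliques and keep each clique's vertices contiguous.
        blocks = list(best)
        rng.shuffle(blocks)
        order = [v for clique in blocks for v in clique]
        candidate = greedy_clique_cover(adj, order)
        # With block orderings the greedy pass cannot use more cliques
        # than before; the comparison is kept as a safeguard.
        if len(candidate) <= len(best):
            best = candidate
    return best


# Path 0-1-2-3: a triangle-free graph whose best cover uses two edges.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cover = iterated_greedy(path)
```

    Keeping the vertices of each clique contiguous in the new ordering is what prevents the cover from getting worse between iterations, so the loop can only hold steady or improve.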

    Stochastic realization theory for exact and approximate multiscale models

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. [245]-252). The thesis provides a detailed analysis of the independence structure possessed by multiscale models and demonstrates that such an analysis provides important insight into the multiscale stochastic realization problem. Multiscale models constitute a broad class of probabilistic models which includes the well-known subclass of multiscale autoregressive (MAR) models. MAR models have proven useful in a variety of different application areas, due to the fact that they provide a rich set of tools for various signal processing tasks. In order to use these tools, however, a MAR or multiscale model must first be constructed to provide an accurate probabilistic description of the particular application at hand. This thesis addresses this issue of multiscale model identification or realization. Previous work in the area of MAR model identification has focused on developing algorithms which decorrelate certain subsets of random vectors in an effort to design an accurate model. In this thesis, we develop a set-theoretic and graph-theoretic framework for better understanding these types of realization algorithms and for the purpose of designing new such algorithms. The benefit of the framework developed here is that it separates the realization problem into two understandable parts - a dichotomy which helps to clarify the relationship between the exact realization problem, where a multiscale model is designed to exactly satisfy a probabilistic constraint, and the approximate realization problem, where the constraint is only approximately satisfied. The first part of our study focuses on developing a better understanding of the independence structure exhibited by multiscale models. As a result of this study, we are able to suggest a number of different sequential procedures for realizing exact multiscale models. The second part of our study focuses on approximate realization, where we define a relaxed version of the exact multiscale realization problem. We show that many of the ideas developed for the exact realization problem may be used to better understand the approximate realization problem and to develop algorithms for solving it. In particular, we propose an iterative procedure for solving the approximate realization problem, and we show that the parameterized version of this procedure is equivalent to the well-known EM algorithm. Finally, a specific algorithm is developed for realizing a multiscale model which matches the statistics of a Gaussian random process. By Dewey S. Tucker. Ph.D.

    Graph theoretic generalizations of clique: optimization and extensions

    This dissertation considers graph theoretic generalizations of the maximum clique problem. Models that were originally proposed in the social network analysis literature are investigated from a mathematical programming perspective for the first time. A social network is usually represented by a graph, and cliques were the first models of "tightly knit groups" in social networks, referred to as cohesive subgroups. Cliques are idealized models and their overly restrictive nature motivated the development of clique relaxations that relax different aspects of a clique. Identifying large cohesive subgroups in social networks has traditionally been used in criminal network analysis to study organized crimes such as terrorism, narcotics and money laundering. More recent applications are in clustering and data mining of wireless networks, biological networks, and graph models of databases and the internet. This research has the potential to impact homeland security, bioinformatics, internet research and the telecommunication industry, among others. The focus of this dissertation is a degree-based relaxation called k-plex. A distance-based relaxation called k-clique and a diameter-based relaxation called k-club are also investigated in this dissertation. We present the first systematic study of the complexity aspects of these problems and the application of mathematical programming techniques in solving them. Graph theoretic properties of the models are identified and used in the development of theory and algorithms. Optimization problems associated with the three models are formulated as binary integer programs and the properties of the associated polytopes are investigated. Facets and valid inequalities are identified based on combinatorial arguments. A branch-and-cut framework is designed and implemented to solve the optimization problems exactly. Specialized preprocessing techniques are developed that, in conjunction with the branch-and-cut algorithm, optimally solve the problems on real-life power law graphs, a general class of graphs that includes social and biological networks. Computational experiments are performed to study the effectiveness of the proposed solution procedures on benchmark instances and real-life instances. The relationship of these models to the classical maximum clique problem is studied, leading to several interesting observations including a new compact integer programming formulation. We also prove new continuous non-linear formulations for the classical maximum independent set problem which maximize continuous functions over the unit hypercube, and characterize their local and global maxima. Finally, clustering and network design extensions of the clique relaxation models are explored.
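
    The abstract names the k-plex without defining it. Under the standard definition (assumed here, not quoted from the dissertation), a vertex set S is a k-plex if every vertex in S has at least |S| - k neighbours inside S, so a 1-plex is a clique. The helper `is_k_plex` and the adjacency format (a dict of neighbour sets without self-loops) are hypothetical names chosen for this short check.

```python
def is_k_plex(adj, subset, k):
    """Return True if every vertex of `subset` has at least |subset| - k neighbours in it."""
    members = set(subset)
    required = len(members) - k
    # adj[v] & members counts only neighbours inside the candidate set.
    return all(len(adj[v] & members) >= required for v in members)


# Path 0-1-2: not a clique (1-plex), but each vertex misses at most
# one other vertex besides itself, so the whole path is a 2-plex.
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
path_is_clique = is_k_plex(path3, [0, 1, 2], 1)
path_is_2_plex = is_k_plex(path3, [0, 1, 2], 2)
```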

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Combinatorial approaches for the trunk packing problem

    In this thesis we consider a three-dimensional packing problem arising in industry. The task is to pack a maximum number of rigid boxes with side length ratios of 4 : 2 : 1 into an irregularly shaped container. Motivated by the structure of the packings constructed manually so far, we pursue a discrete approach. We discretize the shape of the container as well as the set of possible box placements. This discrete packing problem can be reduced to a maximum stable set problem. First we formulate the problem as an integer linear program, which admittedly can only be solved to optimality within reasonable runtime for very small instances. Therefore, we present several heuristics based, for example, on the linear programming relaxation or on local search. Other heuristics generate tight packings for the core of the container, thereby reducing the problem to a set of smaller subproblems. We compare all presented algorithms on real data sets. We achieve very good results for the majority of instances and for some instances we even surpass the manually constructed solutions.

    Solving hard subgraph problems in parallel

    This thesis improves the state of the art in exact, practical algorithms for finding subgraphs. We study maximum clique, subgraph isomorphism, and maximum common subgraph problems. These are widely applicable: within computing science, subgraph problems arise in document clustering, computer vision, the design of communication protocols, model checking, compiler code generation, malware detection, cryptography, and robotics; beyond, applications occur in biochemistry, electrical engineering, mathematics, law enforcement, fraud detection, fault diagnosis, manufacturing, and sociology. We therefore consider both the "pure" forms of these problems, and variants with labels and other domain-specific constraints. Although subgraph-finding should theoretically be hard, the constraint-based search algorithms we discuss can easily solve real-world instances involving graphs with thousands of vertices, and millions of edges. We therefore ask: is it possible to generate "really hard" instances for these problems, and if so, what can we learn? By extending research into combinatorial phase transition phenomena, we develop a better understanding of branching heuristics, as well as highlighting a serious flaw in the design of graph database systems. This thesis also demonstrates how to exploit two of the kinds of parallelism offered by current computer hardware. Bit parallelism allows us to carry out operations on whole sets of vertices in a single instruction; this is largely routine. Thread parallelism, to make use of the multiple cores offered by all modern processors, is more complex. We suggest three desirable performance characteristics that we would like when introducing thread parallelism: lack of risk (parallel cannot be exponentially slower than sequential), scalability (adding more processing cores cannot make runtimes worse), and reproducibility (the same instance on the same hardware will take roughly the same time every time it is run). We then detail the difficulties in guaranteeing these characteristics when using modern algorithmic techniques. Besides ensuring that parallelism cannot make things worse, we also increase the likelihood of it making things better. We compare randomised work stealing to new tailored strategies, and perform experiments to identify the factors contributing to good speedups. We show that whilst load balancing is difficult, the primary factor influencing the results is the interaction between branching heuristics and parallelism. By using parallelism to explicitly offset the commitment made to weak early branching choices, we obtain parallel subgraph solvers which are substantially and consistently better than the best sequential algorithms.

    Computationally Comparing Biological Networks and Reconstructing Their Evolution

    Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks. First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition. Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work. Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from that of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history, or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. 
Additionally, the ensemble approach is robust to noisy input, and can be used to impute missing interactions in experimental data. Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than those produced by existing models.

    TUNING OPTIMIZATION SOFTWARE PARAMETERS FOR MIXED INTEGER PROGRAMMING PROBLEMS

    The tuning of optimization software is of key interest to researchers solving mixed integer programming (MIP) problems. The efficiency of the optimization software can be greatly impacted by the solver’s parameter settings and the structure of the MIP. A designed experiment approach is used to fit a statistical model that suggests the parameter settings providing the largest reduction in the primal integral metric. Using tuning exemplars with six and 59 factors (parameters) of the optimization software, experiments are run on three classes of MIPs: survivable fixed telecommunication network design, a formulation of the support vector machine with the ramp loss and L1-norm regularization, and node packing for coding theory graphs. This research presents and demonstrates a framework for tuning on a portfolio of MIP instances, not only to obtain good parameter settings for future instances of the same class of MIPs, but also to gain insights into which parameters and interactions of parameters are significant for that class of MIPs. The framework is also used for benchmarking solvers with tuned parameters on a portfolio of instances. A group screening method reduces the number of factors in a design and the time the tuning process takes. Portfolio benchmarking provides performance information about optimization solvers on a class of instances with similar structure.