5 research outputs found

    Properly colored subgraphs in edge-colored graphs

    Models and Algorithms for Comparative Genomics

    The deluge of sequenced whole-genome data has motivated the study of comparative genomics, which provides global views on genome evolution and offers practical solutions for deciphering the functional roles of genome components. A fundamental computational problem in whole-genome comparison is to infer the most likely large-scale events (rearrangements and content-modifying events) that the given genomes underwent during their evolutionary history. Based on the principle of parsimony, such inference is usually formulated as the so-called edit distance problem (for two genomes) or median problem (for multiple genomes), i.e., computing the minimum number of large-scale events of certain types that can explain the differences between the given genomes. In this dissertation, we develop novel algorithms for edit distance problems and median problems and apply them to analyze and annotate biological datasets. For pairwise whole-genome comparison, we study the most challenging case of the edit distance problem, in which the given genomes contain duplicate genes. We propose several exact algorithms and approximation algorithms under various combinations of large-scale events. Specifically, we design the first exact algorithm to compute the edit distance under the DCJ (double-cut-and-join) model, and the first exact algorithm to compute the edit distance under a model including DCJ operations and segmental duplications. We devise a (1.5 + ϵ)-approximation algorithm to compute the edit distance under a model including DCJ operations, insertions, and deletions. We also propose a very fast exact algorithm to compute the exemplar breakpoint distance. For multiple whole-genome comparison, we study the median problem under the DCJ model. We design a polynomial-time algorithm that uses a network flow formulation to compute the so-called adequate subgraphs, a central phase in computing the median. We also prove that an existing upper bound on the median distance is tight. These algorithms determine the correspondence between functional elements (for instance, genes) across genomes, and thus can be used to systematically infer functional relationships and annotate genomes. For example, we apply our methods to infer orthologs and in-paralogs between a pair of genomes, a key step in analyzing the functions of protein-coding genes. On biological whole-genome datasets, our methods run very fast, scale up to whole genomes, and achieve very high accuracy.

    Exact and evolutionary algorithms for the score-constrained packing problem

    This thesis concerns the Score-Constrained Packing Problem (SCPP), a combinatorial optimisation problem related to the one-dimensional bin packing problem. The aim of the SCPP is to pack a set of rectangular items from left to right into the fewest bins such that no bin is overfilled; however, the order and orientation of the items in each bin affect the feasibility of the overall solution. The SCPP has applications in the packaging industry, and obtaining high-quality solutions for instances of the SCPP can reduce waste material, costs, and time, which motivates the study in this thesis. The minimal existing research on the SCPP leads us to explore a wide range of approaches to the problem, implementing ideas from related problems in the literature as well as bespoke methods. To begin, we present an exact algorithm that can produce a feasible configuration of a subset of items in a single bin in polynomial time. We then introduce a range of methods for the SCPP, including heuristics, an evolutionary algorithm framework comprising a local search procedure and a choice of three distinct recombination operators, and two algorithms combining metaheuristics with an exact procedure. Each method is investigated, both theoretically and computationally, to gain more insight into the characteristics that benefit or hinder the improvement of solutions, using a large number of problem instances with varying parameters. This allows us to determine the specific methods and properties that produce superior solutions depending on the type of problem instance.