100 research outputs found

    A New Implementation and Detailed Study of Breakpoint Analysis

    Get PDF
    Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions. One method is breakpoint analysis, developed by Blanchette and Sankoff 2 for solving the breakpoint phylogeny.\u27 Our earlier studies 5;6 confirmed the usefulness of this approach, but also found that BPAnalysis, the implementation developed by Sankoff and Blanchette, was too slow to use on all but very small datasets. We report here on a reimplementation of BPAnalysis using the principles of algorithmic engineering. Our faster (by 2 to 3 orders of magnitude) and flexible implementation allowed us to conduct studies on the characteristics of breakpoint analysis, in terms of running time, quality, and robustness, as well as to analyze datasets that had so far been considered out of reach. We report on these findings and also discuss future directions for our new implementation.\u2

    Parallelizing superFine

    Get PDF
    The estimation of the Tree of Life, a rooted binary tree representing how all extant species evolved from a common ancestor, is one of the grand challenges of modern biology. Research groups around the world are attempting to estimate evolutionary trees on particular sets of species (typically clades, or rooted subtrees), in the hope that a final "supertree" can be produced from these smaller estimated trees through the addition of a "scaffold" tree of randomly sampled taxa from the tree of life. However, supertree estimation is itself a computationally challenging problem, because the most accurate trees are produced by running heuristics for NP-hard problems. In this paper we report on a study in which we parallelize SuperFine, the currently most accurate and efficient supertree estimation method. We explore performance of these parallel implementations on simulated data-sets with 1000 taxa and biological data-sets with up to 2,228 taxa. Our study reveals aspects of SuperFine that limit the speed-ups that are possible through the type of outer-loop parallelism we exploit.(undefined

    The hardness of perfect phylogeny, feasible register assignment and other problems on thin colored graphs

    Get PDF
    AbstractIn this paper, we consider the complexity of a number of combinatorial problems; namely, Intervalizing Colored Graphs (DNA physical mapping), Triangulating Colored Graphs (perfect phylogeny), (Directed) (Modified) Colored Cutwidth, Feasible Register Assignment and Module Allocation for graphs of bounded pathwidth. Each of these problems has as a characteristic a uniform upper bound on the tree or path width of the graphs in “yes”-instances. For all of these problems with the exceptions of Feasible Register Assignment and Module Allocation, a vertex or edge coloring is given as part of the input. Our main results are that the parameterized variant of each of the considered problems is hard for the complexity classes W[t] for all t∈N. We also show that Intervalizing Colored Graphs, Triangulating Colored Graphs, and Colored Cutwidth are NP-Complete

    Sequence length requirements for phylogenetic methods

    Get PDF

    New Software for Computational Phylogenetics

    Get PDF

    Circular Networks from Distorted Metrics

    Full text link
    Trees have long been used as a graphical representation of species relationships. However complex evolutionary events, such as genetic reassortments or hybrid speciations which occur commonly in viruses, bacteria and plants, do not fit into this elementary framework. Alternatively, various network representations have been developed. Circular networks are a natural generalization of leaf-labeled trees interpreted as split systems, that is, collections of bipartitions over leaf labels corresponding to current species. Although such networks do not explicitly model specific evolutionary events of interest, their straightforward visualization and fast reconstruction have made them a popular exploratory tool to detect network-like evolution in genetic datasets. Standard reconstruction methods for circular networks, such as Neighbor-Net, rely on an associated metric on the species set. Such a metric is first estimated from DNA sequences, which leads to a key difficulty: distantly related sequences produce statistically unreliable estimates. This is problematic for Neighbor-Net as it is based on the popular tree reconstruction method Neighbor-Joining, whose sensitivity to distance estimation errors is well established theoretically. In the tree case, more robust reconstruction methods have been developed using the notion of a distorted metric, which captures the dependence of the error in the distance through a radius of accuracy. Here we design the first circular network reconstruction method based on distorted metrics. Our method is computationally efficient. Moreover, the analysis of its radius of accuracy highlights the important role played by the maximum incompatibility, a measure of the extent to which the network differs from a tree.Comment: Submitte
    • 

    corecore