42 research outputs found

    A 'stochastic safety radius' for distance-based tree reconstruction

    A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day taxa. If the leaf-to-leaf distances in a tree can be accurately estimated, then it is possible to reconstruct this tree from these estimated distances, using polynomial-time methods such as the popular 'Neighbor-Joining' algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part) based on the requirement that the input distances all lie within some 'safety radius' of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this 'stochastic safety radius' for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as many other less accurate methods). Comment: 18 pages, 1 figure, 4 tables
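The classical (deterministic) safety radius condition mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual formulation in which every estimated distance must differ from the true distance by less than the radius times the shortest interior edge of the true tree. The function name, the matrix layout and the default radius of 0.5 are assumptions made for illustration, not taken from the paper.

```python
def within_safety_radius(true_dist, est_dist, min_interior_edge, radius=0.5):
    """Check the deterministic safety-radius condition (illustrative sketch).

    true_dist, est_dist: symmetric n x n matrices (lists of lists).
    min_interior_edge: length of the shortest interior edge of the true tree.
    radius: safety radius of the method (0.5 is assumed here as an example).
    """
    n = len(true_dist)
    # Largest absolute deviation over all unordered pairs of taxa.
    max_err = max(
        abs(true_dist[i][j] - est_dist[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return max_err < radius * min_interior_edge


# Quartet ab|cd with all five edges of length 1.
true_d = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
noisy_d = [[0, 2.4, 3, 3], [2.4, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
print(within_safety_radius(true_d, noisy_d, min_interior_edge=1.0))
```

The stochastic analogue studied in the paper replaces this worst-case check with a probabilistic statement about random errors, which this sketch does not attempt to model.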

    An evolution strategy approach for the balanced minimum evolution problem

    Motivation: The Balanced Minimum Evolution (BME) is a powerful distance-based phylogenetic estimation model introduced by Desper and Gascuel and nowadays implemented in popular tools for phylogenetic analyses. It was proven to be computationally less demanding than more sophisticated estimation methods, e.g. maximum likelihood or Bayesian inference, while preserving statistical consistency and the ability to run with almost any kind of data for which a dissimilarity measure is available. BME can be stated in terms of a nonlinear non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state of the art among approximate methods for the BMEP is represented by FastME (version 2.0), software that implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived from classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions to the problem due to the lack of solution space exploration, a phenomenon which is exacerbated when tackling molecular datasets characterized by a large number of taxa. Results: To overcome such convergence issues, in this article, we propose a novel metaheuristic, named PhyloES, which exploits the combination of an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially when tackling larger datasets, providing solutions characterized by a shorter tree length but also significantly different from the topological perspective

    Fast neighbor joining

    Abstract: Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged based on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas
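For readers unfamiliar with the baseline, the following is a minimal sketch of the standard cubic-time Neighbor Joining procedure that this abstract refers to. It is not the FNJ algorithm of the paper. The function name, the `frozenset`-keyed distance dictionary and the `u0`, `u1`, ... labels for internal nodes are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Standard Neighbor Joining (illustrative sketch, cubic time).

    names: list of taxon labels (assumed not to clash with 'u0', 'u1', ...).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the unrooted tree as a list of (node, node, edge_length) tuples.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r[a] = sum of distances from a to every other active node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r[i] - r[j].
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"u{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node u.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, li))
        edges.append((j, u, dij - li))
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

Each iteration recomputes the row sums and scans all pairs, which costs quadratic time per join and cubic time overall, consistent with the Θ(n^3) bound quoted above.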

    Molecular characterization and assessment of genetic diversity of sorghum inbred lines

    Selecting parents with a diverse genetic base and contrasting phenotypes is an important step in developing mapping populations for quantitative trait loci (QTL) detection and marker-assisted selection. We studied genetic diversity in 31 sorghum parents using 413 sorghum simple sequence repeat (SSR) markers. The polymorphism information content (PIC), a measure of gene diversity, varied from 0 to 0.92 with an average of 0.53 and was significantly correlated with the number of alleles. The primers IS10215, IS10270 and IS10333 could differentiate all 31 lines conclusively. Clustering analysis based on genetic dissimilarity grouped the 31 parents into eight clusters, and the grouping was in good agreement with pedigree, race and geographic origin. Diverse pairs of sorghum parents with contrasting phenotypes for various biotic and abiotic stresses and higher genetic diversity were identified for developing recombinant inbred line (RIL) mapping populations to identify QTLs/genes for important traits in sorghum. One of the mapping populations resulted in the identification of QTLs for resistance to sorghum shoot fly, and these QTL results were validated in a second mapping population. Key words: Simple sequence repeats (SSR) markers, genetic diversity, sorghum, mapping parents
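As a rough illustration of how a PIC value is obtained from allele frequencies at one marker, the sketch below computes two commonly used quantities: expected heterozygosity (gene diversity) and PIC in the Botstein et al. form. The abstract does not state which formula the authors used, so both function names and the choice of formula are assumptions.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC in the Botstein et al. form (assumed here):
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return gene_diversity(freqs) - cross


# A monomorphic marker carries no information; two equally
# frequent alleles give the biallelic maximum.
print(pic([1.0]))        # 0.0
print(pic([0.5, 0.5]))   # 0.375
```

A marker with PIC of 0 is monomorphic in the sample, which matches the lower end of the range reported above; values near 0.92 require many alleles at fairly even frequencies.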

    Efficiency of Algorithms in Phylogenetics

    Phylogenetics is the study of evolutionary relationships between species. Phylogenetic trees have long been the standard object used in evolutionary biology to illustrate how a given set of species are related. There are some groups (including certain plant and fish species) for which the ancestral history contains reticulation events, caused by processes that include hybridization, lateral gene transfer, and recombination. For such groups of species, it is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and tractability of mathematical models for phylogenetic network methods. Three important and related measures for summarizing the dissimilarity in phylogenetic trees are the minimum number of hybridization events required to fit two phylogenetic trees onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the TBR distance) between two phylogenetic trees. The respective problems of computing these measures are known to be NP-hard, but also fixed-parameter tractable in their respective natural parameters. This means that, while they are hard to compute in general, for cases in which a parameter (here the hybridization number and rSPR/TBR distance, respectively) is small, the problem can be solved efficiently even for large input trees. 
Here, we present new analyses showing that the use of the “cluster reduction” rule – already defined for the hybridization number and the rSPR distance and introduced here for the TBR distance – can transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic network displaying both trees. These results appear in [9]. Traditional “distance based methods” reconstruct a phylogenetic tree from a matrix of pairwise distances between taxa. A phylogenetic network is a generalization of a phylogenetic tree that can describe evolutionary events such as reticulation and hybridization that are not tree-like. Although evolution has been known to be more accurately modelled by a network than a tree for some time, only recently have efforts been made to directly reconstruct a phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can accurately reconstruct ultrametric tree-child networks from the set of distinct distances between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2. This means that if we can obtain accurate estimates of the set of distances between each pair of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs the true network

    Study of the genetic polymorphism of diploid wheat Triticum boeoticum Boiss. using SSR markers

    Diploid wheat Triticum boeoticum Boiss. (genome constitution AA) is a promising source of new valuable alleles for improving cultivated wheat species. Therefore, the evaluation of the intraspecies diversity of T. boeoticum and DNA fingerprinting of accessions of this species are topical tasks. In this paper, the genetic diversity of over 60 T. boeoticum accessions was studied using 11 SSR markers. The analysis revealed 83 alleles, 7.5 alleles per locus on average. The values of expected (HE) and observed (HO) heterozygosity varied within 0.00–0.74 and 0.17–0.89, respectively, the average indices being HO = 0.13 and HE = 0.52. The PIC value for each locus was within 0.17–0.88, 0.49 on average. Unique alleles were found in all loci studied. Cluster analysis allowed the accessions studied to be combined into five major groups. The distances between the groups varied from 0 to 1, pointing to a high level of genetic differences in the collection under study. On the basis of PCoA, five major groups were formed and some correspondence with the dendrogram was detected. Summarizing the data of PCoA and cluster analysis, we noted a weak genetic differentiation in the studied collection of T. boeoticum. A correlation between the genetic distance and geographic origin was revealed only for accessions of diploid wheat T. boeoticum from Iran. The analysis of the T. boeoticum accessions studied showed a wide diversity for SSR loci. The results expand our knowledge and provide additional information on the genetic structure of the collection and on the genetic diversity of the T. boeoticum accessions studied.

    Ableitung der Organismen- und Charakteristika-Evolution aus funktionellen Genomeigenschaften

    The development of phylogenomics pipelines is important as phylogeny-driven genome sequencing projects generate plenty of genomic data. This thesis work focused on the development of three pipelines which yield: 1. proper taxonomic descriptions of new genomes; 2. phylogeny using genome encoded functionalities and COGs; and 3. evolutionary correlations between functional characters (e.g., genes), genome features of newly sequenced genomes and their functional linkages. 34 variations of distance-based phylogenetic tree reconstruction strategies for eight datasets were formulated with regard to different sources, threshold applied weights and distance calculation methods. The specific strategic variations which are similar to previously described approaches were statistically tested. A contemporary way of inferring phylogenies using conserved genome features in whole genome level was optimized and reported. A strategy was developed to calculate the evolutionary correlation in different genomic features and their functional linkages. BayesTraits software was used in the pipeline to estimate the correlated evolution between character pairs. Characters were clustered using the MCL algorithm with regard to significant evolutionary correlations between characters. The pipelines were standardized using eight datasets. E. coli + Shigella, Spirochaetae and Rhodobacteraceae datasets were applied on this pipeline for finding evolutionary correlation between functional characters. The correlated genes in motility pathways were identified and interpreted with previous scientific evidences for the Spirochaetae dataset. The distribution of correlated enzymes per pathway in Rhodobacteraceae dataset were identified. The evolutionary correlation of cp4-44 prophage element with pathogenicity in E. coli + Shigella were identified and interpreted along with the character state reconstructions of both characters. 
The evolutionary correlation of four pathways and one enzyme with marine/non-marine living characteristics of the Rhodobacteraceae dataset was identified and interpreted along with character state reconstruction, which describes patterns of evolutionary events for different genomic features on the phylogeny.
German abstract (translated): The development of phylogenetic pipelines is important because phylogeny-driven sequencing projects generate large amounts of genomic data. This work concentrates on the development of three pipelines that provide: 1. correct taxonomic descriptions of new genomes; 2. genome-encoded functionalities and COGs; and 3. evolutionary correlations between functional units (e.g., genes) and their functional linkages. Thirty-four variations of distance-based reconstruction strategies for phylogenetic trees were formulated for eight datasets with regard to different sources, applied threshold weights and distance calculation methods. The individual strategy variations that are similar to previously described approaches were statistically tested and discussed. A contemporary way of inferring phylogenies using conserved genomic features at the whole-genome level was optimized and reported. A strategy was developed to calculate the evolutionary relationship among different genomic features and their functional linkages. The BayesTraits software was used in the pipeline to estimate the degree of correlated evolution between pairs of units. The units were clustered with the MCL algorithm, with the main focus on significant evolutionary correlations between the units. The units were standardized with eight datasets. The E. coli + Shigella, Spirochaetae and Rhodobacteraceae datasets were processed through the pipeline to find evolutionary correlations between the functional units. The correlated genes in pathways associated with motility were interpreted for the Spirochaetae dataset in light of previous scientific evidence. The distribution of correlated enzymes per pathway in the Rhodobacteraceae dataset was identified. The evolutionary correlation of the cp4-44 prophage element with pathogenicity in E. coli + Shigella was identified and interpreted together with the character state reconstructions of both characters. The evolutionary correlation of four pathways and one enzyme with the marine/non-marine living characteristics of the Rhodobacteraceae dataset was identified and interpreted together with character state reconstruction, which describes different patterns of evolutionary events of the genomic features on the phylogeny

    Genetic structure of disjunct Argentinean populations of the subtropical tree Anadenanthera colubrina var. cebil (Fabaceae)

    Anadenanthera colubrina var. cebil is a native South American tree species inhabiting seasonally dry tropical forests (SDTFs). Its current disjunct distribution presumably represents fragments of a historically much larger area of this forest type, which has also been highly impacted by human activities. Accordingly, the hypothesis of this study is that the natural populations of A. colubrina var. cebil from Northern Argentina represent vestiges of ancient fragmentation, but they are additionally influenced by a certain degree of gene flow among them. We aimed to analyze the genetic structure of both nuclear and chloroplast DNA to evaluate the relative role of ancient and recent fragmentation on intraspecific diversity patterns. Sixty-nine individuals of four natural populations were analyzed using eight nuclear microsatellites (ncSSR) and four chloroplast microsatellite loci (cpSSR). The level and distribution of genetic variation were estimated by standard population genetic parameters and Neighbor Joining as well as Bayesian analyses. The eight ncSSR loci were highly polymorphic, while genetic diversity of cpSSRs was low. Nuclear SSRs displayed lower genetic differentiation among populations than cpSSR haplotypes (FST 0.11 and 0.95, respectively). However, high differentiation between phytogeographic provinces was observed in both genomes. The high genetic differentiation detected emphasizes the role of ancient fragmentation. However, the Paranaense province also shows the effects of recent fragmentation on genetic structure, whereas gene flow by pollen preserves the effects of genetic drift in the Yungas province. Centro Regional de Estudios Genómico