A 'stochastic safety radius' for distance-based tree reconstruction
A variety of algorithms have been proposed for reconstructing trees that show
the evolutionary relationships between species by comparing differences in
genetic data across present-day taxa. If the leaf-to-leaf distances in a tree
can be accurately estimated, then it is possible to reconstruct this tree from
these estimated distances, using polynomial-time methods such as the popular
`Neighbor-Joining' algorithm. There is a precise combinatorial condition under
which distance-based methods are guaranteed to return a correct tree (in full
or in part) based on the requirement that the input distances all lie within
some `safety radius' of the true distances. Here, we explore a stochastic
analogue of this condition, and mathematically establish upper and lower bounds
on this `stochastic safety radius' for distance-based tree reconstruction
methods. Using simulations, we show how this notion provides a new way to
compare the performance of distance-based tree reconstruction methods. This may
help explain why Neighbor-Joining performs so well, as its stochastic safety
radius appears close to optimal (while its more classical safety radius is the
same as many other less accurate methods).
Comment: 18 pages, 1 figure, 4 tables
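The classical (deterministic) condition described in this abstract can be illustrated with a minimal sketch. It assumes the usual l-infinity formulation: every estimated distance must deviate from the true distance by less than a fixed fraction of the shortest edge length of the true tree. The function name, the dictionary layout, the parameter `w_min` and the default fraction of 1/2 (the value usually quoted for Neighbor-Joining) are illustrative assumptions and are not taken from the abstract.

```python
def within_safety_radius(estimated, true, w_min, radius=0.5):
    """Return True if every estimated distance deviates from the true
    distance by less than radius * w_min (an l-infinity condition).

    estimated, true: dicts mapping a pair of taxa to a distance (assumed layout).
    w_min: shortest edge length of the true tree (assumed to be known).
    """
    max_error = max(abs(estimated[pair] - true[pair]) for pair in true)
    return max_error < radius * w_min


# Hypothetical three-taxon example.
true_d = {("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 5.0}
noisy_d = {("a", "b"): 3.2, ("a", "c"): 3.9, ("b", "c"): 5.1}
print(within_safety_radius(noisy_d, true_d, w_min=1.0))
```

The stochastic analogue studied in the paper replaces this worst-case check with a probabilistic statement about random errors; that part is not sketched here.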
An evolution strategy approach for the balanced minimum evolution problem
Motivation: Balanced Minimum Evolution (BME) is a powerful distance-based phylogenetic estimation model introduced by Desper and Gascuel and now implemented in popular tools for phylogenetic analysis. It has been shown to be computationally less demanding than more sophisticated estimation methods, e.g. maximum likelihood or Bayesian inference, while preserving statistical consistency and the ability to run on almost any kind of data for which a dissimilarity measure is available. BME can be stated as a nonlinear, non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state of the art among approximate methods for the BMEP is FastME (version 2.0), a software package that implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived from classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions because of insufficient exploration of the solution space, a phenomenon that is exacerbated when tackling molecular datasets with a large number of taxa.
Results: To overcome such convergence issues, we propose a novel metaheuristic, named PhyloES, which combines an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially on larger datasets, providing solutions that have a shorter tree length and are also significantly different from a topological perspective.
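To make the objective of the BMEP concrete, the following sketch evaluates the balanced tree length of one fixed topology. It assumes the form commonly attributed to Pauplin, L(T) = sum over leaf pairs i < j of d_ij * 2^(1 - tau_ij), where tau_ij is the number of edges on the path between leaves i and j. The abstract does not state this formula, and the adjacency-dictionary tree representation and function names are illustrative assumptions. This is not the PhyloES or FastME search, only the quantity such searches try to minimise.

```python
from collections import deque
from itertools import combinations


def path_edge_counts(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def bme_length(adj, d):
    """Balanced tree length of the topology adj for the distances d.

    adj: dict node -> list of neighbours (leaves have exactly one neighbour).
    d:   dict of dicts with d[i][j] given for leaf names i < j.
    """
    leaves = sorted(n for n in adj if len(adj[n]) == 1)
    total = 0.0
    for i, j in combinations(leaves, 2):
        tau = path_edge_counts(adj, i)[j]
        total += d[i][j] * 2.0 ** (1 - tau)
    return total


# Hypothetical quartet ((a,b),(c,d)) with every edge of length 1.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
d = {"a": {"b": 2, "c": 3, "d": 3}, "b": {"c": 3, "d": 3}, "c": {"d": 2}}
print(bme_length(adj, d))
```

For distances that fit the tree exactly, as in this example, the balanced length equals the sum of the edge lengths.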
Fast neighbor joining
Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n^3) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2), and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
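For orientation, here is a minimal sketch of the classical cubic-time NJ procedure that this abstract takes as its baseline. It is the textbook algorithm, not FNJ. The function name, the nested-dictionary input and the generated internal node labels (`u0`, `u1`, ...) are illustrative assumptions, and taxon names are assumed not to collide with those labels.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor Joining.

    names: list of taxon names; dist: symmetric dict of dicts of distances.
    Returns a list of (node, neighbour, branch_length) edges.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Select the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for idx, a in enumerate(nodes):
            for b in nodes[idx + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"u{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(a, u, len_a), (b, u, len_b)]
        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances from the tree ((a:1,b:2):1,(c:3,d:4)).
pairs = {("a", "b"): 3, ("a", "c"): 5, ("a", "d"): 6,
         ("b", "c"): 6, ("b", "d"): 7, ("c", "d"): 7}
taxa = ["a", "b", "c", "d"]
dmat = {x: {} for x in taxa}
for (x, y), value in pairs.items():
    dmat[x][y] = dmat[y][x] = float(value)
nj_edges = neighbor_joining(taxa, dmat)
```

Each pass recomputes the row sums and scans all pairs in quadratic time, and there are n - 2 passes, which gives the cubic running time mentioned in the abstract.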
Molecular characterization and assessment of genetic diversity of sorghum inbred lines
Selecting parents with a diverse genetic base and contrasting phenotypes is an important step in developing mapping populations for quantitative trait loci (QTL) detection and marker-assisted selection. We studied genetic diversity in 31 sorghum parents using 413 sorghum simple sequence repeat (SSR) markers. The polymorphism information content (PIC), a measure of gene diversity, varied from 0 to 0.92 with an average of 0.53 and was significantly correlated with the number of alleles. The primers IS10215, IS10270 and IS10333 could differentiate all 31 lines conclusively. Cluster analysis based on genetic dissimilarity grouped the 31 parents into eight clusters, and the grouping was in good agreement with pedigree, race and geographic origin. Diverse pairs of sorghum parents with contrasting phenotypes for various biotic and abiotic stresses and with higher genetic diversity were identified for developing recombinant inbred line (RIL) mapping populations to identify QTLs/genes for important traits in sorghum. One of the mapping populations resulted in the identification of QTLs for resistance to sorghum shoot fly, and these QTL results were validated in a second mapping population.
Key words: Simple sequence repeat (SSR) markers, genetic diversity, sorghum, mapping parents
Efficiency of Algorithms in Phylogenetics
Phylogenetics is the study of evolutionary relationships between species. Phylogenetic
trees have long been the standard object used in evolutionary biology to illustrate how a
given set of species are related. There are some groups (including certain plant and fish
species) for which the ancestral history contains reticulation events, caused by processes that
include hybridization, lateral gene transfer, and recombination. For such groups of species, it
is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic
digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least
two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and
tractability of mathematical models for phylogenetic network methods.
Three important and related measures for summarizing the dissimilarity in phylogenetic
trees are the minimum number of hybridization events required to fit two phylogenetic trees
onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune
and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the
TBR distance) between two phylogenetic trees. The respective problems of computing these
measures are known to be NP-hard, but also fixed-parameter tractable in their respective
natural parameters. This means that, while they are hard to compute in general, for cases
in which a parameter (here the hybridization number and rSPR/TBR distance, respectively)
is small, the problem can be solved efficiently even for large input trees. Here, we present
new analyses showing that the use of the "cluster reduction" rule (already defined for the
hybridization number and the rSPR distance and introduced here for the TBR distance) can
transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time
one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter
and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic
network displaying both trees. These results appear in [9].
Traditional "distance-based methods" reconstruct a phylogenetic tree from a matrix of pairwise
distances between taxa. A phylogenetic network is a generalization of a phylogenetic
tree that can describe evolutionary events such as reticulation and hybridization that are not
tree-like. Although evolution has been known to be more accurately modelled by a network
than a tree for some time, only recently have efforts been made to directly reconstruct a
phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present
a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can
accurately reconstruct ultrametric tree-child networks from the set of distinct distances
between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the
safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2.
This means that if we can obtain accurate estimates of the set of distances between each pair
of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs
the true network.
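As background for the generalisation described above, the following is a minimal sketch of the classical UPGMA procedure for ultrametric trees. It is not the NETWORKUPGMA algorithm of the thesis, which handles tree-child networks. The function name, the nested-tuple output and the input layout are illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Classical UPGMA on a symmetric dict-of-dicts distance matrix.

    Returns (tree, height): tree is a nested tuple of taxon names and
    height is the height of the root above the leaves.
    """
    # Each active cluster id maps to (subtree, number of leaves, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[names[i]][names[j]]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (tree_i, n_i, _), (tree_j, n_j, _) = clusters.pop(i), clusters.pop(j)
        height = d[frozenset((i, j))] / 2.0
        # Size-weighted average distance to every remaining cluster.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j, height)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Hypothetical ultrametric distances for the tree (((a,b),c),d).
raw = {("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6,
       ("a", "d"): 10, ("b", "d"): 10, ("c", "d"): 10}
labels = ["a", "b", "c", "d"]
um = {x: {} for x in labels}
for (x, y), value in raw.items():
    um[x][y] = um[y][x] = float(value)
upgma_tree, root_height = upgma(labels, um)
```

With exact ultrametric input the merge order recovers the generating tree; the safety radius result quoted above concerns how much error in the input distances such a procedure can tolerate.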
Study of the genetic polymorphism of diploid wheat Triticum boeoticum Boiss. using SSR markers
Diploid wheat Triticum boeoticum Boiss. (genome constitution AA) is a promising source of new valuable alleles for improving cultivated wheat species. Therefore, the evaluation of the intraspecies diversity of T. boeoticum and DNA fingerprinting of accessions of this species are topical tasks. In this paper, the genetic diversity of over 60 T. boeoticum accessions was studied using 11 SSR markers. The analysis revealed 83 alleles, 7.5 alleles per locus on average. The values of expected (HE) and observed (HO) heterozygosity varied within 0.00–0.74 and 0.17–0.89, respectively, the average indices being HO = 0.13 and HE = 0.52. The PIC value for each locus was within 0.17–0.88, 0.49 on average. Unique alleles were found in all loci studied. Cluster analysis allowed the accessions studied to be combined into five major groups. The distances between the groups varied from 0 to 1, pointing to a high level of genetic differences in the collection under study. On the basis of PCoA, five major groups were formed and some correspondence with the dendrogram was detected. Summarizing the data of PCoA and cluster analysis, we noted a weak genetic differentiation in the studied collection of T. boeoticum. A correlation between the genetic distance and geographic origin was revealed only for accessions of diploid wheat T. boeoticum from Iran. The analysis of the T. boeoticum accessions studied showed a wide diversity for SSR loci. The results expand our knowledge and provide additional information on the genetic structure of the collection and on the genetic diversity of T. boeoticum accessions studied.
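The per-locus statistics reported in this abstract can be illustrated with a short sketch. It assumes the standard definitions, which the abstract itself does not spell out: expected heterozygosity HE = 1 - sum of p_i^2, and PIC = HE - sum over i < j of 2 * p_i^2 * p_j^2, where p_i are the allele frequencies at one locus. The function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """HE = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content for one locus (standard definition)."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross


# Hypothetical locus with two equally frequent alleles.
example_freqs = [0.5, 0.5]
print(expected_heterozygosity(example_freqs), pic(example_freqs))
```

A monomorphic locus gives zero for both quantities, which matches the lower end of the HE range reported above.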
Inferring the evolution of organisms and characters from functional genome features (Ableitung der Organismen- und Charakteristika-Evolution aus funktionellen Genomeigenschaften)
The development of phylogenomics pipelines is important as phylogeny-driven genome sequencing projects generate plenty of genomic data. This thesis work focused on the development of three pipelines which yield: 1. proper taxonomic descriptions of new genomes; 2. phylogeny using genome encoded functionalities and COGs; and 3. evolutionary correlations between functional characters (e.g., genes), genome features of newly sequenced genomes and their functional linkages.
34 variations of distance-based phylogenetic tree reconstruction strategies were formulated for eight datasets with regard to different sources, threshold applied weights and distance calculation methods. The specific strategic variations that are similar to previously described approaches were statistically tested. A contemporary way of inferring phylogenies using conserved genome features at the whole-genome level was optimized and reported.
A strategy was developed to calculate the evolutionary correlation between different genomic features and their functional linkages. BayesTraits software was used in the pipeline to estimate the correlated evolution between character pairs. Characters were clustered using the MCL algorithm with regard to significant evolutionary correlations between characters. The pipelines were standardized using eight datasets. The E. coli + Shigella, Spirochaetae and Rhodobacteraceae datasets were run through this pipeline to find evolutionary correlations between functional characters. The correlated genes in motility pathways were identified and interpreted in light of previous scientific evidence for the Spirochaetae dataset. The distribution of correlated enzymes per pathway in the Rhodobacteraceae dataset was identified. The evolutionary correlation of the cp4-44 prophage element with pathogenicity in E. coli + Shigella was identified and interpreted along with the character state reconstructions of both characters. The evolutionary correlations of four pathways and one enzyme with the marine/non-marine lifestyle in the Rhodobacteraceae dataset were identified and interpreted along with character state reconstruction, which describes patterns of evolutionary events for different genomic features on the phylogeny.
Genetic structure of disjunct Argentinean populations of the subtropical tree Anadenanthera colubrina var. cebil (Fabaceae)
Anadenanthera colubrina var. cebil is a native South American tree species inhabiting seasonally dry tropical forests (SDTFs). Its current disjunct distribution presumably represents fragments of a historically much larger area of this forest type, which has also been highly impacted by human activities. The hypothesis of this study is therefore that the natural populations of A. colubrina var. cebil from Northern Argentina represent vestiges of ancient fragmentation, but that they are additionally influenced by a certain degree of gene flow among them. We aimed to analyze the genetic structure of both nuclear and chloroplast DNA to evaluate the relative roles of ancient and recent fragmentation in intraspecific diversity patterns. Sixty-nine individuals from four natural populations were analyzed using eight nuclear microsatellite loci (ncSSR) and four chloroplast microsatellite loci (cpSSR). The level and distribution of genetic variation were estimated by standard population genetic parameters and by Neighbor Joining as well as Bayesian analyses. The eight ncSSR loci were highly polymorphic, while the genetic diversity of cpSSRs was low. Nuclear SSRs displayed lower genetic differentiation among populations than cpSSR haplotypes (FST 0.11 and 0.95, respectively). However, high differentiation between phytogeographic provinces was observed in both genomes. The high genetic differentiation detected emphasizes the role of ancient fragmentation. However, the Paranaense province also shows the effects of recent fragmentation on genetic structure, whereas gene flow by pollen preserves the effects of genetic drift in the Yungas province.
Centro Regional de Estudios Genómico