
42 research outputs found

A 'stochastic safety radius' for distance-based tree reconstruction

Author: Gascuel Olivier
Steel Mike
Publication date: 14/11/2014

A variety of algorithms have been proposed for reconstructing trees that show the evolutionary relationships between species by comparing differences in genetic data across present-day taxa. If the leaf-to-leaf distances in a tree can be accurately estimated, then it is possible to reconstruct this tree from these estimated distances, using polynomial-time methods such as the popular 'Neighbor-Joining' algorithm. There is a precise combinatorial condition under which distance-based methods are guaranteed to return a correct tree (in full or in part), based on the requirement that the input distances all lie within some 'safety radius' of the true distances. Here, we explore a stochastic analogue of this condition, and mathematically establish upper and lower bounds on this 'stochastic safety radius' for distance-based tree reconstruction methods. Using simulations, we show how this notion provides a new way to compare the performance of distance-based tree reconstruction methods. This may help explain why Neighbor-Joining performs so well, as its stochastic safety radius appears close to optimal (while its more classical safety radius is the same as that of many other, less accurate methods).
Comment: 18 pages, 1 figure, 4 tables
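The classical (deterministic) condition can be made concrete. The sketch below assumes the usual l-infinity form: every estimated distance must differ from the true tree distance by strictly less than a fraction rho of the shortest edge length, with rho = 1/2 being the classical value for Neighbor-Joining. The functions within_safety_radius and quartet_metric are hypothetical illustrations; they only test the classical condition and do not implement the paper's stochastic analogue.

```python
from itertools import combinations


def within_safety_radius(true_d, est_d, min_edge, radius=0.5):
    """Check the classical l-infinity safety-radius condition.

    true_d, est_d: dicts mapping frozenset({x, y}) to a distance.
    Returns True when every estimated distance deviates from the true tree
    distance by strictly less than radius * min_edge.
    """
    max_dev = max(abs(est_d[pair] - true_d[pair]) for pair in true_d)
    return max_dev < radius * min_edge


def quartet_metric(internal=1.0, pendant=1.0):
    """Tree metric of the unrooted quartet ab|cd with equal pendant edges."""
    d = {}
    for x, y in combinations("abcd", 2):
        # Leaves on the same side of the internal edge do not cross it.
        same_side = {x, y} in ({"a", "b"}, {"c", "d"})
        d[frozenset((x, y))] = 2 * pendant + (0.0 if same_side else internal)
    return d
```

With every edge of length 1, perturbing all six distances by 0.4 stays inside the radius-1/2 ball, while a perturbation of 0.6 does not; the condition says nothing about perturbations outside the ball, which is where a stochastic notion becomes useful.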

arXiv.org e-Print Archive

CiteSeerX

An evolution strategy approach for the balanced minimum evolution problem

Author: Camerota Verdù F. J.
Castelli L.
Catanzaro D.
Gasparin A.
Publication date: 01/01/2023

Motivation: The Balanced Minimum Evolution (BME) is a powerful distance-based phylogenetic estimation model introduced by Desper and Gascuel and nowadays implemented in popular tools for phylogenetic analyses. It was proven to be computationally less demanding than more sophisticated estimation methods, e.g. maximum likelihood or Bayesian inference, while preserving statistical consistency and the ability to run with almost any kind of data for which a dissimilarity measure is available. BME can be stated as a nonlinear, non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state of the art among approximate methods for the BMEP is FastME (version 2.0), a software package that implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived from classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions because of insufficient exploration of the solution space, a problem that is exacerbated on molecular datasets with a large number of taxa.

Results: To overcome these convergence issues, in this article we propose a novel metaheuristic, named PhyloES, which combines an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially on larger datasets, providing solutions that have shorter tree lengths and are also significantly different from a topological perspective.
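The BME criterion scores a topology T with Pauplin's formula, L(T) = sum over leaf pairs i < j of 2^(1 - tau_ij) * d_ij, where tau_ij is the number of edges on the path between leaves i and j; the BMEP asks for the topology that minimises L(T). The sketch below only evaluates a fixed topology given as an adjacency dict; it is not PhyloES or FastME, and the function names and toy quartet are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def edge_counts(adj, source):
    """Breadth-first search: number of edges from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def bme_length(adj, leaves, d):
    """Balanced minimum evolution length of the unrooted topology adj.

    adj: dict vertex -> list of neighbours.
    d: dict frozenset({i, j}) -> dissimilarity between leaves i and j.
    """
    hops = {leaf: edge_counts(adj, leaf) for leaf in leaves}
    return sum(2.0 ** (1 - hops[i][j]) * d[frozenset((i, j))]
               for i, j in combinations(leaves, 2))


# Toy additive distances on four leaves (every edge of the true tree has length 1).
QUARTET_D = {frozenset(p): v for p, v in [
    (("a", "b"), 2), (("c", "d"), 2), (("a", "c"), 3),
    (("a", "d"), 3), (("b", "c"), 3), (("b", "d"), 3)]}

# Two candidate topologies, ab|cd and ac|bd, with internal vertices u and v.
AB_CD = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
         "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
AC_BD = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
         "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
```

On these distances the true topology ab|cd scores 5.0, which equals its total edge length, while ac|bd scores 5.5; a BMEP heuristic searches over topologies for the smallest such value.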

Archivio istituzionale della ricerca - Università di Trieste

Fast neighbor joining

Author: Elias Isaac
Lagergren Jens
Publication venue: Published by Elsevier B.V.
Publication date: 17/05/2009

Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged by two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n³) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n²), and (2) we present a greatly simplified proof of the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius is of great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
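For reference, the classical cubic-time NJ loop can be sketched as follows: each iteration joins the pair minimising the Q-criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), then replaces it by a new node u with d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2. This is plain NJ under those textbook formulas, not the paper's FNJ; it returns only the topology (branch lengths are omitted), and the function name and nested-tuple output are assumptions.

```python
from itertools import combinations


def neighbor_joining(taxa, dist):
    """Classical Neighbor-Joining returning a nested-tuple topology.

    dist: dict frozenset({x, y}) -> distance. A joined pair (i, j) becomes
    the new node (i, j), so the result reads like an unrooted Newick skeleton.
    """
    d = dict(dist)
    nodes = list(taxa)

    def D(x, y):
        return d[frozenset((x, y))]

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_x = sum_k d(x, k) over the current nodes.
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        u = (i, j)
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k != i and k != j] + [u]
    return tuple(nodes)
```

On the additive quartet with all edges of length 1 (d(a,b) = d(c,d) = 2, all other distances 3), it returns (('a', 'b'), ('c', 'd')), i.e. the split ab|cd. Each iteration costs O(n²) here, giving the Θ(n³) total that FNJ improves on.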

Elsevier - Publisher Connector

Molecular characterization and assessment of genetic diversity of sorghum inbred lines

Author: Balakrishna D
Madhusudhana R
Patil JV
Rajendrakumar P
Seetharama N
Publication venue: 'African Journals Online (AJOL)'
Publication date: 26/01/2016

Selecting parents of diverse genetic base with contrasting phenotypes is an important step in developing mapping populations for quantitative trait loci (QTL) detection and marker-assisted selection. We studied genetic diversity in 31 sorghum parents using 413 sorghum simple sequence repeat (SSR) markers. The polymorphism information content (PIC), a measure of gene diversity, varied from 0 to 0.92 with an average of 0.53 and was significantly correlated with the number of alleles. The primers IS10215, IS10270 and IS10333 could conclusively differentiate all 31 lines. Clustering analysis based on genetic dissimilarity grouped the 31 parents into eight clusters, and the grouping was in good agreement with pedigree, race and geographic origin. Genetically diverse pairs of sorghum parents with contrasting phenotypes for various biotic and abiotic stresses were identified for developing recombinant inbred line (RIL) mapping populations to identify QTLs/genes for important traits in sorghum. One of these mapping populations led to the identification of QTLs for resistance to sorghum shoot fly, and these QTL results were validated in a second mapping population.
Key words: Simple sequence repeats (SSR) markers, genetic diversity, sorghum, mapping parents
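The abstract does not state which PIC formula was used. Two common choices for SSR data are gene diversity, 1 - sum p_i², and Botstein's PIC, which additionally subtracts sum over i < j of 2 p_i² p_j²; the sketch below computes both from a locus's allele frequencies (the function names are illustrative). A monomorphic locus scores 0 under both, which matches the lower end of the reported range.

```python
def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Botstein-style PIC: gene diversity minus sum_{i<j} 2 p_i^2 p_j^2."""
    extra = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return gene_diversity(freqs) - extra
```

For two equally frequent alleles, gene diversity is 0.5 and Botstein's PIC is 0.375; the two measures diverge most for loci with few, evenly distributed alleles.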

AJOL - African Journals Online

Efficiency of Algorithms in Phylogenetics

Author: Tokac Nihan
Publication date: 01/01/2016

Phylogenetics is the study of evolutionary relationships between species. Phylogenetic trees have long been the standard object used in evolutionary biology to illustrate how a given set of species are related. There are some groups (including certain plant and fish species) for which the ancestral history contains reticulation events, caused by processes that include hybridization, lateral gene transfer, and recombination. For such groups of species, it is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and tractability of mathematical models for phylogenetic network methods. Three important and related measures for summarizing the dissimilarity in phylogenetic trees are the minimum number of hybridization events required to fit two phylogenetic trees onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the TBR distance) between two phylogenetic trees. The respective problems of computing these measures are known to be NP-hard, but also fixed-parameter tractable in their respective natural parameters. This means that, while they are hard to compute in general, for cases in which a parameter (here the hybridization number and rSPR/TBR distance, respectively) is small, the problem can be solved efficiently even for large input trees. 
Here, we present new analyses showing that the use of the “cluster reduction” rule – already defined for the hybridization number and the rSPR distance and introduced here for the TBR distance – can transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic network displaying both trees. These results appear in [9]. Traditional “distance-based methods” reconstruct a phylogenetic tree from a matrix of pairwise distances between taxa. A phylogenetic network is a generalization of a phylogenetic tree that can describe evolutionary events such as reticulation and hybridization that are not tree-like. Although it has been known for some time that evolution is more accurately modelled by a network than by a tree, only recently have efforts been made to reconstruct a phylogenetic network directly from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can accurately reconstruct ultrametric tree-child networks from the set of distinct distances between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2. This means that if we can obtain accurate estimates of the set of distances between each pair of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs the true network.
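Since NETWORKUPGMA generalises UPGMA, a sketch of plain UPGMA may help: repeatedly merge the two closest clusters, place their parent at height d/2, and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. This is standard UPGMA for ultrametric trees only, not the thesis's network algorithm; the function name and nested-tuple output are assumptions.

```python
from itertools import combinations


def upgma(taxa, dist):
    """Plain UPGMA. dist maps frozenset({x, y}) -> distance.

    Returns (topology, root_height), with clusters written as nested tuples.
    """
    d = dict(dist)
    size = {t: 1 for t in taxa}
    height = {t: 0.0 for t in taxa}
    clusters = list(taxa)

    def D(x, y):
        return d[frozenset((x, y))]

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: D(*p))
        merged = (a, b)
        size[merged] = size[a] + size[b]
        height[merged] = D(a, b) / 2
        # Size-weighted average distance from the new cluster to the rest.
        for c in clusters:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    size[a] * D(a, c) + size[b] * D(b, c)) / size[merged]
        clusters = [c for c in clusters if c != a and c != b] + [merged]
    root = clusters[0]
    return root, height[root]
```

For the ultrametric matrix with d(a,b) = 2, d(c,d) = 4 and all other distances 6, upgma returns ((('a', 'b'), ('c', 'd')), 3.0): the pairs are merged at heights 1 and 2 and joined at the root at height 3.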

Durham e-Theses

Study of the genetic polymorphism of diploid wheat Triticum boeoticum Boiss. using SSR markers

Author: M. A. Abbasov
Publication venue: 'Institute of Cytology and Genetics, SB RAS'
Publication date: 01/08/2018

Diploid wheat Triticum boeoticum Boiss. (genome constitution AA) is a promising source of new valuable alleles for improving cultivated wheat species. Therefore, evaluating the intraspecies diversity of T. boeoticum and DNA fingerprinting of its accessions are topical tasks. In this paper, the genetic diversity of over 60 T. boeoticum accessions was studied using 11 SSR markers. The analysis revealed 83 alleles, 7.5 alleles per locus on average. The values of observed (HO) and expected (HE) heterozygosity ranged from 0.00 to 0.74 and from 0.17 to 0.89, respectively, with averages of HO = 0.13 and HE = 0.52. The PIC value of each locus ranged from 0.17 to 0.88, with an average of 0.49. Unique alleles were found at all loci studied. Cluster analysis grouped the accessions studied into five major groups. The distances between the groups varied from 0 to 1, pointing to a high level of genetic differences in the collection under study. On the basis of PCoA, five major groups were formed, and some correspondence with the dendrogram was detected. Summarizing the data of PCoA and cluster analysis, we noted a weak genetic differentiation in the studied collection of T. boeoticum. A correlation between genetic distance and geographic origin was revealed only for accessions of diploid wheat T. boeoticum from Iran. The analysis of the T. boeoticum accessions studied showed a wide diversity at SSR loci. The results expand our knowledge and provide additional information on the genetic structure of the collection and on the genetic diversity of the T. boeoticum accessions studied.
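HO and HE are usually computed per locus as follows: HO is the fraction of individuals that are heterozygous, and HE is Nei's gene diversity 1 - sum p_i² from the pooled allele frequencies. The sketch assumes genotypes are given as allele pairs; the paper may have used a sample-size-corrected HE, which is not shown, and the function names are illustrative.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Nei's gene diversity 1 - sum(p_i^2) from pooled allele counts."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

For the genotypes (1,1), (1,1), (1,2), (2,3), HO = 0.5 and HE = 0.53125; an average HO well below the average HE, as in the figures above, indicates a deficit of heterozygotes relative to random mating.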

Directory of Open Access Journals

Inferring organism and character evolution from functional genome properties

Author: Kandavel Palani Kannan
Publication date: 05/03/2015

The development of phylogenomics pipelines is important because phylogeny-driven genome sequencing projects generate large amounts of genomic data. This thesis focused on the development of three pipelines, which yield: 1. proper taxonomic descriptions of new genomes; 2. phylogenies based on genome-encoded functionalities and COGs; and 3. evolutionary correlations between functional characters (e.g., genes), genome features of newly sequenced genomes and their functional linkages. Thirty-four variants of distance-based phylogenetic tree reconstruction strategies were formulated for eight datasets, differing in data sources, applied threshold weights and distance calculation methods. The strategy variants that are similar to previously described approaches were statistically tested. A contemporary way of inferring phylogenies from conserved genome features at the whole-genome level was optimized and reported. A strategy was developed to calculate the evolutionary correlation between different genomic features and their functional linkages. The BayesTraits software was used in the pipeline to estimate the correlated evolution between character pairs. Characters were clustered with the MCL algorithm on the basis of significant evolutionary correlations between them. The pipelines were standardized using eight datasets. The E. coli + Shigella, Spirochaetae and Rhodobacteraceae datasets were run through this pipeline to find evolutionary correlations between functional characters. For the Spirochaetae dataset, the correlated genes in motility pathways were identified and interpreted in light of previous scientific evidence. The distribution of correlated enzymes per pathway in the Rhodobacteraceae dataset was identified. The evolutionary correlation of the cp4-44 prophage element with pathogenicity in E. coli + Shigella was identified and interpreted together with the character state reconstructions of both characters.
The evolutionary correlation of four pathways and one enzyme with the marine/non-marine lifestyle in the Rhodobacteraceae dataset was identified and interpreted along with character state reconstructions, which describe patterns of evolutionary events for different genomic features on the phylogeny.
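MCL (Markov clustering) alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power r, then renormalising columns) until the matrix stops changing; the rows that keep mass define the clusters. The pure-Python sketch below assumes a symmetric adjacency matrix of significant correlations between characters; how that graph is built from BayesTraits output is not described here, and the function and parameter names are hypothetical.

```python
def _normalize_columns(m):
    """Scale every non-zero column of the square matrix m to sum to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of an undirected (optionally weighted) graph."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (random walks of length two).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: entrywise power, then renormalise the columns.
        inflated = _normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass (attractors) list the members of their cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Two disconnected triangles come back as the clusters {0, 1, 2} and {3, 4, 5}; higher inflation values generally produce finer clusters.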

Digitale Bibliothek Braunschweig

Genetic structure of disjunct Argentinean populations of the subtropical tree Anadenanthera colubrina var. cebil (Fabaceae)

Author: Barrandeguy María Eugenia
Finkeldey Reiner
García María Victoria
Prinz Kathleen
Rivera Pomar Rolando Víctor
Publication date: 19/05/2022

Anadenanthera colubrina var. cebil is a native South American tree species inhabiting seasonally dry tropical forests (SDTFs). Its current disjunct distribution presumably represents fragments of a historically much larger area of this forest type, which has also been highly impacted by human activities. Accordingly, the hypothesis of this study is that the natural populations of A. colubrina var. cebil from Northern Argentina represent vestiges of ancient fragmentation but are additionally influenced by a certain degree of gene flow among them. We aimed to analyze the genetic structure of both nuclear and chloroplast DNA to evaluate the relative roles of ancient and recent fragmentation in intraspecific diversity patterns. Sixty-nine individuals from four natural populations were analyzed using eight nuclear microsatellite (ncSSR) and four chloroplast microsatellite (cpSSR) loci. The level and distribution of genetic variation were estimated with standard population genetic parameters and with Neighbor Joining as well as Bayesian analyses. The eight ncSSR loci were highly polymorphic, while the genetic diversity of cpSSRs was low. Nuclear SSRs displayed lower genetic differentiation among populations than cpSSR haplotypes (FST = 0.11 and 0.95, respectively). However, high differentiation between phytogeographic provinces was observed in both genomes. The high genetic differentiation detected emphasizes the role of ancient fragmentation. However, the Paranaense province also shows the effects of recent fragmentation on genetic structure, whereas gene flow by pollen preserves the effects of genetic drift in the Yungas province.
Centro Regional de Estudios Genómico
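The abstract reports FST values but not the estimator. One common analogue for multi-allelic markers such as SSRs is Nei's G_ST = (H_T - H_S) / H_T, where H_S is the mean within-population gene diversity and H_T the gene diversity of the pooled allele frequencies. The sketch below computes it for a single locus with equally weighted populations (an assumption); it is not necessarily the statistic used in the study, and the function names are illustrative.

```python
def gene_diversity(freqs):
    """Gene diversity 1 - sum(p_i^2) for one population at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs: one allele-frequency list per population, all in the same
    allele order; populations are weighted equally.
    """
    k = len(pop_freqs)
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k
    pooled = [sum(f[a] for f in pop_freqs) / k
              for a in range(len(pop_freqs[0]))]
    h_t = gene_diversity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

Populations fixed for different alleles give 1.0 and identical populations give 0.0, so the cpSSR value of 0.95 sits near complete differentiation while the ncSSR value of 0.11 is much lower.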

Servicio de Difusión de la Creación Intelectual