    Polyhedral Covers of Tree Space

    The phylogenetic tree space, introduced by Billera, Holmes, and Vogtmann, is a cone over a simplicial complex. In this short article, we construct this complex from local gluings of two classical polytopes, the associahedron and the permutohedron. Its homotopy is also reinterpreted and calculated from polytope data. Comment: 8 pages, 9 figures

    Unbiased taxonomic annotation of metagenomic samples

    The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies. We also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach in an upcoming release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample. Peer Reviewed. Postprint (published version)
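The two ingredients named above can be illustrated with a small Python sketch. `rand_index` is the standard pairwise Rand index between two labelings of the same reads; how the authors adapt it to taxonomies is not described in this abstract, so treat the plain form as an assumption. `greedy_set_cover` is the classical greedy heuristic, which gives a logarithmic approximation; this naive version rescans every candidate on each round, so it is not the linear-time variant the abstract mentions. The mapping from hypothetical taxon names to the reads they could annotate is also an illustrative assumption.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same class vs. different class)."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same items (at least two)")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def greedy_set_cover(reads, candidates):
    """Pick taxa until every read is covered, each time taking the taxon covering most uncovered reads.

    candidates maps a (hypothetical) taxon name to the set of reads it could annotate.
    """
    uncovered = set(reads)
    chosen = []
    while uncovered:
        # max() keeps the first taxon in insertion order when gains tie
        best = max(candidates, key=lambda taxon: len(candidates[taxon] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("some reads have no candidate taxon")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 of 6 pairs agree
    candidates = {"A": {"r1", "r2", "r3"}, "B": {"r3", "r4"}, "C": {"r4"}}
    print(greedy_set_cover(["r1", "r2", "r3", "r4"], candidates))  # ['A', 'B']
```

The greedy rule picks the taxon that explains the most still-unexplained reads, which is what yields the logarithmic approximation guarantee; an exact answer would instead come from an integer linear program over the same sets.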

    Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata

    Many social Web sites allow users to annotate content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that helps users visualize and browse social content and organize their own content. However, learning from social metadata presents several challenges, since it is sparse, shallow, ambiguous, noisy, and inconsistent. We describe an approach to folksonomy learning based on relational clustering, which exploits structured metadata contained in personal hierarchies. Our approach clusters similar hierarchies using their structure and tag statistics, then incrementally weaves them into a deeper, bushier tree. We study folksonomy learning using social metadata extracted from the photo-sharing site Flickr and demonstrate that the proposed approach addresses these challenges. Moreover, compared with previous work, the approach produces larger, more accurate folksonomies and scales better. Comment: 10 pages. To appear in the Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 201

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.
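For readers unfamiliar with the Yule-Harding model, the Python sketch below generates a random rooted binary tree under it, using the usual description: start from a cherry and repeatedly split a leaf chosen uniformly at random until there are $n$ leaves. It only illustrates the random model in which the running-time bound is stated; it is not the authors' compression method, and the dictionary-of-children representation is an assumption made for this example.

```python
import random


def yule_harding_tree(n, seed=None):
    """Random rooted binary tree with n leaves under the Yule-Harding model.

    Returns (children, leaves): children maps each node id to its child ids
    (an empty list for leaves); node 0 is the root.
    """
    if n < 2:
        raise ValueError("need at least two taxa")
    rng = random.Random(seed)
    children = {0: [1, 2], 1: [], 2: []}  # start from a single cherry
    leaves = [1, 2]
    next_id = 3
    while len(leaves) < n:
        i = rng.randrange(len(leaves))  # every current leaf is equally likely
        parent = leaves[i]
        left, right = next_id, next_id + 1
        next_id += 2
        children[parent] = [left, right]
        children[left] = []
        children[right] = []
        leaves[i] = left  # the split leaf is replaced by its two new children
        leaves.append(right)
    return children, leaves


if __name__ == "__main__":
    children, leaves = yule_harding_tree(8, seed=1)
    # a rooted binary tree with 8 leaves always has 7 internal nodes
    print(len(leaves), sum(1 for kids in children.values() if kids))
```

Trees drawn this way tend to be more balanced than uniformly random tree shapes, which is the kind of structural property an "almost surely" bound in this model can exploit.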

    Thirteen new species of butterflies (Lepidoptera: Hesperiidae) from Texas

    Analyses of whole genomic shotgun datasets, COI barcodes, morphology, and historical literature suggest that the following 13 butterfly species from the family Hesperiidae (Lepidoptera: Papilionoidea) in Texas, USA, are distinct from their closest named relatives and are therefore described as new (type localities are given in parentheses): Spicauda atelis Grishin, new species (Hidalgo Co., Mission), Urbanus (Urbanus) rickardi Grishin, new species (Hidalgo Co., nr. Madero), Urbanus (Urbanus) oplerorum Grishin, new species (Hidalgo Co., Mission/Madero), Telegonus tsongae Grishin, new species (Starr Co., Roma), Autochton caballo Grishin, new species (Hidalgo Co., 6 mi W of Hidalgo), Epargyreus fractigutta Grishin, new species (Hidalgo Co., McAllen), Aguna mcguirei Grishin, new species (Cameron Co., Brownsville), Polygonus pardus Grishin, new species (Hidalgo Co., McAllen), Arteurotia artistella Grishin, new species (Hidalgo Co., Mission), Heliopetes elonmuski Grishin, new species (Cameron Co., Boca Chica), Hesperia balcones Grishin, new species (Travis Co., Volente), Troyus fabulosus Grishin, new species (Hidalgo Co., Peñitas), and Lerema ochrius Grishin, new species (Hidalgo Co., nr. Relampago). Most of these species are known in the US almost exclusively from the Lower Rio Grande Valley in Texas. Nine of the holotypes were collected in 1971-1975, a banner period for butterfly species newly recorded from the Rio Grande Valley of Texas; five of them were collected by William W. McGuire, and one by Nadine M. McGuire. At the time, these new species were recorded under the names of their close relatives. A neotype is designated for Papilio fulminator Sepp, [1841] (Suriname). Lectotypes are designated for Goniurus teleus Hübner, 1821 (unknown, likely in South America), Goniloba azul Reakirt, [1867] (Mexico: Veracruz) and Eudamus misitra Plötz, 1881 (Mexico). Several taxonomic changes are proposed. 
The following taxa are species (not subspecies): Spicauda zalanthus (Plötz, 1880), reinstated status (not Spicauda teleus (Hübner, 1821)), Telegonus fulminator (Sepp, [1841]), reinstated status (not Telegonus fulgerator (Walch, 1775)), Telegonus misitra (Plötz, 1881), reinstated status (not Telegonus azul (Reakirt, [1867])), Autochton reducta (Mabille and Boullet, 1919), new status (not Autochton potrillo (Lucas, 1857)), Epargyreus gaumeri Godman and Salvin, 1893, reinstated status (not Epargyreus clavicornis (Herrich-Schäffer, 1869)), and Polygonus punctus E. Bell and W. Comstock, 1948, new status (not Polygonus savigny (Latreille, [1824])). Urbanus ehakernae Burns, 2014 and Epargyreus socus chota Evans, 1952 are junior subjective synonyms of Urbanus alva Evans, 1952 and Epargyreus clavicornis (Herrich-Schäffer, 1869), respectively, and Epargyreus gaumeri tenda Evans, 1955, new combination, is not a subspecies of E. clavicornis.

    A list of parameterized problems in bioinformatics

    In this report we present a list of problems that originated in bioinformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results. Postprint (published version)

    The generalized Robinson-Foulds distance for phylogenetic trees

    The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time. However, it has very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has very high resolution, that it can also be computed in linear time, and that it is not (uniformly) equivalent to the RF distance. This research was partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) through grant 2017-SGR-786 (ALBCOM). Peer Reviewed. Postprint (published version)
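As a point of reference for the RF side of this comparison, the Python sketch below computes a cluster-based RF distance for rooted trees written as nested tuples: each internal node contributes the set of leaf labels below it, and the distance is the size of the symmetric difference of the two cluster sets. Conventions vary (some authors halve this count or leave out the root cluster), and the GRF distance itself is not implemented here because its exact definition goes beyond this abstract. The example trees are made up for illustration.

```python
def clusters(tree):
    """Return the set of clusters (leaf-label sets) below each internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
            found.add(leaves)
            return leaves
        return frozenset([node])  # a single leaf label

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(t1) ^ clusters(t2))


if __name__ == "__main__":
    t1 = (("a", "b"), ("c", "d"))
    t2 = (("a", "c"), ("b", "d"))
    t3 = (("a", "b"), ("c", "e"))  # shares only the leaves a, b, c with t1
    print(rf_distance(t1, t2))  # 4
    print(rf_distance(t1, t3))  # 4: only the cluster {a, b} is shared
```

The third tree hints at the problem the abstract raises: once the leaf sets differ, every cluster that contains a non-shared leaf (including the root cluster) is automatically unmatched, so the RF count quickly approaches its maximum.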

    Aspergillus sydowii and other potential fungal pathogens in Gorgonian Octocorals of the Ecuadorian Pacific

    Emerging fungal diseases are threatening ecosystems and have increased in recent decades. In corals, these infections have also become more prevalent and more severe. Coral reefs are affected by an emerging fungal disease named aspergillosis, caused by Aspergillus sydowii. This disease and its pathogen have been reported along the Caribbean and Pacific coasts of Colombia. Despite this, many coral reefs worldwide have not been investigated for the presence of this pathogen. In this work, we surveyed the main coral reef of the Ecuadorian Pacific, focusing on the two most abundant and cosmopolitan species of this ecosystem, Leptogorgia sp. and Leptogorgia obscura. We collected 59 isolates and obtained the corresponding sequences of the Internal Transcribed Spacers (ITS) of the ribosomal DNA. These sequences were analyzed phylogenetically with MrBayes, and the analysis indicated the presence of two isolates of the coral reef pathogen A. sydowii, as well as 16 additional species that are potentially pathogenic to corals. Although the analyzed gorgonian specimens appeared healthy, the presence of these pathogens, especially A. sydowii, alerts us to the potential risk to the health and future survival of the Pacific Ecuadorian coral ecosystem under the current scenario of increasing threats and stressors to coral reefs, such as habitat alteration by humans and global climate change. This research was only partially supported by a grant from the Spanish Ministry of Economy and Competitiveness (CTM2014-57949-R). Peer Reviewed

    Phylogenetics of Parapanteles (Braconidae: Microgastrinae) wasps, an underused tool for their identification, and an exploration of the evolution of their symbiotic viruses

    Microgastrinae is the most diverse subfamily of Braconidae, one of the largest families of parasitoid wasps. Microgastrines parasitize nearly all families of Lepidoptera, but most species are known to attack only one or two Lepidoptera species. Much of the diversity of Microgastrinae arose during a still poorly understood ancient rapid radiation, which produced many short branches deep in the microgastrine phylogeny that are difficult to reconstruct. Because of these difficulties, many microgastrine genera, especially the more speciose ones, may not be monophyletic, and their placements within the microgastrine phylogeny are ambiguous. In Chapter 2, I constructed a 5-gene molecular phylogeny to assess the monophyly of the genus Parapanteles Ashmead (Braconidae: Microgastrinae), a medium-sized genus of microgastrine wasps that was first defined over a century ago and lacks a unique synapomorphic character, and whose monophyly has not been adequately tested. Parapanteles larvae parasitize large, unconcealed caterpillars (macrolepidoptera) and have been reared from an unusually large diversity of hosts for a relatively small parasitoid genus. I used the extensive existing Cytochrome Oxidase I sequences plus four additional genes (wingless, elongation factor 1-alpha, ribosomal subunit 28S, and NADH dehydrogenase subunit 1) to construct individual gene trees and concatenated Bayesian and maximum-likelihood phylogenies of Parapanteles species and several species from other microgastrine genera. In these phylogenies, a plurality of Parapanteles species were recovered as a monophyletic group within another genus, Dolichogenidea, while the remaining Parapanteles species were highly polyphyletic. In Chapter 3, I describe the wing interference patterns of a monophyletic clade of Parapanteles wasps discovered in Chapter 2 and assess their usefulness for species identification. 
Wing interference patterns (WIPs) are color patterns of insect wings caused by thin-film interference. We were able to detect consistent WIP differences between Parapanteles species. In some cases, WIPs can be used to diagnose sibling species that would otherwise require SEM images. The species-specific patterns of WIPs are diagnostically valuable but of uncertain evolutionary significance. In Chapter 4, I used an anchored phylogenomics approach to address intergeneric relationships in Microgastrinae more broadly. Previous molecular phylogenies of this taxon have consistently recovered many short and poorly supported basal internal nodes, supporting the hypothesis that Microgastrinae coevolved with their hosts in an ancient rapid speciation event. The systematics of the 64 currently recognized extant genera remain poorly resolved, and the monophyly of many of these genera is questionable. To address these challenges, I selected 89 species from within and across several microgastrine genera, and Drs. Emily and Alan Lemmon at Florida State University performed anchored hybrid enrichment to generate 370 gene fragment sequences for each. They then performed a concatenated maximum-likelihood analysis of this dataset with RAxML, which resolved nearly all nodes with high bootstrap support. This phylogeny supports several larger genera (Apanteles, Cotesia, Dolichogenidea, and Glyptapanteles) as mostly monophyletic, although taxa from smaller, rarer genera are recovered within each. It also corroborates previous results that Parapanteles is a polyphyletic genus composed of several subclades nested within disparate genera, although most are within Dolichogenidea. Microgastrine wasps carry symbiotic viruses, known as polydnaviruses, encoded within their nuclear genomes; females produce these viruses and inject them, along with eggs, into their host caterpillars. 
In Chapter 5, I sequenced the genomes of 16 microgastrine species from a monophyletic clade of Parapanteles Ashmead with extensive host-use records and annotated polydnavirus genes in each genome. I found that probable duplications, pseudogenes, and rearrangements are common, especially in the protein-tyrosine-phosphatase polydnavirus gene family. These results support the model that frequent gene births and deaths are a major factor in polydnavirus genome evolution, and they extend our knowledge of polydnaviruses to a major, previously unexplored segment of the microgastrine phylogeny.

    Greedy Trees, Subtrees and Antichains

    Greedy trees are constructed from a given degree sequence by a simple greedy algorithm that assigns the highest degree to the root, the second-, third-, ... highest degrees to the root's neighbors, and so on. They have been shown to maximize or minimize a number of different graph invariants among trees with a given degree sequence. In particular, the total number of subtrees of a tree is maximized by the greedy tree. In this work, we show that a much stronger statement holds: greedy trees maximize the number of subtrees of any given order. This parallels recent results on distance-based graph invariants. We obtain a number of corollaries from this fact and also prove analogous results for related invariants, most notably the number of antichains of given cardinality in a rooted tree.
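The construction and the quantity it maximizes can be sketched in Python. Reading "and so on" as breadth-first order, `greedy_tree` gives the root the largest degree and hands out the remaining degrees level by level; ties between equal degrees are broken arbitrarily. `subtree_counts_by_order` counts subtrees of every order with a standard generating-function recursion, $F_v(x) = x\prod_{c}(1 + F_c(x))$ over the children $c$ of $v$, summed over all $v$; this counting method is a common technique chosen for illustration, not necessarily the one used in the paper, and all function names are hypothetical.

```python
from collections import deque


def greedy_tree(degrees):
    """Children lists of the greedy tree for a tree degree sequence (vertex 0 is the root)."""
    degs = sorted(degrees, reverse=True)
    n = len(degs)
    if n < 2 or min(degs) < 1 or sum(degs) != 2 * (n - 1):
        raise ValueError("not a degree sequence of a tree")
    children = [[] for _ in range(n)]
    queue = deque([0])
    next_vertex = 1
    while queue:
        v = queue.popleft()
        # the root spends its whole degree on children; other vertices keep one edge for their parent
        for _ in range(degs[v] if v == 0 else degs[v] - 1):
            children[v].append(next_vertex)
            queue.append(next_vertex)
            next_vertex += 1
    return children


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def subtree_counts_by_order(children):
    """counts[k] = number of subtrees with k vertices; assumes each child has a larger label than its parent."""
    n = len(children)
    F = [None] * n  # F[v]: subtrees whose top vertex is v, counted by order
    for v in reversed(range(n)):
        poly = [1]
        for c in children[v]:
            branch = F[c][:]
            branch[0] += 1  # 1 + F_c(x): either omit c's branch or attach a subtree topped by c
            poly = poly_mul(poly, branch)
        F[v] = [0] + poly  # multiply by x to account for v itself
    counts = [0] * (n + 1)
    for coeffs in F:
        for k, coeff in enumerate(coeffs):
            counts[k] += coeff
    return counts


if __name__ == "__main__":
    star = greedy_tree([1, 3, 1, 1])
    print(star)                           # [[1, 2, 3], [], [], []]
    print(subtree_counts_by_order(star))  # [0, 4, 3, 3, 1]
```

Because `greedy_tree` numbers vertices in breadth-first order, every child receives a larger label than its parent, which is what lets `subtree_counts_by_order` process vertices in reverse label order. Comparing the output for the greedy tree against other trees with the same degree sequence is a quick way to observe the order-by-order maximality claimed above on small cases.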