12,843 research outputs found

    The Mathematics of Phylogenomics

    The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics. Comment: 41 pages, 4 figures.

    A target enrichment method for gathering phylogenetic information from hundreds of loci: An example from the Compositae.

    Premise of the study: The Compositae (Asteraceae) are a large and diverse family of plants, and the most comprehensive phylogeny to date is a meta-tree based on 10 chloroplast loci that has several major unresolved nodes. We describe the development of an approach that enables the rapid sequencing of large numbers of orthologous nuclear loci to facilitate efficient phylogenomic analyses. Methods and results: We designed a set of sequence capture probes that target conserved orthologous sequences in the Compositae. We also developed a bioinformatic and phylogenetic workflow for processing and analyzing the resulting data. Application of our approach to 15 species from across the Compositae resulted in the production of phylogenetically informative sequence data from 763 loci and the successful reconstruction of known phylogenetic relationships across the family. Conclusions: These methods should be of great use to members of the broader Compositae community, and the general approach should also be of use to researchers studying other families.

    Comparative Phylogenomics of Pathogenic and Nonpathogenic Species.

    The Ascomycete Onygenales order embraces a diverse group of mammalian pathogens, including the yeast-forming dimorphic fungal pathogens Histoplasma capsulatum, Paracoccidioides spp. and Blastomyces dermatitidis, the dermatophytes Microsporum spp. and Trichophyton spp., the spherule-forming dimorphic fungal pathogens in the genus Coccidioides, and many nonpathogens. Although genomes for all of the aforementioned pathogenic species are available, only one nonpathogen had been sequenced. Here, we enhance comparative phylogenomics in Onygenales by adding genomes for Amauroascus mutatus, Amauroascus niger, Byssoonygena ceratinophila, and Chrysosporium queenslandicum, four nonpathogenic Onygenales species that are all more closely related to Coccidioides spp. than any other known Onygenales species. Phylogenomic detection of gene family expansion and contraction can provide clues to fungal function but is sensitive to taxon sampling. By adding further nonpathogens, we show that LysM domain-containing proteins, previously thought to be expanding in some Onygenales, are contracting in the Coccidioides-Uncinocarpus clade, as are the self-nonself recognition Het loci. The denser genome sampling presented here highlights nearly 800 genes unique to Coccidioides, which have significantly fewer known protein domains and show increased expression in the endosporulating spherule, the parasitic phase unique to Coccidioides spp. These genomes provide insight into gene family expansion/contraction and patterns of individual gene gain/loss in this diverse order, both of which are major drivers of evolutionary change. Our results suggest that gene family expansion/contraction can lead to adaptive radiations that create taxonomic orders, while individual gene gain/loss likely plays a more significant role in branch-specific phenotypic changes that lead to adaptation for species or genera.

    HMMER cut-off threshold tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

    Background: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignment. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignment, so this task is mostly performed using the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high-quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences in groups and in the corresponding HMMER profiles results in high sensitivity, while specificity is maintained by imposing 100% P&R self-detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions: HMMERCTTER is a promising protein superfamily sequence classifier provided high-quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignment in the twilight zone of sequence similarity. All relevant data and source codes are available from the GitHub repository at the following.
    Affiliation: Pagnuco, Inti Anabela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica; Argentina
    Affiliation: Revuelta, María Victoria. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Bondino, Hernán Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Brun, Marcel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica; Argentina
    Affiliation: Ten Have, Arjen. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
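
The idea of a cut-off that yields 100% precision and recall on training data, as described in the HMMERCTTER abstract, can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names `perfect_cutoff` and `classify` are hypothetical, the scores are made-up numbers standing in for profile scores, and taking the lowest member score as the threshold is an assumption made here for illustration.

```python
from typing import Optional, Sequence


def perfect_cutoff(member_scores: Sequence[float],
                   nonmember_scores: Sequence[float]) -> Optional[float]:
    """Return a cut-off separating members from non-members with 100%
    precision and recall on the training scores, or None if they overlap.

    Hypothetical helper: the scores are assumed to come from scoring each
    training sequence against one cluster's profile.
    """
    if not member_scores:
        return None
    lowest_member = min(member_scores)
    highest_nonmember = max(nonmember_scores, default=float("-inf"))
    # Perfect separation requires every member to outscore every non-member.
    if lowest_member <= highest_nonmember:
        return None
    # Assumption: use the lowest member score as the inclusion cut-off.
    return lowest_member


def classify(score: float, cutoff: float) -> bool:
    """Accept a target sequence if its score reaches the cut-off."""
    return score >= cutoff


if __name__ == "__main__":
    # Made-up example scores for one cluster.
    cutoff = perfect_cutoff([120.5, 98.0, 150.2], [40.1, 12.3])
    print(cutoff)
    print(perfect_cutoff([50.0, 90.0], [60.0]))
```

A cluster whose member and non-member scores overlap returns `None`, which corresponds to a cluster that cannot serve as a classifier at 100% P&R and would have to be split or merged differently.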

    Phylogenomics and analysis of shared genes suggest a single transition to mutualism in Wolbachia of nematodes

    Wolbachia, endosymbiotic bacteria of the order Rickettsiales, are widespread in arthropods but also present in nematodes. In arthropods, A and B supergroup Wolbachia are generally associated with distortion of host reproduction. In filarial nematodes, including some human parasites, multiple lines of experimental evidence indicate that C and D supergroup Wolbachia are essential for the survival of the host, and here the symbiotic relationship is considered mutualistic. The origin of this mutualistic endosymbiosis is of interest for both basic and applied reasons: How does a parasite become a mutualist? Could intervention in the mutualism aid in treatment of human disease? Correct rooting and high-quality resolution of Wolbachia relationships are required to resolve this question. However, because of the large genetic distance between Wolbachia and the nearest outgroups, and the limited number of genomes so far available for large-scale analyses, current phylogenies do not provide robust answers. We therefore sequenced the genome of the D supergroup Wolbachia endosymbiont of Litomosoides sigmodontis, revisited the selection of loci for phylogenomic analyses, and performed a phylogenomic analysis including available complete genomes (from isolates in supergroups A, B, C, and D). Using 90 orthologous genes with reliable phylogenetic signals, we obtained a robust phylogenetic reconstruction, including a highly supported root to the Wolbachia phylogeny between an (A + B) clade and a (C + D) clade. Although we currently lack data from several Wolbachia supergroups, notably F, our analysis supports a model wherein the putatively mutualist endosymbiotic relationship between Wolbachia and nematodes originated from a single transition event.

    A genomic approach to examine the complex evolution of laurasiatherian mammals

    Recent phylogenomic studies have failed to conclusively resolve certain branches of the placental mammalian tree, despite the evolutionary analysis of genomic data from 32 species. Previous analyses of single genes and retroposon insertion data yielded support for different phylogenetic scenarios for the most basal divergences. The results indicated that some mammalian divergences were best interpreted not as a single bifurcating tree, but as an evolutionary network. In these studies the relationships among some orders of the super-clade Laurasiatheria were poorly supported, albeit not studied in detail. Therefore, 4775 protein-coding genes (6,196,263 nucleotides) were collected and aligned in order to analyze the evolution of this clade. Additionally, over 200,000 introns were screened in silico, resulting in 32 phylogenetically informative long interspersed nuclear element (LINE) insertion events. The present study shows that the genome evolution of Laurasiatheria may best be understood as an evolutionary network. Thus, contrary to the common expectation that major evolutionary events resolve as a bifurcating tree, genome analyses unveil complex speciation processes even in deep mammalian divergences. We exemplify this on a subset of 1159 suitable genes that have individual histories, most likely due to incomplete lineage sorting or introgression, processes that can make the genealogy of mammalian genomes complex. These unexpected results have major implications for the understanding of evolution in general, because the evolution of even some higher-level taxa such as mammalian orders may sometimes not be interpreted as a simple bifurcating pattern.

    Tackling Rapid Radiations With Targeted Sequencing

    In phylogenetic studies across angiosperms, at various taxonomic levels, polytomies have persisted despite efforts to resolve them by increasing sampling of taxa and loci. The large amount of genomic data now available and statistical tools to analyze them provide unprecedented power for phylogenetic inference. Targeted sequencing has emerged as a strong tool for estimating species trees in the face of rapid radiations, lineage sorting, and introgression. Evolutionary relationships in Cyperaceae have been studied mostly using Sanger sequencing until recently. Despite ample taxon sampling, relationships in many genera remain poorly understood, hampered by diversification rates that outpace mutation rates in the loci used. The C4 Cyperus clade of the genus Cyperus has been particularly difficult to resolve. Previous studies based on a limited set of markers resolved relationships among Cyperus species using the C3 photosynthetic pathway, but not among C4 Cyperus clade taxa. We test the ability of two targeted sequencing kits to resolve relationships in the C4 Cyperus clade, the universal Angiosperms-353 kit and a Cyperaceae-specific kit. Sequences of the targeted loci were recovered from data generated with both kits and used to investigate overlap in data between kits and relative efficiency of the general and custom approaches. The power to resolve shallow-level relationships was tested using a summary species tree method and a concatenated maximum likelihood approach. High resolution and support are obtained using both approaches, but high levels of missing data disproportionately impact the latter. Targeted sequencing provides new insights into the evolution of morphology in the C4 Cyperus clade, demonstrating for example that the former segregate genus Alinula is polyphyletic despite its seeming morphological integrity. 
An unexpected result is that the Cyperus margaritaceus-Cyperus niveus complex comprises a clade separate from and sister to the core C4 Cyperus clade. Our results demonstrate that data generated with a family-specific kit do not necessarily have more power than those obtained with a universal kit, but that data generated with different targeted sequencing kits can often be merged for downstream analyses. Moreover, our study contributes to the growing consensus that targeted sequencing data are a powerful tool in resolving rapid radiations. Funding: Spanish Ministry of Economy and Competitiveness (project CGL2016-77401-P).