77 research outputs found

    iPhy: an integrated phylogenetic workbench for supermatrix analyses

    Background: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use. Results: Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTol to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users. Conclusions: iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.

    Homoplastic microinversions and the avian tree of life

    Background: Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. Results: We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic microinversions and some overlapping ones. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots. Conclusions: Microinversions can provide valuable phylogenetic information, although power analysis indicate

    Mitogenomic phylogenetic analyses of the Delphinidae with an emphasis on the Globicephalinae

    BACKGROUND: Previous DNA-based phylogenetic studies of the Delphinidae family suggest it has undergone rapid diversification, as characterised by unresolved and poorly supported taxonomic relationships (polytomies) for some of the species within this group. Using an increased amount of sequence data we test between alternative hypotheses of soft polytomies caused by rapid speciation, slow evolutionary rate and/or insufficient sequence data, and hard polytomies caused by simultaneous speciation within this family. Combining the mitogenome sequences of five new and 12 previously published species within the Delphinidae, we used Bayesian and maximum-likelihood methods to estimate the phylogeny from partitioned and unpartitioned mitogenome sequences. Further ad hoc tests were then conducted to estimate the support for alternative topologies. RESULTS: We found high support for all the relationships within our reconstructed phylogenies, and topologies were consistent between the Bayesian and maximum-likelihood trees inferred from partitioned and unpartitioned data. Resolved relationships included the placement of the killer whale (Orcinus orca) as sister taxon to the rest of the Globicephalinae subfamily, placement of the Risso's dolphin (Grampus griseus) within the Globicephalinae subfamily, removal of the white-beaked dolphin (Lagenorhynchus albirostris) from the Delphininae subfamily and the placement of the rough-toothed dolphin (Steno bredanensis) as sister taxon to the rest of the Delphininae subfamily rather than within the Globicephalinae subfamily. The additional testing of alternative topologies allowed us to reject all other putative relationships, with the exception that we were unable to reject the hypothesis that the relationship between L. albirostris and the Globicephalinae and Delphininae subfamilies was polytomic. 
CONCLUSION: Despite their rapid diversification, the increased sequence data yielded by mitogenomes enables the resolution of a strongly supported, bifurcating phylogeny, and a chronology of the divergences within the Delphinidae family. This highlights the benefits and potential application of large mitogenome datasets to resolve long-standing phylogenetic uncertainties.

    Genome-level homology and phylogeny of Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae)

    Background: The explosion in availability of whole genome data provides the opportunity to build phylogenetic hypotheses based on these data as well as the ability to learn more about the genomes themselves. The biological history of genes and genomes can be investigated based on the taxonomic history provided by the phylogeny. A phylogenetic hypothesis based on complete genome data is presented for the genus Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). Nineteen taxa from Shewanella (16 species and 3 additional strains of one species) as well as three outgroup species representing the genera Aeromonas (Gammaproteobacteria: Aeromonadales: Aeromonadaceae), Alteromonas (Gammaproteobacteria: Alteromonadales: Alteromonadaceae) and Colwellia (Gammaproteobacteria: Alteromonadales: Colwelliaceae) are included for a total of 22 taxa. Results: Putatively homologous regions were found across unannotated genomes and tested with a phylogenetic analysis. Two genome-wide data-sets are considered, one including only those genomic regions for which all taxa are represented, which included 3,361,015 aligned nucleotide base-pairs (bp) and a second that additionally includes those regions present in only subsets of taxa, which totaled 12,456,624 aligned bp. Alignment columns in these large data-sets were then randomly sampled to create smaller data-sets. After the phylogenetic hypothesis was generated, genome annotations were projected onto the DNA sequence alignment to compare the historical hypothesis generated by the phylogeny with the functional hypothesis posited by annotation. Conclusions: Individual phylogenetic analyses of the 243 locally co-linear genome regions all failed to recover the genome topology, but the smaller data-sets that were random samplings of the large concatenated alignments all produced the genome topology. It is shown that there is not a single orthologous copy of 16S rRNA across the taxon sampling included in this study and that the relationships among the multiple copies are consistent with 16S rRNA undergoing concerted evolution. Unannotated whole genome data can provide excellent raw material for generating hypotheses of historical homology, which can be tested with phylogenetic analysis and compared with hypotheses of gene function.
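
    The Shewanella abstract above mentions randomly sampling alignment columns to create smaller data-sets. A minimal Python sketch of that idea follows. The function name `sample_columns`, the dictionary-of-strings alignment representation, and the choice to sample without replacement are all assumptions made for illustration; the abstract does not specify how the sampling was done.

```python
import random


def sample_columns(alignment, n_columns, seed=None):
    """Build a smaller alignment from randomly chosen columns.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Columns are drawn without replacement (an assumption, not taken from
    the abstract) and kept in their original left-to-right order.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    if n_columns > total:
        raise ValueError("cannot sample more columns than the alignment has")

    rng = random.Random(seed)
    columns = sorted(rng.sample(range(total), n_columns))

    # The same column indices are applied to every taxon, so each sampled
    # column remains a set of putatively homologous positions.
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    # Toy alignment with hypothetical taxon labels.
    toy = {
        "taxon_a": "ACGTACGTAC",
        "taxon_b": "ACGTTCGTAC",
        "taxon_c": "AC-TACGAAC",
    }
    for name, seq in sample_columns(toy, 5, seed=42).items():
        print(name, seq)
```

    Applying one shared set of column indices to all taxa is what keeps the reduced data-set a valid alignment; sampling each sequence independently would destroy the column-wise homology.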

    Acoelomorpha: earliest branching bilaterians or deuterostomes?

    The Acoelomorpha is an animal group comprising nearly 400 species of misleadingly inconspicuous flatworms. Despite this, acoelomorphs have been at the centre of a heated debate about the origin of bilaterian animals for 150 years. The animal tree of life has undergone major changes during the last decades, thanks largely to the advent of molecular data together with the development of more rigorous phylogenetic methods. There is now a relatively robust backbone of the animal tree of life. However, some crucial nodes remain contentious, especially the node defining the root of Bilateria. Some studies situate Acoelomorpha (and Xenoturbellida) as the sister group of all other bilaterians, while other analyses group them within the deuterostomes, which instead suggests that the last common bilaterian ancestor directly gave rise to deuterostomes and protostomes. The resolution of this node will have a profound impact on our understanding of animal/bilaterian evolution. In particular, if acoelomorphs are the sister group to Bilateria, it will point to a simple nature for the first bilaterian. Alternatively, if acoelomorphs are deuterostomes, this will imply that they are the result of secondary simplification. Here, we review the state of this question and provide potential ways to solve this long-standing issue. Specifically, we argue for the benefits of (1) obtaining additional genomic data from acoelomorphs, in particular from taxa with slower evolutionary rates; (2) the development of new tools to analyse the data; and (3) the use of metagenomics or metatranscriptomics data. We believe the combination of these three approaches will provide a definitive answer as to the position of the acoelomorphs in the animal tree of life.

    Comparative Genomics of Mycoplasma: Analysis of Conserved Essential Genes and Diversity of the Pan-Genome

    Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the set of conserved essential genes required for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

    De novo characterization of the gametophyte transcriptome in bracken fern, Pteridium aquilinum

    Background: Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (Pteridium aquilinum) to develop genomic resources for evolutionary studies. Results: 681,722 quality and adapter trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp and a total assembly size of 30.8 Mbp with an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned 46.2% of the unigenes with a GO functional annotation and 16.0% with an enzyme code annotation. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte specific genes. We screened unigenes for repetitive sequences to identify 548 potentially-amplifiable simple sequence repeat loci and 689 expressed transposable elements. Conclusions: This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.

    Universal Artifacts Affect the Branching of Phylogenetic Trees, Not Universal Scaling Laws

    The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny? In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species, and performing covariance analyses. The relevance of this analysis was explored taking 61 published phylogenies, based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods. All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit rather intuitive control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods, and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. Above the species level, the departure of real trees from simplistic random models is caused by at least two natural factors (uneven speciation and extinction rates) and by artifacts such as choice of taxa included in the analysis, and imbalance introduced by outgroups and basal paraphyletic taxa. This artifactual imbalance accounts for tree shape convergence of large trees. There is no evidence for any universal scaling in the tree of life. Instead, there is a need for improved methods of tree analysis that can be used to discriminate the noise due to outgroups from the phylogenetic signal within the taxon of interest, and to evaluate realistic models of evolution, correcting the retrospective perspective and explicitly recognizing extinction as a driving force. Artifacts are pervasive, and can only be overcome through understanding the structure and biological meaning of phylogenetic trees. Catalan Abstract in Translation S1

    Structural Alterations in a Component of Cytochrome c Oxidase and Molecular Evolution of Pathogenic Neisseria in Humans

    Three closely related bacterial species within the genus Neisseria are of importance to human disease and health. Neisseria meningitidis is a major cause of meningitis, while Neisseria gonorrhoeae is the agent of the sexually transmitted disease gonorrhea and Neisseria lactamica is a common, harmless commensal of children. Comparative genomics have yet to yield clear insights into which factors dictate the unique host-parasite relationships exhibited by each since, as a group, they display remarkable conservation at the levels of nucleotide sequence, gene content and synteny. Here, we discovered two rare alterations in the gene encoding the CcoP protein component of cytochrome cbb3 oxidase that are phylogenetically informative. One is a single nucleotide polymorphism resulting in CcoP truncation that acts as a molecular signature for the species N. meningitidis. We go on to show that the ancestral ccoP gene arose by a unique gene duplication and fusion event and is specifically and completely distributed within species of the genus Neisseria. Surprisingly, we found that strains engineered to express either of the two CcoP forms conditionally differed in their capacity to support nitrite-dependent, microaerobic growth mediated by NirK, a nitrite reductase. Thus, we propose that changes in CcoP domain architecture and ensuing alterations in function are key traits in successive, adaptive radiations within these metapopulations. These findings provide a dramatic example of how rare changes in core metabolic proteins can be connected to significant macroevolutionary shifts. They also show how evolutionary change at the molecular level can be linked to metabolic innovation and its reversal, and demonstrate how genotype can be used to infer alterations of the fitness landscape within a single host.