
    Identifying single copy orthologs in Metazoa

    The identification of single-copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high-quality sequenced genomes to gain confidence in the widespread single-copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single-copy orthologs from multigene families and apply it to the Metazoa. Using 18 high-quality sequenced metazoan genomes, we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. The phylogenetic procedure increased the number of single-copy orthologs found by over a third compared with standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single-copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human), with an average of 92%, suggesting a value for the underlying error rate in genome annotation and a strategy for identifying single-copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.

    PLoS Comput Biol
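    The abstract contrasts its phylogenetic procedure with "standard taxon-count approaches" and then scores genome completeness against the resulting ortholog set. The sketch below illustrates only those two simpler ideas: the taxon-count rule (exactly one gene in every species) and completeness as the percentage of reference groups recovered. It is not the authors' phylogenetic procedure; the input layout and the names `single_copy_by_taxon_count` and `completeness` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical input: each orthologous group maps a species name to the list of
# gene IDs from that species assigned to the group.
OrthoGroups = Dict[str, Dict[str, List[str]]]


def single_copy_by_taxon_count(groups: OrthoGroups, species: Set[str]) -> Set[str]:
    """Return IDs of groups with exactly one gene in every species (taxon-count rule)."""
    return {
        gid
        for gid, members in groups.items()
        if set(members) == species and all(len(genes) == 1 for genes in members.values())
    }


def completeness(reference_groups: Set[str], found_groups: Set[str]) -> float:
    """Percentage of reference single-copy groups recovered in an annotation."""
    if not reference_groups:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference_groups & found_groups) / len(reference_groups)


if __name__ == "__main__":
    species = {"human", "fly", "worm"}
    groups = {
        "OG1": {"human": ["h1"], "fly": ["f1"], "worm": ["w1"]},
        "OG2": {"human": ["h2", "h3"], "fly": ["f2"], "worm": ["w2"]},  # two human copies
        "OG3": {"human": ["h4"], "fly": ["f3"]},                         # absent from worm
    }
    ref = single_copy_by_taxon_count(groups, species)
    print(sorted(ref))                 # ['OG1']
    print(completeness(ref, {"OG1"}))  # 100.0
```

    A group like OG2 is rejected outright by the counting rule; deciding whether such a multigene group still contains a single-copy ortholog is the job of the phylogenetic procedure the abstract describes.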

    Horizontal gene transfer and the unusual genomic architecture of bdelloid rotifers

    Bdelloid rotifers are microscopic aquatic animals, notable for their ancient asexuality and their extreme desiccation tolerance. In the absence of sexual reproduction, bdelloids have persisted for over 40 million years, diverging into >450 morphologically distinct species. Despite the two-fold cost of sex, asexual lineages tend to be short-lived and species-poor. Many theories exist to explain the success of sexual reproduction, and in the light of these, ancient asexual lineages are an evolutionary paradox. Understanding the persistence and speciation of ancient asexuals may provide clues to the factors underlying the success of sexual reproduction. Bdelloid rotifers have unusual genomic features that may have provided some compensation for their long-term absence of sexual reproduction. Here I focus on two of them: multiple gene copies and horizontal gene transfer (HGT). Bdelloids have multiple copies of many genes and are considered degenerate tetraploids. In genomes shaped by the opposing forces of gene conversion and divergence of former alleles, I examine the relationships among the members of a multigene alpha-tubulin family and the biochemical implications of their divergence. Horizontally acquired genes were initially identified in sub-telomeric regions of two species of bdelloid rotifer. To understand what role foreign genes might have played in bdelloid evolution, we need to examine the extent, frequency and mechanism of HGT. Here I develop a bioinformatics pipeline for identifying horizontally acquired genes in transcriptomes. By comparing HGT in a number of bdelloid species, I demonstrate that the majority of transcribed foreign genes were acquired before the divergence of extant bdelloid species, but the presence of more recently acquired genes implies that HGT is ongoing. By comparing the extent of HGT in closely related species with different desiccation frequencies, I provide initial support for the hypothesis that bdelloid HGT is facilitated by DNA breakage and repair during cycles of desiccation and rehydration.
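    The abstract does not say how its pipeline flags foreign transcripts. One widely used screening heuristic in bdelloid HGT work is the alien index (AI) of Gladyshev and colleagues, which compares the best BLAST E-value against metazoan sequences with the best E-value against non-metazoan sequences. The sketch below computes that score; treating it as part of this thesis's pipeline is an assumption, and the cutoffs in `classify` (AI > 45 likely foreign, AI > 0 candidate) are commonly quoted values, not taken from the abstract.

```python
import math

# Offset used in the alien-index definition; it keeps log() finite when a
# BLAST E-value is reported as exactly 0.
OFFSET = 1e-200


def alien_index(best_evalue_metazoa: float, best_evalue_nonmetazoa: float) -> float:
    """AI = ln(best metazoan E-value + 1e-200) - ln(best non-metazoan E-value + 1e-200).

    Pass 1.0 for a category with no hit. Positive values mean the query matches
    non-metazoan sequences better than metazoan ones.
    """
    return math.log(best_evalue_metazoa + OFFSET) - math.log(best_evalue_nonmetazoa + OFFSET)


def classify(ai: float, threshold: float = 45.0) -> str:
    """Assumed cutoffs: above threshold is likely foreign, above 0 is a candidate."""
    if ai > threshold:
        return "likely foreign"
    if ai > 0:
        return "candidate"
    return "metazoan-like"


if __name__ == "__main__":
    ai = alien_index(1.0, 1e-50)  # no metazoan hit, strong non-metazoan hit
    print(round(ai, 1), classify(ai))  # prints: 115.1 likely foreign
```

    A high AI only marks a candidate; screens of this kind usually follow it with phylogenetic checks and tests for contamination before calling a gene horizontally acquired.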

    BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics.

    Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluating completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying the completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable and to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
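    BUSCO reports completeness as a one-line summary in which each benchmark gene is Complete (single-copy S or duplicated D), Fragmented (F) or Missing (M), out of n genes searched. The sketch below reproduces only that bookkeeping from a list of per-gene status labels; it is not BUSCO's own code, and the classification of each gene (which BUSCO does with its own searches) is assumed to be done already. The function name `busco_summary` is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Status labels modeled on BUSCO's categories: Complete single-copy (S),
# Complete duplicated (D), Fragmented (F) and Missing (M).
VALID = {"S", "D", "F", "M"}


def busco_summary(statuses: Iterable[str]) -> str:
    """Format per-gene statuses as a BUSCO-style one-line summary."""
    counts = Counter(statuses)
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no genes to summarise")
    # Counter returns 0 for labels that never occurred.
    pct = {k: 100.0 * counts[k] / n for k in VALID}
    complete = pct["S"] + pct["D"]  # C = S + D
    return (f"C:{complete:.1f}%[S:{pct['S']:.1f}%,D:{pct['D']:.1f}%],"
            f"F:{pct['F']:.1f}%,M:{pct['M']:.1f}%,n:{n}")


if __name__ == "__main__":
    statuses = ["S"] * 900 + ["D"] * 50 + ["F"] * 30 + ["M"] * 20
    print(busco_summary(statuses))  # C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000
```

    Keeping D separate from S matters: a high duplicated fraction can point to uncollapsed haplotypes or contamination rather than a better assembly.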

    Gene content evolution in the arthropods

    Arthropods comprise the largest and most diverse phylum on Earth and play vital roles in nearly every ecosystem. Their diversity stems in part from variations on a conserved body plan, resulting from and recorded in adaptive changes in the genome. Dissecting the genomic record of sequence change enables broad questions about genome evolution to be addressed, even across hyper-diverse taxa within arthropods. Using 76 whole genome sequences representing 21 orders spanning more than 500 million years of arthropod evolution, we document changes in gene and protein domain content and provide temporal and phylogenetic context for interpreting these innovations. We identify many novel gene families that arose early in the evolution of arthropods and during the diversification of insects into modern orders. We reveal unexpected variation in patterns of DNA methylation across arthropods, and examples of gene family and protein domain evolution coincident with the appearance of notable phenotypic and physiological adaptations such as flight, metamorphosis, sociality, and chemoperception. These analyses demonstrate how large-scale comparative genomics can provide broad new insights into the genotype-to-phenotype map and generate testable hypotheses about the evolution of animal diversity.
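    Placing a gene family's origin "early in the evolution of arthropods" or "during the diversification of insects" requires mapping the family onto the species tree. A simple way to do this, assuming a family arises only once (a Dollo-like assumption), is to date it to the last common ancestor (LCA) of all species that carry it. The sketch below does exactly that on a toy tree; the tree, the node names, and the helper functions are hypothetical, and the abstract does not state that this was the method used.

```python
from typing import Dict, List, Optional, Set

# Hypothetical species tree given as child -> parent links; the root maps to None.
Tree = Dict[str, Optional[str]]


def path_to_root(tree: Tree, node: str) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while tree[node] is not None:
        node = tree[node]
        path.append(node)
    return path


def family_origin(tree: Tree, species_with_family: Set[str]) -> str:
    """Earliest node at which a family is inferred present: the LCA of its species."""
    if not species_with_family:
        raise ValueError("family has no members")
    paths = [path_to_root(tree, sp) for sp in species_with_family]
    common = set(paths[0]).intersection(*paths[1:])
    # The first shared node on any leaf-to-root path is the LCA.
    return next(node for node in paths[0] if node in common)


if __name__ == "__main__":
    tree = {
        "Arthropoda": None, "Insecta": "Arthropoda", "Chelicerata": "Arthropoda",
        "Diptera": "Insecta", "Hymenoptera": "Insecta",
        "fly": "Diptera", "mosquito": "Diptera", "bee": "Hymenoptera", "spider": "Chelicerata",
    }
    print(family_origin(tree, {"fly", "bee"}))  # Insecta
```

    Counting families per origin node then gives a rough profile of when novelties accumulated, though later losses and misassigned orthology can shift these estimates.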

    Systematic errors in phylogenomics with a focus on the major metazoan clade Deuterostomia

    Modern-day phylogenomic studies employ large data sets of many genes to resolve evolutionary relationships among many species. A typical phylogenomic workflow consists of several steps: taxon sampling, orthology inference, marker selection and tree search. Each of these steps involves subjective decisions by the researcher, which risks introducing systematic errors into the final results. In this thesis, I investigate the sources and impact of systematic errors in multiple steps of the phylogenomic workflow, focusing on the major clade Metazoa. First, I create simulated sets of orthologs under different settings for evolutionary rate and rate heterogeneity among sites, and use OrthoFinder to infer their (known) orthology relationships. Orthology inference is sensitive to high evolutionary rates and low rate heterogeneity among sites. I show that errors in orthology inference are carried over to downstream analyses such as gene presence/absence phylogenies, inference of gene gains and losses, and phylostratigraphy. I also introduce a novel computational pipeline that identifies the presence of a hidden break in the 28S ribosomal RNA of a given species. Mapping RNA-seq reads onto the 28S rRNA sequence reveals an absence of mapped-read coverage near the middle of the 28S rRNA sequence in species that possess the hidden break. I apply this pipeline to hundreds of metazoan and other eukaryotic species and find that the hidden break is a rarely lost protostome feature, with surprising events of convergent evolution outside Metazoa. Finally, I focus on the major metazoan clade Deuterostomia: while it has been widely accepted as a monophyletic group for over a century, recent phylogenomic studies addressing known systematic errors have recovered low support for monophyletic Deuterostomia. I examine five recently published metazoan phylogenomic data sets to show that monophyletic Deuterostomia is much less well supported than monophyletic Protostomia. I also create 40 new data sets, with and without fast-evolving taxa, and use them to correlate strong support for monophyletic Deuterostomia with problematic conditions in a phylogenomic analysis.
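    The hidden-break test rests on one observation: reads mapped to 28S rRNA leave a stretch with no coverage near the middle of the molecule. The sketch below turns that into code by building per-base depth from mapped read intervals with a difference array and then looking for a zero-coverage run inside a central window. The window (30% to 70% of the sequence) and the 10-base minimum gap are illustrative assumptions, the read intervals would in practice come from a read mapper, and none of the function names are from the thesis pipeline.

```python
from typing import List, Optional, Sequence, Tuple


def coverage(length: int, alignments: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-base depth from half-open [start, end) read intervals via a difference array."""
    diff = [0] * (length + 1)
    for start, end in alignments:
        diff[max(start, 0)] += 1
        diff[min(end, length)] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def find_central_gap(depth: Sequence[int], min_gap: int = 10,
                     window: Tuple[float, float] = (0.3, 0.7)) -> Optional[Tuple[int, int]]:
    """Return the first zero-coverage run of >= min_gap bases inside the central window."""
    lo, hi = int(len(depth) * window[0]), int(len(depth) * window[1])
    run_start = None
    for i in range(lo, hi + 1):
        zero = i < hi and depth[i] == 0  # position hi acts as a sentinel that closes any run
        if zero and run_start is None:
            run_start = i
        elif not zero and run_start is not None:
            if i - run_start >= min_gap:
                return run_start, i  # half-open interval of uncovered bases
            run_start = None
    return None


if __name__ == "__main__":
    reads = [(0, 30), (20, 45), (60, 80), (75, 100)]
    print(find_central_gap(coverage(100, reads)))  # (45, 60)
```

    Restricting the search to a central window avoids mistaking the naturally low coverage at transcript ends for a break; a real run would also need a minimum flanking depth so that poorly sequenced samples are not called as positives.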

    Establishing the precise evolutionary history of a gene improves prediction of disease-causing missense mutations

    PURPOSE: Predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of the variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take orthologous/paralogous relationships into account. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. METHODS: We identified major events in NPC1 evolution and revealed and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. We predicted whether an amino acid substitution affects protein function by reducing the organism’s fitness. RESULTS: Removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. CONCLUSION: The results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy in predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well. Genet Med 18(10):1029–1036.
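    The premise the abstract criticizes, that a substitution already seen among homologs is probably benign, can be made concrete in a few lines. The sketch below checks whether a candidate residue occurs at an alignment column, either among all homologs or among orthologs only. The toy alignment, the relationship labels and `substitution_observed` are hypothetical; real predictors weight conservation and phylogeny far more carefully, and this is not the authors' scoring method.

```python
from typing import Dict, Tuple

# Hypothetical alignment: sequence ID -> (relationship to the human gene, aligned sequence).
Alignment = Dict[str, Tuple[str, str]]


def substitution_observed(alignment: Alignment, column: int, alt: str,
                          orthologs_only: bool = True) -> bool:
    """True if residue `alt` occurs at `column` in any homolog (optionally orthologs only)."""
    for relation, seq in alignment.values():
        if orthologs_only and relation != "ortholog":
            continue  # skip paralogs and other distant homologs
        if seq[column] == alt:
            return True
    return False


if __name__ == "__main__":
    # Toy sequences, not real NPC1 data.
    aln = {
        "human": ("query", "MKLV"),
        "mouse": ("ortholog", "MKLV"),
        "zebrafish": ("ortholog", "MRLV"),
        "paralog_A": ("paralog", "MKIV"),
    }
    # L->I at column 2 is seen only in the paralog.
    print(substitution_observed(aln, 2, "I", orthologs_only=False))  # True
    print(substitution_observed(aln, 2, "I"))                        # False
```

    The example shows how a single paralog can make a substitution look "tolerated"; dropping non-orthologous sequences removes that false signal, which is the direction of the improvement the abstract reports.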

    Origin and Evolution of Dishevelled

    Dishevelled (Dsh or Dvl) is an important signaling protein, playing a key role in Wnt signaling and relaying cellular information for several developmental pathways. Dsh is highly conserved among metazoans and has expanded into a multigene family in most bilaterian lineages, including vertebrates, planarians, and nematodes. These orthologs, where explored, are known to have considerable overlap in function, but evidence for functional specialization continues to mount. We performed a comparative analysis of Dsh across animals, with an emphasis on invertebrates and especially nematodes, to explore protein architecture and identify conserved and divergent features that could provide insight into functional specialization. We find evidence of dynamic evolution of Dsh, particularly among nematodes, with taxa varying in ortholog number from one to three. We identify a new domain specific to some nematode lineages and find an unexpected nuclear localization signal conserved in many Dsh orthologs. Our findings raise questions about protein evolution in general and provide clues to how animals have dealt with the complex intricacies of having a protein such as Dsh act as a central messenger hub connected to many different and vitally important pathways. We discuss our findings in the context of functional specialization and bring many testable hypotheses to light.
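    The abstract does not say how the conserved nuclear localization signal (NLS) was found. As a crude illustration of motif scanning, the sketch below searches protein sequences for the classical monopartite NLS consensus K(K/R)X(K/R), using a regex lookahead so overlapping matches are all reported. Using this consensus is an assumption; dedicated NLS predictors also model bipartite signals and sequence context, and a regex hit alone is weak evidence.

```python
import re
from typing import List

# Classical monopartite NLS consensus K(K/R)X(K/R); the zero-width lookahead
# lets overlapping matches be reported.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_candidates(protein: str) -> List[int]:
    """0-based start positions of matches to the monopartite NLS consensus."""
    return [m.start() for m in MONOPARTITE_NLS.finditer(protein.upper())]


if __name__ == "__main__":
    # Contains PKKKRKV, the well-known SV40 large T antigen NLS.
    print(find_nls_candidates("MAPKKKRKVED"))  # [3, 4]
```

    Applied across an alignment of Dsh sequences, a scan like this would flag where candidate motifs fall, and conservation of a hit across orthologs is what would make it worth testing experimentally.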

    SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics

    BACKGROUND: Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences can lead to a lack of resolution. A tool that selects sequences, species, and genes while dealing with these issues is needed in a phylogenomics context. RESULTS: Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest-evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise. CONCLUSION: SCaFoS is able to deal with datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated into an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast-evolving species.
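    The last SCaFoS step described above, keeping genes within an allowed level of missing data and writing a super-matrix, can be sketched as follows. This covers only the gene filter and concatenation; it does not reproduce SCaFoS's selection of orthologous or slowest-evolving sequences or its chimera building. The input layout, the `max_missing` threshold of 25% and the `?` padding for absent taxa are assumptions, and `build_supermatrix` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical input: gene name -> {taxon -> aligned sequence}; all sequences for
# one gene share the same alignment length.
GeneAlignments = Dict[str, Dict[str, str]]


def build_supermatrix(genes: GeneAlignments, taxa: List[str],
                      max_missing: float = 0.25) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate genes whose fraction of absent taxa is <= max_missing.

    Absent taxa are padded with '?' so every row has the same length.
    Returns the super-matrix (taxon -> sequence) and the names of retained genes.
    """
    kept: List[str] = []
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for name in sorted(genes):
        aln = genes[name]
        missing = sum(1 for t in taxa if t not in aln) / len(taxa)
        if not aln or missing > max_missing:
            continue  # too many taxa lack this gene
        length = len(next(iter(aln.values())))
        for t in taxa:
            rows[t].append(aln.get(t, "?" * length))
        kept.append(name)
    return {t: "".join(parts) for t, parts in rows.items()}, kept


if __name__ == "__main__":
    genes = {
        "g1": {"A": "MKV", "B": "MKI", "C": "MRV", "D": "MKV"},
        "g2": {"A": "LL", "B": "LI", "C": "LL"},  # 1 of 4 taxa missing: kept
        "g3": {"A": "W", "B": "W"},               # 2 of 4 taxa missing: dropped
    }
    matrix, kept = build_supermatrix(genes, ["A", "B", "C", "D"])
    print(kept, matrix["D"])  # ['g1', 'g2'] MKV??
```

    Lowering `max_missing` trades genes for a denser matrix; the abstract's point is that this trade-off, together with careful sequence choice, affects how prone the resulting tree is to reconstruction artefacts.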