118 research outputs found

    Using Synteny in Phylogenomics Algorithms to Cluster Protein Sequences

    With the rapid development of genome sequencing technologies, complete genomes are becoming more available and the need for computational methods for protein functional annotation is becoming more pressing. A long-standing problem in protein functional annotation is to distinguish orthologs from paralogs. Several academic efforts have recently emerged to automatically cluster proteins based on the premise that proteins in the same cluster are likely to have similar functions, or are orthologs. The effectiveness of these protein clustering algorithms is fundamental for building accurate functional annotation pipelines. This dissertation first presents a study of the effectiveness of the similarity graph-based Markov Cluster algorithm (MCL) in detecting protein families and subfamilies when it is used to cluster experimentally characterized enzymes from fungal genomes in the mycoCLAP database. Our study shows that the MCL algorithm successfully clusters proteins such that proteins in the same cluster are always from the same family. However, in most cases, the MCL algorithm does not separate subfamilies. We evaluate the clusters with several cluster quality metrics, and show that these metrics can be used to spot outliers. This dissertation then introduces SynAPhy, a novel graph-based approach for clustering proteins that leverages the global context of complete genomes to predict functional similarity. SynAPhy integrates genomic neighborhood information into sequence similarity for better prediction of functionally similar protein clusters. It computes the "syntenic reciprocal best hits" of proteins across genomes and uses this information to produce protein sequence similarity graphs with modified edge weights. The similarity graphs are used as input to the MCL algorithm to determine orthologous clusters across genomes. The results of applying SynAPhy to eight fungal genomes show that SynAPhy successfully generates clusters with more similar members than the MCL algorithm alone. However, there is no gold-standard genome-scale dataset for evaluating the capability of SynAPhy to generate orthologous clusters. We introduce SynAVal, an evaluation framework that can be applied to an orthology prediction technique. SynAVal first detects paralogs within each input genome, and then detects conserved connections between genomes that are highly likely to be orthologs, using the synteny knowledge of SynAPhy. It uses these data to identify and report confusions raised by paralogs. The results of applying SynAVal to eight fungal genomes show that SynAVal with synteny resolution can successfully resolve potential confusions raised by 9.1% of all the proteins of the eight fungal genomes, and 23.33% of the subset of those proteins that are likely to raise confusions.
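
    The idea behind "syntenic reciprocal best hits" can be illustrated with a minimal Python sketch. The abstract does not give SynAPhy's actual scoring scheme, so everything here is an assumption made for illustration: the input format (a dictionary of pairwise similarity scores between two genomes), the window-based definition of a genomic neighborhood, the `boost` factor, and all function names are hypothetical. The sketch finds reciprocal best hits, keeps those whose neighborhoods contain another reciprocal best hit pair, and raises the edge weight of the retained pairs before the graph would be passed to a clustering tool such as MCL.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]          # (protein in genome A, protein in genome B)
Scores = Dict[Pair, float]      # similarity score per pair (hypothetical input format)


def reciprocal_best_hits(scores: Scores) -> Set[Pair]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    return {(a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}


def neighbors(order: List[str], gene: str, window: int) -> Set[str]:
    """Genes within `window` positions of `gene` in the genome order."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def syntenic_rbh(rbh: Set[Pair], order_a: List[str], order_b: List[str],
                 window: int = 1) -> Set[Pair]:
    """Keep RBH pairs whose neighborhoods share at least one other RBH pair."""
    kept = set()
    for a, b in rbh:
        near_a = neighbors(order_a, a, window)
        near_b = neighbors(order_b, b, window)
        if any(x in near_a and y in near_b for x, y in rbh if (x, y) != (a, b)):
            kept.add((a, b))
    return kept


def reweight(scores: Scores, syntenic: Set[Pair], boost: float = 1.5) -> Scores:
    """Multiply the weight of syntenic RBH edges by an assumed boost factor."""
    return {p: s * boost if p in syntenic else s for p, s in scores.items()}


# Toy data: gene order along one chromosome per genome, plus pairwise scores.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5"]
scores = {("a1", "b1"): 90.0, ("a2", "b2"): 80.0,
          ("a1", "b2"): 40.0, ("a5", "b4"): 70.0}

rbh = reciprocal_best_hits(scores)
syn = syntenic_rbh(rbh, order_a, order_b, window=1)
weighted = reweight(scores, syn)
```

    In this toy input, the pairs (a1, b1) and (a2, b2) support each other because they are adjacent in both genomes, while (a5, b4) is a reciprocal best hit with no conserved neighbor and keeps its original weight. A multiplicative boost is only one possible way to modify edge weights; the dissertation may use a different scheme.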

    Genomics 4.0: syntenic gene and genome duplication drives diversification of plant secondary metabolism and innate immunity in flowering plants: advanced pattern analytics in duplicate genomes

    Johannes A. Hofberger (1, 2, 3)
    1 Biosystematics Group, Wageningen University & Research Center, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands (August 2012 – December 2013)
    2 Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands (December 2010 – July 2012)
    3 Chinese Academy of Sciences/Max Planck Partner Institute for Computational Biology, 320 Yueyang Road, Shanghai 200031, PR China (January 2014 – December 2014)
    TWO-SENTENCE SUMMARY: Large-scale comparative analysis of Big Data from next generation sequencing provides powerful means to exploit the potential of nature in the context of plant breeding and biotechnology. In this thesis, we combine various computational methods for genome-wide identification of gene families involved in (a) plant innate immunity and (b) biosynthesis of defense-related plant secondary metabolites across 21 species, assess dynamics that affected evolution of underlying traits during 250 million years of flowering plant radiation, and provide data on more than 4500 loci that can underpin crop improvement for future food and life quality.
    GENERAL ABSTRACT: As sessile organisms, plants are permanently exposed to a plethora of potentially harmful microbes and other pests. The surprising resilience to infections observed in successful lineages is due to a complex defense network fighting off invading pathogens. Within this network, a sophisticated plant innate immune system is accompanied by a multitude of specialized biosynthetic pathways that generate more than 200,000 secondary metabolites with ecological, agricultural, energy and medicinal importance. The rapid diversification of associated genes was accompanied by a series of duplication events in virtually all plant species, including local duplication of short sequences as well as multiplication of all chromosomes due to meiotic errors (plant polyploidy). In a comparative genomics approach, we combined several bioinformatics techniques for large-scale identification of multi-domain and multi-gene families that are involved in plant innate immunity or defense-related secondary metabolite pathways across 21 representative flowering plant genomes. We introduced a framework to trace duplicate gene copies back to distinct ancient duplication events, thereby unravelling a differential impact of gene and genome duplication on the molecular evolution of target genes. Comparing the genomic context among homologs within and between species from a phylogenomics perspective, we discovered orthologs conserved within genomic regions that remained structurally immobile during flowering plant radiation. In summary, we described a complex interplay of gene and genome duplication that increased the genetic versatility of disease resistance and secondary metabolite pathways, thereby expanding the playground for functional diversification and thus plant trait innovation and success. Our findings give fascinating insights into evolution across lineages and can underpin crop improvement for food, fiber and biofuels production.

    Dynamic evolution of the GnRH receptor gene family in vertebrates

    Genome Evolution in the Salicaceae: Genetic Novelty, Horizontal Gene Transfer, and Comparative Genomics

    Genome evolution is a powerful force that shapes genomes over time through processes like mutation, horizontal transfer, and sexual reproduction. Although the questions that aim to explore genome evolution are broad, they are all addressed through the discovery and comparison of genetic variation. For example, genetic diversity may explain differences in phenotypes and the etiology of disease, and it is essential for phylogenomic analysis. Recently, the democratization of next generation and third generation DNA sequencing technologies has allowed genomics to produce large amounts of sequence data. This has facilitated the capture of genetic variation at species and population scales. Populus and Salix are members of the Salicaceae family and are ecologically and economically important woody plants. Currently, there are multiple high-quality reference genomes available for these two genera. Two important sources of genome evolution that will be explored here are genetic novelty in the form of new genes and horizontal gene transfer from the organelle genomes. In the context of genome evolution, both processes have been shown to contribute to beneficial phenotypes as well as disease. The primary contributions of this dissertation research are to identify and assign putative functions to orphan and de novo genes in P. trichocarpa, identify and compare horizontal transfer from the organelle genomes to the nuclear genomes of P. trichocarpa and P. deltoides, and generate new organelle genome resources for 6 different Salix species.

    Fungal phylogenomics. A global analysis of fungal genomes and their evolution

    Fungi are the eukaryotic group with the largest number of completely sequenced species, and they are therefore particularly well suited for comparative genomics analyses. A species tree is often an important part of a phylogenomics analysis. Concern about its reliability led us to design several methods by which we could identify nodes in the species tree that were poorly supported by a whole phylome. We determined that the species tree was mostly well supported, but some nodes showed large discrepancies with most genes. These results could partly be attributed to evolutionary events that result in topological changes in gene trees. Our analyses have shown that HGT plays an important role in fungal evolution. Gene duplications followed by differential loss are also often the cause of incongruence. The OXPHOS pathway, despite being formed by multi-protein complexes, has been affected by this process at levels similar to the rest of the genome.
    Fungi are the group of eukaryotic species with the largest number of completely sequenced genomes. This makes them an ideal group for applying phylogenomic techniques. The species tree is a key element of many phylogenomic analyses, and as such we need to know whether it is reliable. We designed several measures that use the information in a phylome to identify the points in the species tree that are not well supported. The discrepancies we found may be due to evolutionary events (horizontal transfer, duplications, ...). We showed that horizontal transfer plays an important role in fungal evolution. We also studied the effects of duplications on the evolution of the oxidative phosphorylation metabolic pathway. We can conclude that the species tree is mostly robust, but that we need to be able to identify nodes subject to variation. Evolutionary events may be the cause of the discrepancies observed in the gene trees.

    The study of plant genome evolution by means of phylogenomics

    Methods and analysis of genome-scale gene family evolution across multiple species

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 123-136).
    The fields of genomics and evolution have continually benefited from one another in their common goal of understanding the biological world. This partnership has been accelerated by ever-increasing sequencing and high-throughput technologies. Although the future of genomic and evolutionary studies is bright, new models and methods will be needed to address the growing and changing challenges of large-scale datasets. In this work, I explore how evolution generates the diversity of life we see in modern species, specifically the evolution of new genes and functions. By reconstructing the history of the diverse sequences present in modern species, we can improve our understanding of their function and evolutionary importance. Performing such an analysis requires a principled and efficient means of computing the most probable evolutionary scenarios. To address these challenges, I introduce a new model of gene family evolution as well as a new method, SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss rates, speciation times, and correlated substitution rate variation across both species and loci. I have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes, as well as on simulated phylogenies, and find dramatic improvements in reconstruction accuracy compared to the most popular existing methods, including those that take the species tree into account. Lastly, I use the SPIMAP method to reconstruct the evolutionary history of all gene families in 16 fungal species, including several relatives of the pathogenic species C. albicans. From these reconstructions, we identify several families enriched with duplications and positive selection in pathogenic lineages. These reconstructions shed light on the evolution of these species and improve our understanding of the genes involved in pathogenicity.
    By Matthew D. Rasmussen. Ph.D.

    A tale of two clades: genome evolution of oomycetes and fungi.

    Some of the most ecologically significant pathogens of plants, animals and marine life come from two groups of filamentous eukaryotes: the oomycetes and the fungi. Although similar in morphology and ecological niche, the two groups are only very distantly related in terms of evolutionary history. The oomycetes are underresearched in evolutionary science, despite their historical and contemporary impact on food and environmental security. In contrast, fungi themselves are probably the most densely studied and sequenced group of organisms in evolutionary science outside of bacteria. This thesis is a collection of five published computational studies of the evolutionary biology of oomycetes and fungi. The first study is a systematic investigation of bacterial horizontal gene transfer into plant pathogenic oomycete species, which identifies 5 potential HGT events from prokaryotes into multiple oomycetes. The second study is a reconstruction of the evolutionary history of the oomycetes using whole-genome data from 37 species, which supports the larger groups within the oomycete class but suggests that some exemplar oomycete genera are paraphyletic. Taking advantage of the abundance of genomics data available for all major fungal phyla, the third study reconstructs the evolutionary history of 84 fungal species using seven different phylogenomic techniques and critically evaluates each technique for accuracy, speed and other criteria. The fourth study looks at the pangenomes of four model fungal species, and compares the evolution of genomic variation, virulence and environmental adaptation within each species. The final study presents a refined iteration of the methodology used in the previous pangenome study as a self-contained software package and demonstrates the software’s capabilities through pangenome analysis and re-analysis of both model and non-model fungal species. Together, these studies cover a breadth of molecular evolution, comparative genomics, phylogenomics and pangenomics research for two similar, but evolutionarily distinct, groups of important microscopic eukaryotes.