8,066 research outputs found

    Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes

    "Phylogenetic profiling" is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step, and it largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses the genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence-absence profile comparisons, and it does not require any fixed a priori criteria to define orthologous genes.

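    The profile comparison described in the abstract above can be illustrated with a minimal sketch. It is not the Gene3D phylo-tuner implementation: the function names, the data layout (a dictionary from sequence-identity level to a list of per-genome copy-number profiles) and the choice of scaling each profile to unit length before taking the Euclidean distance are all assumptions made for illustration, since the abstract does not say how the distances are normalised.

```python
import math


def normalise(profile):
    """Scale a copy-number profile to unit Euclidean length (assumed normalisation)."""
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0:
        return [0.0] * len(profile)
    return [x / norm for x in profile]


def profile_distance(a, b):
    """Euclidean distance between two unit-scaled copy-number profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


def best_level_pair(family_a, family_b):
    """Compare every subcluster profile of two families at every identity level.

    family_x maps an identity level (e.g. 30, 60) to a list of profiles,
    one per subcluster (hypothetical layout). Returns the smallest distance
    together with the (level, subcluster index) of each side.
    """
    best = None
    for level_a, clusters_a in family_a.items():
        for i, prof_a in enumerate(clusters_a):
            for level_b, clusters_b in family_b.items():
                for j, prof_b in enumerate(clusters_b):
                    d = profile_distance(prof_a, prof_b)
                    if best is None or d < best[0]:
                        best = (d, (level_a, i), (level_b, j))
    return best


# Toy data: copy numbers of each subcluster in four genomes (invented values).
family_a = {30: [[4, 2, 0, 6]], 60: [[1, 1, 0, 2], [3, 1, 0, 4]]}
family_b = {30: [[1, 5, 2, 0]], 60: [[2, 2, 0, 4]]}
result = best_level_pair(family_a, family_b)
```

    In this toy example the closest pair is found at the 60% level on both sides, where the copy numbers rise and fall together across genomes. The families are not forced to be compared at a single, fixed level, which is the idea behind the "auto-tuning" described in the abstract.
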
    Segmentally Variable Genes: A New Perspective on Adaptation

    Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes suggests that genes are evolving at many different rates. We have attempted to derive a new classification of genes into three broad categories: lineage-specific genes that evolve rapidly and appear unique to individual species or strains; highly conserved genes that frequently perform housekeeping functions; and partially variable genes that contain highly variable regions, at least 70 amino acids long, interspersed among well-conserved regions. The latter we term segmentally variable genes (SVGs), and we suggest that they are especially interesting targets for biochemical studies. Among these genes are ones necessary to deal with the environment, including genes involved in host–pathogen interactions, defense mechanisms, and intracellular responses to internal and environmental changes. For the most part, the detailed function of these variable regions remains unknown. We propose that they are likely to perform important binding functions responsible for protein–protein, protein–nucleic acid, or protein–small molecule interactions. Discerning their function and identifying their binding partners may offer biologists new insights into the basic mechanisms of adaptation, context-dependent evolution, and the interaction between microbes and their environment. Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and adaptation. National Science Foundation (998088, 0239435)
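
    The definition of a segmentally variable gene given above (variable regions at least 70 amino acids long, interspersed among well-conserved regions) can be sketched as a simple scan over per-position conservation scores. This is an illustration only, not the authors' procedure: the score scale, the 0.5 cut-off and the function names are hypothetical, and only the 70-residue minimum comes from the abstract.

```python
def variable_segments(conservation, threshold=0.5, min_length=70):
    """Return (start, end) half-open spans of consecutive low-conservation positions.

    conservation: per-residue scores in [0, 1], higher meaning more conserved
    (hypothetical input, e.g. derived from a multiple sequence alignment).
    threshold: scores below this count as variable (assumed cut-off).
    min_length: minimum span length; 70 follows the abstract.
    """
    segments = []
    start = None
    for i, score in enumerate(conservation):
        if score < threshold:
            if start is None:
                start = i  # a variable run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the protein.
    if start is not None and len(conservation) - start >= min_length:
        segments.append((start, len(conservation)))
    return segments


def is_segmentally_variable(conservation, threshold=0.5, min_length=70):
    """True if long variable segments coexist with conserved positions."""
    has_conserved = any(score >= threshold for score in conservation)
    return has_conserved and bool(variable_segments(conservation, threshold, min_length))


# Toy profile: conserved flanks around an 80-residue variable stretch.
toy_scores = [0.9] * 50 + [0.1] * 80 + [0.9] * 50
toy_segments = variable_segments(toy_scores)
```

    A protein that is variable along its whole length is rejected by `is_segmentally_variable`, which reflects the distinction the abstract draws between lineage-specific genes and genes with a mosaic of conserved and variable regions.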

    A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression

    Background: Psoroptic mange, caused by infestation with the ectoparasitic mite Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern for the livestock industry worldwide. Control relies on injectable endectocides and organophosphate dips, but concerns over residues, environmental contamination, and the development of resistance threaten the sustainability of this approach, highlighting interest in alternative control methods. However, the development of vaccines and the identification of chemotherapeutic targets are hampered by the lack of P. ovis transcriptomic and genomic resources. Results: Building on the recent publication of the P. ovis draft genome, here we present a genomic analysis and transcriptomic atlas of gene expression in P. ovis, revealing feeding- and stage-specific patterns of gene expression, including novel multigene families and allergens. Network-based clustering revealed 14 gene clusters demonstrating either single- or multi-stage specific gene expression patterns, with 3075 female-specific and 890 male-specific transcripts, and 112, 217 and 526 transcripts showing larval-, protonymph- and tritonymph-specific expression, respectively. Detailed analysis of P. ovis allergens revealed stage-specific patterns of allergen gene expression, many of which were also enriched in "fed" mites and tritonymphs, highlighting an important feeding-related allergenicity in this developmental stage. Pair-wise analysis of differential expression between life-cycle stages identified patterns of sex-biased gene expression and also identified novel P. ovis multigene families, including known allergens and novel genes with high levels of stage-specific expression. Conclusions: The genomic and transcriptomic atlas described here represents a unique resource for the acarid-research community, whilst the OrcAE platform makes this freely available, facilitating further community-led curation of the draft P. ovis genome.

    The network of stabilizing contacts in proteins studied by coevolutionary data

    The primary structure of proteins, that is, their sequence, represents one of the most abundant sets of experimental data concerning biomolecules. The study of correlations in families of co-evolving proteins by means of an inverse Ising-model approach makes it possible to obtain information on their native conformation. Following up on a recent development along this line, we optimize the algorithm to calculate effective energies between the residues, validating the approach both by back-calculating interaction energies in a model system and by predicting the free energies associated with mutations in real systems. Making use of these effective energies, we study the networks of interactions that stabilize the native conformation of some well-studied proteins, showing that they display different properties from the associated contact networks.

    Inferring Hierarchical Orthologous Groups

    The reconstruction of ancestral evolutionary histories is the cornerstone of most phylogenetic analyses. Many applications are possible once the evolutionary history is unveiled, such as identifying taxonomically restricted genes (genome barcoding), predicting the function of unknown genes from the gene ontologies of their evolutionarily related genes, identifying gene losses and gene gains among gene families, or pinpointing the time in evolution at which particular gene families emerged (sometimes referred to as “phylostratigraphy”). Typically, the reconstruction of evolutionary histories is limited to the inference of evolutionary relationships (homology, orthology, paralogy) and basic clustering of the resulting orthologs. In this thesis, we adopted the concept of Hierarchical Orthologous Groups (HOGs), introduced a decade ago, and proposed several improvements both to their inference and to their use in biological analyses such as the aforementioned applications. In addition, HOGs are a powerful framework for investigating ancestral genomes, since they convey information about gene family evolution (gene losses, gene duplications or gene gains). In this thesis, an ancestral genome at a given taxonomic level denotes the genome of the last common ancestor of the related taxon, together with its hypothetical ancestral gene composition and gene order (synteny). The ancestral gene composition and ancestral synteny of a given ancestral genome provide valuable information for studying genome evolution in terms of genomic rearrangements (duplication, translocation, deletion, inversion) or of gene family evolution (variation of gene function, accelerated gene evolution, duplication-rich clades). This thesis identifies three major open challenges that make up my three research arcs. First, inferring HOGs is complex and computationally demanding, meaning that robust and scalable algorithms are mandatory to generate good-quality HOGs in a reasonable time.
Second, benchmarking orthology clustering without knowing the true evolutionary history is a difficult task, which requires appropriate benchmarking strategies. And third, the lack of tools to handle HOGs limits their applications. In the first arc of the thesis, I proposed two new algorithmic refinements to orthology inference that make the inferred orthologs less sensitive to gene fragmentation and to imbalances in the rate of evolution among paralogous copies. In addition, I introduced version 2.0 of the GETHOGs algorithm, which infers HOGs in a bottom-up fashion and has been shown to be both faster and more accurate. In the second arc, I proposed new strategies to benchmark the reconstruction of gene families using detailed case studies based on evidence from multiple sequence alignments along with reconstructed gene trees, and to benchmark orthology using a simulation framework that provides full control of the evolutionary genomic setup. This work highlights the main challenges in current methods. Third, I created pyHam (python HOG analysis method), iHam (interactive HOG analysis method) and GTM (Graph - Tree - Multiple sequence alignment), a collection of tools to process, manipulate and visualise HOGs. pyHam offers an easy way to handle and work with HOGs using simple Python code. Embedded at its heart are two visualisation tools to synthesise HOG-derived information: iHam, which allows interactive browsing of HOG structure, and a tree-based visualisation called tree profile, which pinpoints evolutionary events induced by the HOGs on a species tree. In addition, I developed GTM, an interactive web-based visualisation tool that combines, for a given gene family (or set of genes), the related sequences, gene tree and orthology graph. In this thesis, I show that HOGs are a useful framework for phylogenetics, with considerable work done to produce robust and scalable inferences.
Another important aspect is that our inferences are benchmarked using manual case studies and automated verification, using either simulation or the reference Quest for Orthologs benchmarks. Lastly, one of the major advances was the conception and implementation of tools to manipulate and visualise HOGs. Such tools have already proven useful when investigating HOGs for developmental reasons or for downstream analysis. Ultimately, the HOG framework is amenable to the integration of all aspects that can reasonably be expected to have evolved along the history of genes and ancestral genome reconstruction.