
    Experiment-based computational method for proper annotation of the molecular function of enzymes

    The rate of protein functional elucidation lags far behind the rate of gene and protein sequence discovery, leading to an accumulation of proteins with no known function. Millions of protein database entries are not assigned reliable functions, preventing a full understanding of chemical diversity in living organisms. Pfam contains over 16,712 families, among which more than 3,919 are families of unknown function (DUF families). An additional difficulty, often underestimated, is that only a tiny fraction of enzymes have experimentally established functions; in most cases, function is extrapolated from a small number of characterized proteins to all members of a family, leading to over-annotation [1,2]. Here, two examples of an integrated strategy for the discovery of various enzymatic activities catalyzed within protein families will be presented. This approach relies on high-throughput enzymatic screening of representatives, structural and modeling investigations, and analysis of genomic and metabolic context. The structural analysis is in both cases based on the Active Site Clustering Method [3] developed at Genoscope. We investigated a protein family with no known function, the DUF849 Pfam family, and unearthed 14 potential new enzymatic activities, leading to the designation of these proteins as β-keto acid cleavage enzymes [4]. In addition, we propose an in vivo role for four enzymatic activities and suggest key residues for guiding further functional annotation. The second study will illustrate that proteins with high sequence similarity might not have the same function. We determined the enzymatic activities of 100 O-acyl-L-homoserine transferases representative of the biodiversity of the two unrelated families, MetX and MetA, which are involved in the first step of methionine biosynthesis and assumed to always use acetyl-CoA and succinyl-CoA, respectively. We interpreted the results by structural classification of active sites based on protein structure modeling. We identified the specificity-determining positions responsible for acyl-CoA specificity in the active sites of MetX and MetA enzymes, which are in fact iso-functional for both activities. We then predict that >60% of the 10,000 sequences from these families currently in databases are incorrectly annotated. Finally, we uncovered a divergent subgroup of MetX enzymes in fungi that participate only in L-cysteine biosynthesis as O-succinyl-L-serine transferases [5]. Our results show that the functional diversity within a family may be largely underestimated. The extension of this strategy to other families will improve our knowledge of the enzymatic landscape and the chemical capabilities of biodiversity.
    References:
    [1] de Crecy-Lagard, V. Quality Annotations, a Key Frontier in the Microbial Sciences. Microbe 11, 303-310 (2016).
    [2] Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5, e1000605 (2009).
    [3] de Melo-Minardi, R. C., Bastard, K. & Artiguenave, F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 26, 3075-3082, doi:10.1093/bioinformatics/btq595 (2010).
    [4] Bastard, K. et al. Revealing the hidden functional diversity of an enzyme family. Nature Chemical Biology 10, 42-49, doi:10.1038/nchembio.1387 (2014).
    [5] Bastard, K. et al. Parallel evolution of non-homologous isofunctional enzymes in methionine biosynthesis. Nature Chemical Biology 13, 858-866 (2017).

    Domain-mediated interactions for protein subfamily identification

    Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.

    A Novel Acyl-CoA Beta-Transaminase Characterized from a Metagenome

    BACKGROUND: Bacteria are key components in all ecosystems. However, our knowledge of bacterial metabolism is based solely on the study of cultivated organisms, which represent just a tiny fraction of microbial diversity. To access new enzymatic reactions and new or alternative pathways, we investigated bacterial metabolism through analyses of uncultivated bacterial consortia. METHODOLOGY/PRINCIPAL FINDINGS: We applied the gene context approach to assembled sequences of the metagenome of the anaerobic digester of a municipal wastewater treatment plant, and identified a new gene which may participate in an alternative pathway of lysine fermentation. CONCLUSIONS: We characterized a novel, unique aminotransferase that acts exclusively on Coenzyme A (CoA) esters, and proposed a variant route for lysine fermentation. Results suggest that most of the lysine-fermenting organisms in the digester use this new pathway. Its presence in organisms representative of two distinct bacterial divisions indicates that it may also be present in other organisms.

    Automatically exploiting genomic and metabolic contexts to aid the functional annotation of prokaryote genomes

    This thesis concerns the development of bioinformatic strategies that exploit genomic and metabolic contextual information in order to generate functional annotations for prokaryote genes. The work comprised two main projects. The first focuses on sequence-orphan enzymatic activities. Today, roughly 27% of the activities defined by the International Union of Biochemistry and Molecular Biology are sequence-orphans. For these, traditional bioinformatic approaches cannot propose candidate genes, so it is imperative to use alternative, context-based approaches in such cases. The CanOE strategy (fishing Candidate genes for Orphan Enzymes) was developed and added to the MicroScope bioinformatics platform to this end. It integrates genomic and metabolic information across thousands of prokaryote genomes in order to locate promising gene candidates for orphan activities. The mirror project focuses on protein families of unknown function. A collaborative project has been set up at the Genoscope with the aim of formalising functional exploration strategies for prokaryote protein families. A pilot version was created on the DUF849 Pfam family, for which a single activity had recently been elucidated. Strategies for proposing alternative enzymatic activities and for establishing isofunctional sub-families were developed, so as to guide bench experiments and to analyse their results.

    Protocols to capture the functional plasticity of protein domain superfamilies

    Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. In the few existing resources that aim to classify domain families, a considerable amount of manual curation is involved. Whilst immensely valuable, the latter is inherently slow and expensive. It can thus impede large-scale application. This work describes the development and application of a fully-automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. These are used depending on whether or not high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed in both a qualitative and quantitative manner. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification. 
Importantly, the focus lies on domain, not whole-protein function.

    Structural modeling and classification of active sites for guiding enzyme functional annotation

    The rate of enzyme functional characterization by experiments lags far behind the rate of gene sequence discovery, leading to an accumulation of proteins with no known function. Moreover, in public databases, function is extrapolated from a small number of proteins to all homologous members of a family, resulting in 60% of superfamilies being mis-annotated [1]. Our institute has developed an integrated strategy based on in silico prediction of enzymatic activities and in vitro screening of enzymes for the discovery of various activities involved in microbial metabolism. As part of this strategy, we developed a structural bioinformatics method called ASMC (Active Sites Modeling and Clustering), which classifies the proteins of a family into iso-functional sub-families and identifies the functional amino acids responsible for specific enzymatic activities [2]. Experiments based on ASMC led to the unearthing of 14 potential new enzymatic activities for a family of unknown function, DUF849, and to the description of 3D patterns for further annotation of sequences [3]. ASMC was also used to classify two phylogenetically unrelated protein families, MetX and MetA, for which we detected numerous mis-annotations in public databases. We re-examined nearly 10,000 MetA and MetX proteins using homology modeling and corrected the function for about 60% of them [4]. Our results show that the functional diversity within a protein family may be largely underestimated.
    References:
    [1] Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol. 2009. 5:e1000605.
    [2] de Melo-Minardi RC, Bastard K, Artiguenave F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics. 2010. 26(24):3075-82.
    [3] Bastard K, Smith AA, Vergne-Vaxelaire C, Perret A, Zaparucha A, De Melo-Minardi R, Mariage A, Boutard M, Debard A, Lechaplais C, Pelle C, Pellouin V, Perchat N, Petit JL, Kreimeyer A, Médigue C, Weissenbach J, Artiguenave F, De Berardinis V, Vallenet D, Salanoubat M. Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol. 2014. 10(1):42-9.
    [4] Bastard K, Perret A, Mariage A, Bessonnet T, Pinet-Turpault A, Petit JL, Darii E, Bazire P, Vergne-Vaxelaire C, Brewee C, Debard A, Pellouin V, Besnard-Gonnet M, Artiguenave F, Médigue C, Vallenet D, Danchin A, Zaparucha A, Weissenbach J, Salanoubat M, de Berardinis V. Parallel evolution of non-homologous isofunctional enzymes in methionine biosynthesis. Nat Chem Biol. 2017. 13(8):858-866.
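
    The abstract above mentions identifying the amino acids responsible for a specific enzymatic activity within iso-functional sub-families. The toy sketch below illustrates one simple way such positions could be found. It is not the ASMC method itself: it assumes the active-site residues are already aligned into equal-length strings and that each sequence already carries an activity label. The function name `specificity_positions` and the example data are hypothetical.

```python
from collections import defaultdict


def specificity_positions(sites, labels):
    """Return column indices that are conserved within every labelled group
    but carry a different residue in each group.

    sites  -- equal-length strings of aligned active-site residues (assumed input)
    labels -- one activity label per string
    """
    groups = defaultdict(list)
    for site, label in zip(sites, labels):
        groups[label].append(site)

    positions = []
    for i in range(len(sites[0])):
        # Set of residues observed at column i, one set per group.
        per_group = [{s[i] for s in members} for members in groups.values()]
        conserved = all(len(residues) == 1 for residues in per_group)
        # Groups are distinct when no two of them share a residue at column i.
        distinct = len(set.union(*per_group)) == len(per_group)
        if conserved and distinct:
            positions.append(i)
    return positions


# Made-up residues and labels for illustration only (not data from the cited studies).
sites = ["HDRGS", "HDRGT", "HDKAS", "HDKAT"]
labels = ["acetyl", "acetyl", "succinyl", "succinyl"]
print(specificity_positions(sites, labels))  # [2, 3]
```

    In this made-up example, columns 2 and 3 separate the two groups, columns 0 and 1 are identical everywhere, and column 4 varies inside each group. A real analysis would start from modeled structures and would have to tolerate noise within groups.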
