Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member) to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships.
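The coverage figure quoted in this abstract (the fraction of sequences that can be modeled at 30% sequence identity or better) can be illustrated with a small calculation. The sketch below is not the authors' pipeline: the function name `modelable_fraction` and the toy identity values are hypothetical, and it assumes each sequence has already been reduced to its best percent identity against any solved structure.

```python
def modelable_fraction(best_identities, threshold=30.0):
    """Fraction of sequences whose best percent identity to any
    structural template is at least `threshold`.

    `best_identities` is a list with one value per sequence (hypothetical
    input format; the abstract does not describe how identities were stored).
    """
    if not best_identities:
        raise ValueError("no sequences given")
    covered = sum(1 for pid in best_identities if pid >= threshold)
    return covered / len(best_identities)


# Toy data, made up for illustration: one sequence gains a usable
# template (12% -> 35% identity) after a new structure is solved.
before = [55.0, 12.0, 31.0, 28.0]
after = [55.0, 35.0, 31.0, 28.0]

print(modelable_fraction(before))
print(modelable_fraction(after))
```

Comparing the two calls mirrors how adding newly determined structures raises the modelable fraction without any change to the sequence set itself.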
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.
Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.
Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Structural Diversity within the Mononuclear and Binuclear Active Sites of N-Acetyl-D-glucosamine-6-phosphate Deacetylase
ABSTRACT: NagA catalyzes the hydrolysis of N-acetyl-D-glucosamine-6-phosphate to D-glucosamine-6-phosphate and acetate. X-ray crystal structures of NagA from Escherichia coli were determined to establish the number and ligation scheme for the binding of zinc to the active site and to elucidate the molecular interactions between the protein and substrate. The three-dimensional structures of the apo-NagA, Zn-NagA, and the D273N mutant enzyme in the presence of a tight-binding N-methylhydroxyphosphinyl-D-glucosamine-6-phosphate inhibitor were determined. The structure of the Zn-NagA confirms that this enzyme binds a single divalent cation at the β-position in the active site via ligation to . A water molecule completes the ligation shell, which is also in position to be hydrogen bonded to Asp-273. In the structure of NagA bound to the tight-binding inhibitor that mimics the tetrahedral intermediate, the methyl phosphonate moiety has displaced the hydrolytic water molecule and is directly coordinated to the zinc within the active site. The side chain of Asp-273 is positioned to activate the hydrolytic water molecule via general base catalysis and to deliver this proton to the amino group upon cleavage of the amide bond of the substrate. His-143 is positioned to help polarize the carbonyl group of the substrate in conjunction with Lewis acid catalysis by the bound zinc. The inhibitor is bound in the R-configuration at the anomeric carbon through a hydrogen bonding interaction of the hydroxyl group at C-1 with the side chain of His-251. The phosphate group of the inhibitor attached to the hydroxyl at C-6 is ion paired with Arg-227 from the adjacent subunit. NagA from Thermotoga maritima was shown to require a single divalent cation for full catalytic activity.
NagA (EC 3.5.1.25) is a metal-dependent enzyme that catalyzes the deacetylation of N-acetyl-D-glucosamine-6-phosphate to form acetate and D-glucosamine-6-phosphate, as presented in Scheme 1. NagA thus catalyzes a key step in the catabolism of N-acetyl-D-glucosamine from chitobiose and the recycling of cell wall murein (1-4). Over 300 sequences homologous to that of NagA from Escherichia coli K-12 have been identified in the current NCBI databases. Nearly all of these sequences are annotated as NagA, but some of them are annotated as N-acetyl-D-galactosamine-6-phosphate deacetylase (AgaA). In the genome of E. coli K-12 there is a deletion that eliminates the genes encoding the N-acetyl-D-galactosamine (Aga) and D-galactosamine (GalN) phosphotransferase systems, while truncating the gene for AgaA. These deletions prevent the growth of E. coli K-12 on either N-acetyl-D-galactosamine or D-galactosamine (5). NagA has been characterized as a member of the amidohydrolase superfamily (AHS) based on sequence and structural similarities to other enzymes within this superfamily (6). All proteins of the amidohydrolase superfamily possess a (β/α)8-barrel structural fold (7). The enzymes in this superfamily have been shown to contain αβ-binuclear metal centers (8-13), α-mononuclear metal centers (14-16), β-mononuclear metal centers (6, 13), or metal-independent active sites (17). X-ray crystal structures have been reported for NagA from Bacillus subtilis (PDB code: 1un7 (11)), Thermotoga mar-
Prediction of function for the polyprenyl transferase subgroup in the isoprenoid synthase superfamily
The number of available protein sequences has increased exponentially with the advent of high-throughput genomic sequencing, creating a significant challenge for functional annotation. Here, we describe a large-scale study on assigning function to unknown members of the trans-polyprenyl transferase (E-PTS) subgroup in the isoprenoid synthase superfamily, which provides substrates for the biosynthesis of the more than 55,000 isoprenoid metabolites. Although the mechanism for determining the product chain length for these enzymes is known, there is no simple relationship between function and primary sequence, so that assigning function is challenging. We addressed this challenge through large-scale bioinformatics analysis of >5,000 putative polyprenyl transferases; experimental characterization of the chain-length specificity of 79 diverse members of this group; determination of 27 structures of 19 of these enzymes, including seven cocrystallized with substrate analogs or products; and the development and successful application of a computational approach to predict function that leverages available structural data through homology modeling and docking of possible products into the active site. The crystallographic structures and computational structural models of the enzyme-ligand complexes elucidate the structural basis of specificity. As a result of this study, the percentage of E-PTS sequences similar to functionally annotated ones (BLAST e-value ≤ 1e-70) increased from 40.6 to 68.8%, and the percentage of sequences similar to available crystal structures increased from 28.9 to 47.4%. The high accuracy of our blind prediction of newly characterized enzymes indicates the potential to predict function computationally for the complete polyprenyl transferase subgroup of the isoprenoid synthase superfamily.
A hypothetical example output of the carbocation docking.
Illustration of the key dihedral angle C16-C17-C18-H18 that determines the conversion of I1 to I2: a) A-I1; b) B-I1.
Docking score (MM/GBSA) of 9 carbocationic intermediates for 22 triterpenoid synthase homology models that follow channel C.
Compounds that could not be successfully docked at all are arbitrarily assigned a docking score of −10 kcal/mol. The figure legend shows the UniProtKB IDs for the triterpenoid synthases. Panel a shows the docking scores against 8 lanosterol synthases (in red); panel b shows the docking scores against 10 cycloartenol synthases (in lime green); and panel c shows the docking scores against a cucurbitadienol synthase (in cyan), a parkeol synthase (in magenta) and 2 protostadienol synthases (in blue). For details, see Table S2 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003874#pcbi.1003874.s006).
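The placeholder convention described in this caption (failed dockings receive an arbitrary score of −10 kcal/mol) could be applied to a table of scores roughly as sketched below. This is only an illustration under stated assumptions: the data layout, the function names `fill_failed` and `best_intermediate`, the enzyme label and the numeric scores are all hypothetical, and the sketch assumes the usual convention that a more negative MM/GBSA score is more favorable.

```python
FAILED_DOCKING_SCORE = -10.0  # kcal/mol, the arbitrary placeholder named in the caption


def fill_failed(scores, placeholder=FAILED_DOCKING_SCORE):
    """Replace missing docking scores (None) with the placeholder.

    `scores` maps an enzyme ID to a list of scores, one per carbocationic
    intermediate (hypothetical layout, not taken from the paper).
    """
    return {
        enzyme: [placeholder if s is None else s for s in row]
        for enzyme, row in scores.items()
    }


def best_intermediate(row):
    """Index of the intermediate with the lowest score.

    Assumes lower (more negative) scores are more favorable.
    """
    return min(range(len(row)), key=row.__getitem__)


# Toy scores, made up for illustration; None marks a failed docking.
raw = {"enzyme_A": [-42.5, None, -38.0]}
filled = fill_failed(raw)

print(filled["enzyme_A"])
print(best_intermediate(filled["enzyme_A"]))
```

Keeping the placeholder in a named constant makes it easy to tell substituted values apart from real scores when the table is later plotted or ranked.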