
    The rf-SDRs for (A) quinolinate phosphoribosyltransferase (hQPRTase; EC 2.4.2.19, CATH domain: 1qprF02), (B) α-galactosidase (α-Gal; EC 3.2.1.22, CATH domain: 1uasA01), (C) phosphoribosylformimino-5-aminoimidazole carboxamide ribonucleotide isomerase (HisA; EC 5.3.1.16, CATH domain: 1qo2A00) and (D) phosphoribosylanthranilate isomerase (TrpF; EC 5.3.1.24, CATH domain: 1nsjA00) in the aldolase class I superfamily (CATH 3.20.20.70).

    The rf-SDRs are represented by balls and sticks, where nitrogen atoms are colored blue, oxygen atoms are red, sulfur atoms are yellow and carbon atoms are white. The carbon atoms of the active sites selected as rf-SDRs are colored magenta. The eight β-strands of the conventional barrel are colored blue, cyan, green, lemon, yellow, yellow-orange, orange, and red, from the N-terminus to the C-terminus. Panels A and B show that the rf-SDRs for hQPRTase include the phosphate binding motif located in β-7 and β-8 of the conventional barrel structure, whereas those for α-Gal are mainly located after β-1 to β-5. Panel D shows Ser 34 and Arg 36, the residues that interact with the substrate moieties that differ between HisA and TrpF.

    Amino acid propensities for the rf-SDRs.

    The propensity of amino acid i was calculated as the fraction of amino acid i in the rf-SDRs divided by the fraction of amino acid i in all representative enzyme domains.
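
The propensity ratio described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name and the toy residue strings are invented here, and residues are assumed to be given as one-letter codes.

```python
from collections import Counter


def amino_acid_propensities(sdr_residues, background_residues):
    """Return {amino acid: propensity}, where propensity is the fraction of
    that amino acid among the rf-SDRs divided by its fraction in the
    background set (all representative enzyme domains)."""
    sdr_counts = Counter(sdr_residues)
    bg_counts = Counter(background_residues)
    sdr_total = sum(sdr_counts.values())
    bg_total = sum(bg_counts.values())
    if sdr_total == 0 or bg_total == 0:
        raise ValueError("both residue sets must be non-empty")

    propensities = {}
    # Iterate over the background so that every ratio has a non-zero denominator.
    for aa, bg_count in bg_counts.items():
        sdr_fraction = sdr_counts.get(aa, 0) / sdr_total
        bg_fraction = bg_count / bg_total
        propensities[aa] = sdr_fraction / bg_fraction
    return propensities


# Toy data, invented for illustration only (hypothetical residue sets).
toy_sdrs = "HHDS"
toy_background = "HDSSAAAA"
print(amino_acid_propensities(toy_sdrs, toy_background))
```

A value above 1 means the amino acid is over-represented among the rf-SDRs relative to the background, and a value below 1 means it is under-represented.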

    Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests

    Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. The functional specificity of enzymes often changes drastically through mutations of a small number of residues; therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random forests derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily, and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues whose importance for specificity had been validated experimentally. The analysis of the rf-SDRs revealed both a general tendency for functionally diverged superfamilies to include more active site residues in their rf-SDRs than less diverged superfamilies, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be an effective tool for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily.

    The rf-SDRs for (A) endo-1,4-xylanase (EC 3.2.1.8, CATH domain: 1r87A00) and (B) cellulase (EC 3.2.1.4, CATH domain: 1edgA00) in the glycosidase superfamily (CATH 3.20.20.80).

    The rf-SDRs are represented by balls and sticks, where nitrogen atoms are colored blue, oxygen atoms are red, sulfur atoms are yellow and carbon atoms are white. The carbon atoms of the active sites selected as rf-SDRs are colored magenta. The eight β-strands of the conventional barrel are colored blue, cyan, green, lemon, yellow, yellow-orange, orange, and red, from the N-terminus to the C-terminus. In both enzymes, neither of the two catalytic acid residues common to many enzymes in the superfamily (colored magenta) was selected.

    Outline of dataset construction.

    Enzyme sequences with complete EC numbers were obtained from the UniProtKB/Swiss-Prot database, and their CATH domain regions were selected from the Gene3D database. After CATH entries were added and redundancies were removed, enzymes with fewer than ten sequences were discarded. The representative structures for each enzyme were selected from the CATH S-level representatives. From the remaining sequences, a predictor was constructed for each enzyme that had sufficient numbers of positive and negative sequences (see Materials and Methods, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0084623#s2, for more details). A randomly selected 80% of the sequences was used for training, and the remaining 20% was used as a test dataset.
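
The filtering and splitting steps in this legend could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the function name, the sequence identifiers and the fixed seed are hypothetical, the split is assumed to be made separately for each enzyme (the legend does not say), and the redundancy removal and the check for sufficient positive and negative sequences are not modeled.

```python
import random

MIN_SEQUENCES = 10    # enzymes with fewer sequences are dropped, per the legend
TRAIN_FRACTION = 0.8  # 80% training and 20% test, per the legend


def filter_and_split(sequences_by_ec, seed=0):
    """Drop enzymes with too few sequences, then split each remaining
    enzyme's sequences at random into (training, test) lists."""
    rng = random.Random(seed)  # fixed seed so the toy split is reproducible
    splits = {}
    for ec_number, sequences in sequences_by_ec.items():
        if len(sequences) < MIN_SEQUENCES:
            continue
        shuffled = list(sequences)  # copy, so the caller's list is untouched
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * TRAIN_FRACTION)
        splits[ec_number] = (shuffled[:n_train], shuffled[n_train:])
    return splits


# Toy input: the sequence identifiers are invented for illustration only.
toy_dataset = {
    "3.2.1.8": [f"seq{i}" for i in range(10)],
    "3.2.1.4": [f"seq{i}" for i in range(4)],  # fewer than ten, so removed
}
for ec, (train, test) in filter_and_split(toy_dataset).items():
    print(ec, len(train), len(test))
```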

    Outline of the EFPrf system (A) and the predictor for each enzyme constructed by Random Forests (B).

    A query to the system is a domain sequence pre-assigned to a CATH homologous superfamily by Gene3D. For each CATH superfamily, a set of binary predictors, one for each known enzyme, processes the query and returns the results (A). In each predictor, the query is aligned to a representative sequence by the FUGUE software. Based on the alignment, similarity scores for the full-length sequence and for the functional sites are calculated as the input to the predictor (B).
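
To make the idea of alignment-derived input features concrete, the sketch below computes two scores from a pairwise alignment: one over the full length and one restricted to functional-site columns. It rests on several assumptions: the legend does not specify how EFPrf scores similarity, so simple sequence identity is used here as a stand-in; the alignment is supplied as two pre-aligned strings rather than produced by FUGUE; and the function name, the toy sequences and the site columns are all hypothetical.

```python
def identity_features(aligned_query, aligned_rep, site_columns):
    """Return (full-length identity, functional-site identity) for a
    pairwise alignment given as two equal-length strings with '-' gaps.

    Identity is a stand-in for whatever similarity score the real
    predictor uses; site_columns are 0-based alignment columns."""
    if len(aligned_query) != len(aligned_rep):
        raise ValueError("aligned sequences must have equal length")

    def identity(columns):
        # Only columns where the representative has a residue are compared.
        compared = [c for c in columns if aligned_rep[c] != "-"]
        if not compared:
            return 0.0
        matches = sum(1 for c in compared if aligned_query[c] == aligned_rep[c])
        return matches / len(compared)

    full_length = identity(range(len(aligned_rep)))
    functional_sites = identity(site_columns)
    return full_length, functional_sites


# Toy alignment, invented for illustration only.
query = "MKT-HDS"
representative = "MRTAHDS"
full, sites = identity_features(query, representative, site_columns=[4, 5])
print(full, sites)
```

In a system like the one outlined, one such feature vector would be computed per enzyme-specific predictor, since each predictor aligns the query to its own representative sequence.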

    The distribution of active site residues at the end of eight β-strands of enzymes in the superfamilies adopting the TIM barrel fold.

    White bars represent the glycosidase superfamily (CATH 3.20.20.80), light gray bars represent the phosphoenolpyruvate-binding domain superfamily (CATH 3.20.20.60), and gray bars represent the aldolase class I superfamily (CATH 3.20.20.70). The percentages were calculated using the 18, 3 and 29 enzymes for which active site information was available in the glycosidase, phosphoenolpyruvate-binding domain and aldolase class I superfamilies, respectively.

    Distributions of fractions of the rf-SDRs in active site residues (ASRs, A) and ligand binding residues (LBRs, B), observed in the superfamilies with low, medium and high degrees of functional diversity classified at the third-digit level of EC numbers.

    The top and bottom of each box indicate the 75th and 25th percentiles, and the horizontal line in the box represents the median value. The top and bottom whiskers represent the 90th and 10th percentiles.

    The rf-SDRs for acetylcholine esterase (AChE; EC 3.1.1.7, CATH domain: 1w76B00) in the α/β-hydrolase superfamily (CATH 3.40.50.1820).

    The rf-SDRs are represented by balls and sticks, where carbon atoms are colored white, nitrogen atoms are blue, oxygen atoms are red and sulfur atoms are yellow. The active site gorge is partially represented by a green surface. At the bottom of the active site gorge, the catalytic triad residues, which were not selected as rf-SDRs, are represented by balls and sticks and colored magenta. Many rf-SDRs are positioned around the catalytic gorge region.