40 research outputs found

    Integration of Ligand and Structure Based Approaches for CSAR-2014

    The prediction of binding poses and affinities is an area of active interest in computer-aided drug design (CADD). Given the documented limitations of purely ligand-based or structure-based approaches, we employed an integrated approach and developed a rapid protocol for binding mode and affinity predictions. This workflow was applied to the three protein targets of the Community Structure–Activity Resource-2014 (CSAR-2014) exercise: Factor Xa (FXa), Spleen Tyrosine Kinase (SYK), and tRNA (guanine-N(1))-methyltransferase (TrmD). Our docking and scoring workflow incorporates compound clustering and ligand- and protein-structure-based pharmacophore modeling, followed by local docking, minimization, and scoring. While the former part of the protocol ensures high-quality ligand alignments and mapping, the subsequent minimization and scoring provide the predicted binding modes and affinities. We made blind predictions of docking poses for 1, 5, and 14 ligands docked into 1, 2, and 12 crystal structures of FXa, SYK, and TrmD, respectively. The resulting 174 poses were compared with cocrystallized structures (1, 5, and 14 complexes) made available at the end of CSAR. Our predicted poses deviated from the experimentally determined structures by a mean root-mean-square deviation of 3.4 Å. Further, we were able to classify high and low affinity ligands with area under the curve values of 0.47, 0.60, and 0.69 for FXa, SYK, and TrmD, respectively, indicating the validity of our approach in at least two of the three systems. Detailed critical analysis of the results and of the CSAR methodology ranking procedures suggested that a straightforward application of our workflow has limitations, as some of the performance measures do not reflect the actual utility of pose and affinity predictions in the biological context of individual systems.
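
As an illustration of the pose comparison described above, the following is a minimal sketch of a root-mean-square deviation between a predicted and a reference ligand pose. It assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition or symmetry correction is applied. The function name `pose_rmsd` and the toy coordinates are hypothetical and do not come from the CSAR-2014 workflow.

```python
import math

def pose_rmsd(predicted, reference):
    """RMSD (same units as the input) between two poses given as (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared_sum / len(predicted))

# Toy example: every atom is displaced by 3 Å along z.
toy_predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
toy_rmsd = pose_rmsd(toy_predicted, toy_reference)
```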

    The rf-SDRs for (A) endo-1,4-xylanase (EC 3.2.1.8, CATH domain: 1r87A00) and (B) cellulase (EC 3.2.1.4, CATH domain: 1edgA00) in the glycosidase superfamily (CATH 3.20.20.80).

    The rf-SDRs are represented by balls and sticks, where nitrogen atoms are colored blue, oxygen atoms are red, sulfur atoms are yellow and carbon atoms are white. The carbon atoms of the active sites selected as rf-SDRs are colored magenta. Eight β-strands in a conventional barrel are colored blue, cyan, green, lemon, yellow, yelloworange, orange, and red, from the N-terminal to the C-terminal. In both enzymes, neither of the two catalytic acid residues (colored magenta) that are common to many enzymes in the superfamily was selected.

    The rf-SDRs for (A) quinolinate phosphoribosyltransferase (hQPRTase; EC 2.4.2.19, CATH domain: 1qprF02), (B) α-galactosidase (α-Gal; EC 3.2.1.22, CATH domain: 1uasA01), (C) phosphoribosylformimino-5-aminoimidazole carboxamide ribonucleotide isomerase (HisA) (EC 5.3.1.16, CATH domain: 1qo2A00) and (D) phosphoribosylanthranilate isomerase (TrpF) (EC 5.3.1.24, CATH domain: 1nsjA00) in aldolase class I superfamily (CATH 3.20.20.70).

    The rf-SDRs are represented by balls and sticks, where nitrogen atoms are colored blue, oxygen atoms are red, sulfur atoms are yellow and carbon atoms are white. The carbon atoms of the active sites selected as rf-SDRs are colored magenta. Eight β-strands in a conventional barrel are colored blue, cyan, green, lemon, yellow, yelloworange, orange, and red, from the N-terminal to the C-terminal. Panels A and B clearly show that the rf-SDRs for hQPRTase include the phosphate-binding motif located in β-7 and β-8 of the conventional barrel structure, whereas those for α-Gal are mainly located after β-1 to β-5. Panel D shows the residues, Ser 34 and Arg 36, that interact with different substrate moieties in HisA and TrpF.

    Amino acid propensities for the rf-SDRs.

    The propensity of amino acid i was calculated as the fraction of amino acid i in the rf-SDRs divided by the fraction of amino acid i in all representative enzyme domains.
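
The propensity definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `amino_acid_propensities` is hypothetical, residues are passed as plain strings of one-letter codes, and the toy sequences are invented rather than taken from the study.

```python
from collections import Counter

def amino_acid_propensities(sdr_residues, all_residues):
    """Propensity per amino acid: (fraction in rf-SDRs) / (fraction in all domains)."""
    sdr_counts = Counter(sdr_residues)
    all_counts = Counter(all_residues)
    sdr_total = sum(sdr_counts.values())
    all_total = sum(all_counts.values())
    if sdr_total == 0 or all_total == 0:
        raise ValueError("both residue collections must be non-empty")
    propensities = {}
    for amino_acid, count in all_counts.items():
        background_fraction = count / all_total
        sdr_fraction = sdr_counts.get(amino_acid, 0) / sdr_total
        propensities[amino_acid] = sdr_fraction / background_fraction
    return propensities

# Toy data: His makes up half of the "rf-SDRs" but one eighth of the background.
toy_propensities = amino_acid_propensities("HHDS", "HDSAAAAA")
```

A value above 1 means the amino acid is over-represented among the rf-SDRs relative to the background; a value below 1 means it is under-represented.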

    Outline of dataset construction.

    From the UniProtKB/Swiss-Prot database, the enzyme sequences to which complete EC numbers are assigned were obtained, and their CATH domain regions were selected from the Gene3D database. After CATH entries were added and redundancies removed, enzymes with fewer than ten sequences were removed. The representative structures for each enzyme were selected from the CATH S-level representatives. From the remaining sequences, a predictor was constructed for each enzyme that had sufficient numbers of positive and negative sequences (see Materials and Methods, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0084623#s2, for more details). A randomly selected 80% of the sequences were used for training. The remaining 20% of the sequences were used as a test dataset.
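
The random 80%/20% division mentioned above might look like the following sketch. The caption does not say whether the split was made per enzyme or stratified by class, so this simply shuffles one list; the function name `split_train_test`, the fixed seed, and the toy identifiers are assumptions made for illustration.

```python
import random

def split_train_test(sequences, train_fraction=0.8, seed=0):
    """Shuffle a copy of the sequences and split it into training and test sets."""
    shuffled = list(sequences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Toy identifiers standing in for domain sequences.
toy_sequences = [f"seq{i}" for i in range(10)]
toy_train, toy_test = split_train_test(toy_sequences)
```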

    Outline of the EFPrf system (A) and the predictor for each enzyme constructed by Random Forests (B).

    A query to the system is a domain sequence pre-assigned to a CATH homologous superfamily by Gene3D. For each CATH superfamily, binary predictors, each for a known enzyme, process the query and return their results (A). In each predictor, the query is aligned to a representative sequence by the FUGUE software. Based on the alignment, similarity scores for the full-length sequence and at the functional sites are calculated for the input to the predictor (B).

    The distribution of active site residues at the end of eight β-strands of enzymes in the superfamilies adopting the TIM barrel fold.

    White bars represent the glycosidase superfamily (CATH 3.20.20.80), light gray bars represent the phosphoenolpyruvate-binding domain superfamily (CATH 3.20.20.60), and gray bars represent the aldolase class I superfamily (CATH 3.20.20.70). The percentages were calculated by using 18, three and 29 enzymes for glycosidases, phosphoenolpyruvate-binding domains and aldolase class I, respectively, for which active site information was available.

    Distributions of fractions of the rf-SDRs in active site residues (ASRs, A) and ligand binding residues (LBRs, B), observed in the superfamilies with low, medium and high degrees of functional diversity classified at the third-digit level of EC numbers.

    The top and bottom of a box indicate 75th and 25th percentiles and the horizontal line in a box represents the median value. The top and bottom whiskers represent 90th and 10th percentiles.

    Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests

    Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. The functional specificity of enzymes often changes drastically through mutations of a small number of residues; therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random forests derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily, and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues whose importance for specificity had been validated experimentally. The analysis of the rf-SDRs revealed both a general tendency for functionally diverged superfamilies to include more active site residues in their rf-SDRs than less diverged superfamilies do, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be effective tools for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily.
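
For readers unfamiliar with the two benchmark measures quoted above, the following sketch shows how precision and recall are computed for a single binary predictor. It uses the standard definitions only; the function name `precision_recall` and the toy labels are hypothetical, and nothing here reproduces the EFPrf benchmark itself.

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_positives = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_negatives = sum(1 for t, p in pairs if t == 1 and p == 0)
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against empty denominators by reporting 0.0.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall

# Toy labels: two of four true positives are recovered, with no false positives.
toy_precision, toy_recall = precision_recall([1, 1, 1, 1, 0], [1, 1, 0, 0, 0])
```

High precision with lower recall, as in this toy case, corresponds to a predictor that rarely assigns a wrong function but misses some true members.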

    The rf-SDRs for acetylcholine esterase (AChE, EC 3.1.1.7, CATH domain: 1w76B00) in α/β-hydrolase superfamily (CATH 3.40.50.1820).

    The rf-SDRs are represented by balls and sticks, where carbon atoms are colored white, nitrogen atoms are blue, oxygen atoms are red and sulfur atoms are yellow. The active site gorge is partially represented by a green surface. At the bottom of the active site gorge, the catalytic triad residues, which were not selected as rf-SDRs, are represented by balls and sticks and colored magenta. Many rf-SDRs are positioned around the catalytic gorge region.