
    Exploring the Composition of Protein-Ligand Binding Sites on a Large Scale

    <div><p>The residue composition of a ligand binding site determines the interactions available for diffusion-mediated ligand binding, and understanding the general composition of these sites is of great importance if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods utilize such heuristic information for improving prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes, and provides a source of diverse, high-quality data for establishing general trends of residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues over-represented in binding regions versus the rest of the protein surface. The Binding MOAD database delineates biologically-relevant “valid” ligands from “invalid” small-molecule ligands bound to the protein. Invalids are present in the crystallization medium and serve no known biological function. Contacts are found to differ between these classes of ligands, indicating that residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface, but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of the variation of residue propensity with respect to the size of the dataset and the content bias inherent in structure sets obtained from a large protein structure database. The optimal size of the dataset for establishing general trends of residue propensities, as well as strategies for assessing the significance of such trends, are suggested for future studies of binding-site composition.</p></div>
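The idea of a residue being over-represented in binding regions relative to the rest of the surface can be illustrated with a small sketch. It assumes a propensity defined as the fraction of a residue type among binding-site residues divided by its fraction among surface residues; the abstract does not spell out the exact formula, so this definition, the function name `residue_propensities`, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def residue_propensities(site_residues, surface_residues):
    """Assumed definition: (fraction in binding sites) / (fraction on surface).

    Values above 1 suggest over-representation in binding sites,
    values below 1 suggest under-representation.
    """
    if not site_residues or not surface_residues:
        raise ValueError("both residue lists must be non-empty")
    site_counts = Counter(site_residues)
    surface_counts = Counter(surface_residues)
    n_site = len(site_residues)
    n_surface = len(surface_residues)
    propensities = {}
    # Only residue types seen on the surface have a defined denominator.
    for res, surf_count in surface_counts.items():
        site_frac = site_counts.get(res, 0) / n_site
        surf_frac = surf_count / n_surface
        propensities[res] = site_frac / surf_frac
    return propensities


# Toy residue lists, invented for illustration (not data from the study).
site = ["TRP", "TRP", "GLY", "LYS"]
surface = ["TRP", "GLY", "GLY", "LYS", "LYS", "LYS", "GLU", "GLU"]
props = residue_propensities(site, surface)
```

In this toy case Trp makes up half of the site residues but only one eighth of the surface, so its propensity is well above 1, while Glu never appears in a site and gets a propensity of 0.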

    Propensities of SC interactions in valid sites, with and without the top-20 ligands by frequency.

    <p>A) Propensities in valid sites. B) Propensities in invalid sites. The error bars represent 95<sup>th</sup> percentile bounds based on leave-10%-out clustering within each set. Residues are ordered alphabetically.</p>

    Moving Beyond Active-Site Detection: MixMD Applied to Allosteric Systems

    Mixed-solvent molecular dynamics (MixMD) is a hotspot-mapping technique that relies on molecular dynamics simulations of proteins in binary solvent mixtures. Previous work on MixMD has established the technique’s effectiveness in capturing binding sites of small organic compounds. In this work, we show that MixMD can identify both competitive and allosteric sites on proteins. The MixMD approach embraces full protein flexibility and allows competition between solvent probes and water. Sites preferentially mapped by probe molecules are more likely to be binding hotspots. There are two important requirements for the identification of ligand-binding hotspots: (1) hotspots must be mapped at a very high signal-to-noise ratio and (2) the hotspots must be mapped by multiple probe types. We have developed our mapping protocol around acetonitrile, isopropanol, and pyrimidine as probe solvents because they allowed us to capture hydrophilic, hydrophobic, hydrogen-bonding, and aromatic interactions. Charged probes were needed for mapping one target, and we introduce them in this work. To demonstrate the robust nature and wide applicability of the technique, a combined total of 5 μs of MixMD was applied across several protein targets known to exhibit allosteric modulation. Most notably, all the protein crystal structures used to initiate our simulations had no allosteric ligands bound, so there was no preorganization of the sites to predispose the simulations to find the allosteric hotspots. The protein test cases were ABL Kinase, Androgen Receptor, CHK1 Kinase, Glucokinase, PDK1 Kinase, Farnesyl Pyrophosphate Synthase, and Protein-Tyrosine Phosphatase 1B. The success of the technique is demonstrated by the fact that the top-four sites solely map the competitive and allosteric sites. Lower-ranked sites consistently map other biologically relevant sites, multimerization interfaces, or crystal-packing interfaces.
    Lastly, we highlight the importance of including protein flexibility by demonstrating that MixMD can map allosteric sites that are not detected in half the systems using FTMap applied to the same crystal structures.
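The two requirements stated in the abstract (a strong signal and agreement across probe types) can be read as a simple filter. The sketch below is not the MixMD protocol itself: the per-site signal scores, the site labels, the `min_signal` and `min_probes` thresholds, and the function name `consensus_hotspots` are all hypothetical and chosen only to show the logic.

```python
def consensus_hotspots(signal_by_probe, min_signal=3.0, min_probes=2):
    """Return site ids mapped strongly by at least `min_probes` probe types.

    `signal_by_probe` maps a probe name to {site id: signal score}.
    The score scale and both thresholds are illustrative assumptions.
    """
    probes_per_site = {}
    for probe, site_signals in signal_by_probe.items():
        for site, signal in site_signals.items():
            # Requirement (1): keep only strongly mapped sites.
            if signal >= min_signal:
                probes_per_site.setdefault(site, set()).add(probe)
    # Requirement (2): the site must be mapped by several probe types.
    return sorted(
        site for site, probes in probes_per_site.items()
        if len(probes) >= min_probes
    )


# Invented scores for three probe solvents named in the abstract.
signals = {
    "acetonitrile": {"site_A": 8.2, "site_B": 1.1, "site_C": 4.0},
    "isopropanol": {"site_A": 6.5, "site_B": 5.3},
    "pyrimidine": {"site_A": 7.9, "site_C": 2.0},
}
hotspots = consensus_hotspots(signals)
relaxed = consensus_hotspots(signals, min_signal=2.0)
```

With the default thresholds only `site_A` is retained, because the other two sites are each mapped strongly by a single probe type; lowering the signal threshold lets `site_C` through as well, which shows how sensitive such a filter is to the chosen cutoff.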

    Relative frequency of SC-only, BB-only or both (SC+BB) interactions per residue.

    <p>The residues with “SC” interactions in our analysis combine the SC-only and “SC+BB” contacts (blue+yellow). Residues are ordered by increasing BB-only frequency. Here, all Gly interactions are shown as BB-only to show its overall contribution to BB-only contacts. Due to rounding, columns may occasionally sum to a value other than 100%.</p>

    Frequencies of solvent-accessible SC with a cutoff of SASA ≥5 Å<sup>2</sup> and SASA ≥0.5 Å<sup>2</sup>.

    <p>Residues are sorted by decreasing hydrophobicity. With the smaller cutoff, the pattern shifts to more hydrophobic residues because poorly exposed, interior residues are able to meet the criteria with only a small patch of exposed surface.</p>
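The effect of the cutoff described in this caption can be seen in a minimal sketch that selects residues by side-chain SASA. The two cutoffs (5 and 0.5 Å<sup>2</sup>) come from the caption title; the residue labels, the SASA values, and the function name `exposed` are invented for illustration.

```python
def exposed(sasa_by_residue, cutoff):
    """Return residues whose side-chain SASA (in square angstroms) meets the cutoff."""
    return [name for name, sasa in sasa_by_residue.items() if sasa >= cutoff]


# Hypothetical side-chain SASA values; not taken from the study.
sasa = {"LYS 12": 45.0, "LEU 30": 2.1, "ILE 57": 0.7, "VAL 88": 0.0}

strict = exposed(sasa, 5.0)  # cutoff of 5 square angstroms
loose = exposed(sasa, 0.5)   # cutoff of 0.5 square angstroms
```

In this toy set the stricter cutoff keeps only the well-exposed Lys, while the looser cutoff also admits the mostly buried Leu and Ile, which is the shift toward hydrophobic residues that the caption describes.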

    Frequencies of BB-only contacts in binding sites, sorted by increasing frequency on the protein surface.

    <p>Surface residues with 5 Å<sup>2</sup> or greater backbone SASA are shown. Gly interactions are shown as BB-only to stress that it constitutes the vast majority of such contacts. Due to rounding, rows may occasionally sum to a value other than 100%.</p>

    Propensities in valid binding sites.

    <p>Propensities are broken down into A) enzyme and B) non-enzyme proteins. The black error bars represent 95<sup>th</sup> percentile bounds based on leave-10%-out clustering. For context, red lines represent 95<sup>th</sup> percentile bounds of propensities from 10,000 random samples of A) 2500 random, diverse proteins and B) 1000 random, diverse proteins (as seen in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003321#pcbi-1003321-t004" target="_blank">Table 4</a>). Stars indicate residues whose median propensity value (leave-10%-out 95<sup>th</sup> percentile error) falls outside of the 95<sup>th</sup> percentiles of the randomly-sampled propensities.</p>

    Frequencies and propensities of surface residues.

    <p>A) Frequencies of solvent-accessible side chains on the protein surface and in binding sites with SASA cutoff ≥5 Å<sup>2</sup>. Due to rounding, rows in A) may occasionally sum to a value other than 100%. B) Median propensity of residues in ligand binding sites of valid and invalid ligands, analyzed across all proteins. Residues in A and B are ordered by increasing frequency on surface. C) Ratio of residue propensity for valid versus invalid binding sites. Residues ordered by decreasing ratio. Error bars in B and C indicate 95<sup>th</sup> percentiles of 10,000 leave-10%-out samples.</p>
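The leave-10%-out error bars mentioned in this caption can be approximated with a resampling sketch. Several points are assumptions: that each sample drops a random 10% of the proteins, that the statistic is recomputed on the remaining 90%, and that the bounds are the 2.5th and 97.5th percentiles of the resampled values. The helper names `leave_out_bounds` and `trp_propensity`, the record layout, and the toy counts are hypothetical.

```python
import random
import statistics


def leave_out_bounds(items, statistic, n_samples=10_000, leave_out=0.10, seed=0):
    """Return (low, median, high) of `statistic` over leave-out resamples.

    Assumption: low/high are the 2.5th and 97.5th percentiles.
    """
    rng = random.Random(seed)
    keep = len(items) - round(leave_out * len(items))
    values = [statistic(rng.sample(items, keep)) for _ in range(n_samples)]
    # n=40 gives cut points every 2.5%; first is 2.5%, last is 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], statistics.median(values), cuts[-1]


def trp_propensity(proteins):
    """Pooled propensity for one residue type (assumed ratio-of-fractions form).

    Each record is (type count in sites, all residues in sites,
                    type count on surface, all residues on surface).
    """
    site_frac = sum(p[0] for p in proteins) / sum(p[1] for p in proteins)
    surf_frac = sum(p[2] for p in proteins) / sum(p[3] for p in proteins)
    return site_frac / surf_frac


# Twenty invented protein records; not data from the study.
proteins = [(1 + i % 3, 10, 2, 50 + i) for i in range(20)]
low, med, high = leave_out_bounds(proteins, trp_propensity, n_samples=1000)
```

A fixed `seed` keeps the sketch reproducible; the width of the interval between `low` and `high` plays the role of the error bar for that residue type.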

    Comparison of the average number of hydrogen-bonding contacts to surface residues.

    <p>Hydrogen bonding of all valid and invalid ligands is compared across all residues that meet the surface definition. Both backbone and side-chain atoms are listed. The values and differences are given in both hydrogen bonds per residue and contacts per hydrogen-bonding atom. Due to rounding, columns may occasionally sum to a value other than 100%.</p><p><sup>a</sup> Sum of all hydrogen bonds per number of hydrogen-bonding atoms.</p>

    Comparison of “raw” ligand contacts to “surface” ligand contacts.

    <p>Average contacts for valid and invalid ligands are compared across all residue types. The values and differences are given in both contacts per amino acid and contacts per non-hydrogen atom. The maximum and minimum values in each column are noted in bold; values for invalid ligands are noted in italics. Due to rounding, columns may occasionally sum to a value other than 100%.</p><p><sup>a</sup> Number of non-hydrogen atoms in each residue.</p>