Rotamer libraries and probabilities of transition between rotamers for the side chains in protein-protein binding
Screening Interactions Between Proteins and Disordered Peptides by a Novel Computational Method
Concerted interactions between proteins in cells form the basis of most biological processes. Biophysicists study protein–protein association by measuring thermodynamic and kinetic properties. Naively, strong binding affinity should be preferred in protein–protein binding to conduct certain biological functions. However, evidence shows that regulatory interactions, such as those between adapter proteins and intrinsically disordered proteins, communicate via low-affinity but high-complementarity interactions. PDZ domains are one class of adapters that bind linear disordered peptides, which play key roles in signaling pathways. The misregulation of these signals has been implicated in the progression of human cancers. To understand the underlying mechanism of protein-peptide binding interactions and to predict new interactions, in this thesis I have developed: (a) a unique, biophysically derived model to estimate their binding free energy; (b) a novel semi-flexible structure-based method to dock disordered peptides to PDZ domains; (c) predictions of the peptide binding landscape; and (d) an automated algorithm and web interface to predict the likelihood that a given linear sequence of amino acids binds to a specific PDZ domain. The docking method, PepDock, takes a peptide sequence and a PDZ protein structure as input, and outputs docked conformations and the corresponding binding affinity estimates, including their optimal free energy pathway. I have applied PepDock to screen several PDZ protein domains. The results not only validated the capability of PepDock to accurately discriminate interactions, but also explored the underlying binding mechanism. Specifically, I showed that interactions followed downhill free energy pathways, consistent with the relatively fast association of intrinsically disordered peptides.
The pathways are such that initially the peptide’s C-terminal motif binds non-specifically, forming a weak intermediate, whereas specific binding is achieved only by a subsequent network of contacts (7–9 residues in total). This mechanism allows peptides to quickly probe PDZ domains, rapidly releasing those that do not attain sufficient affinity during binding. Further kinetic analysis indicates that disorder enhances the specificity of promiscuous interactions between proteins and peptides, while achieving association rates comparable to interactions between ordered proteins.
Deciphering the Arginine-Binding Preferences at the Substrate-Binding Groove of Ser/Thr Kinases by Computational Surface Mapping
Protein kinases are key signaling enzymes that catalyze the transfer of γ-phosphate from an ATP molecule to a phospho-accepting residue in the substrate. Unraveling the molecular features that govern the preference of kinases for particular residues flanking the phosphoacceptor is important for understanding kinase specificities toward their substrates and for designing substrate-like peptidic inhibitors. We applied ANCHORSmap, a new fragment-based computational approach for mapping amino acid side chains on protein surfaces, to predict and characterize the preference of kinases toward arginine binding. We focus on positions P−2 and P−5, commonly occupied by arginine (Arg) in substrates of basophilic Ser/Thr kinases. The method accurately identified all the P−2/P−5 Arg binding sites previously determined by X-ray crystallography and produced Arg preferences that corresponded to those experimentally found by peptide arrays. The predicted Arg-binding positions and their associated pockets were analyzed in terms of shape, physicochemical properties, amino acid composition, and in-silico mutagenesis, providing structural rationalization for previously unexplained trends in kinase preferences toward Arg moieties. This methodology sheds light on several kinases that were described in the literature as having non-trivial preferences for Arg, and provides some surprising departures from the prevailing views regarding residues that determine kinase specificity toward Arg. In particular, we found that the preference for a P−5 Arg is not necessarily governed by the 170/230 acidic pair, as was previously assumed, but by several different pairs of acidic residues, selected from positions 133, 169, and 230 (PKA numbering). The acidic residue at position 230 serves as a pivotal element in recognizing Arg from both the P−2 and P−5 positions.
Computational studies for prediction of protein folding and ligand binding
This dissertation comprises four projects. I) Glycosylation is a post-translational modification that affects many physiological processes, including protein folding, cell interaction and host immune response. PglC, a phosphoglycosyl transferase (PGT) involved in the biosynthesis of N-linked glycoproteins in Campylobacter jejuni, is representative of one of the structurally simplest members of the small bacterial PGT family. The research utilizes sequence similarity network and evolutionary covariance studies to identify the catalytic core of PglC, followed by modeling its three-dimensional structure using the covariance as constraints. II) Rapid growth of fragment-based drug discovery necessitates accurate fragment library screening for targets of interest, finding strong binders with specific binding. While many high-resolution biophysical methods for fragment screening work well, docking-based virtual screening is highly desired due to the speed and cost efficiency. Beyond the key performance-determining factors like score function and search method, the goal is to learn from the experimental fragment bound structures in the PDBbinder database and to evaluate the profile of side-chain flexibility in the interface and its contribution to docking performance. III) Protein docking procedures carry out the task of predicting the structure of a protein–protein complex starting from the known structures of the individual protein components. However, the structure of one or both components frequently must be obtained by homology modeling based on known structures. This work presents a benchmark dataset of experimentally determined target complexes with a large set of sufficiently diverse template complexes identified for each target. The dataset allows developers to test their algorithms combining homology modeling and docking, in order to determine the factors that critically influence the prediction performance. 
IV) Human Eukaryotic Initiation Factor 4AI (heIF4AI) is the enzymatic component of a highly efficient complex, heIF4F. Through its helicase activity, it binds and unwinds the secondary structure of mRNA at the 5' end and thus plays a crucial role in translation initiation. This research focuses on the C-terminal domain of heIF4AI and investigates its potential as an anti-cancer target by integrating the approaches of solvent mapping, docking, crystallization and NMR.
Predicting Transcription Factor Specificity with All-Atom Models
The binding of a transcription factor (TF) to a DNA operator site can
initiate or repress the expression of a gene. Computational prediction of sites
recognized by a TF has traditionally relied upon knowledge of several cognate
sites, rather than an ab initio approach. Here, we examine the possibility of
using structure-based energy calculations that require no knowledge of bound
sites but rather start with the structure of a protein-DNA complex. We study
the E. coli TF PurR and explore the extent to which atomistic models of
protein-DNA complexes can be used to distinguish between cognate and
non-cognate DNA sites. Particular emphasis is placed on systematic evaluation
of this approach by comparing its performance with bioinformatic methods, by
testing it against random decoys and sites of homologous TFs. We also examine a
set of experimental mutations in both DNA and the protein. Using our explicit
estimates of energy, we show that the specificity for PurR is dominated by
direct protein-DNA interactions, and weakly influenced by bending of DNA.
Comment: 26 pages, 3 figures
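One common way to express how well an energy function separates a cognate site from random decoys is a standard score (Z-score) of the cognate energy against the decoy distribution. The following is a minimal sketch of that idea only; it is not the scoring procedure of the paper above. The function name `z_score` and all energy values are hypothetical and invented purely for illustration.

```python
from statistics import mean, stdev


def z_score(site_energy, decoy_energies):
    """Standard score of one site's energy relative to a set of decoy energies.

    A strongly negative value means the site is predicted to bind much more
    favourably than a typical decoy (lower energy = more favourable).
    """
    mu = mean(decoy_energies)
    sigma = stdev(decoy_energies)  # sample standard deviation
    return (site_energy - mu) / sigma


# Hypothetical energies in arbitrary units; NOT data from the paper.
decoys = [-10.0, -12.0, -8.0, -11.0, -9.0]
cognate = -16.0

z = z_score(cognate, decoys)
print(f"cognate Z-score vs. decoys: {z:.2f}")
```

With real data, the decoy set would need to be large enough for the mean and standard deviation to be meaningful.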
An expanded allosteric network in PTP1B by multitemperature crystallography, fragment screening, and covalent tethering
Allostery is an inherent feature of proteins, but it remains challenging to reveal the mechanisms by which allosteric signals propagate. A clearer understanding of this intrinsic circuitry would afford new opportunities to modulate protein function. Here, we have identified allosteric sites in protein tyrosine phosphatase 1B (PTP1B) by combining multiple-temperature X-ray crystallography experiments and structure determination from hundreds of individual small-molecule fragment soaks. New modeling approaches reveal ’hidden’ low-occupancy conformational states for protein and ligands. Our results converge on allosteric sites that are conformationally coupled to the active-site WPD loop and are hotspots for fragment binding. Targeting one of these sites with covalently tethered molecules or mutations allosterically inhibits enzyme activity. Overall, this work demonstrates how the ensemble nature of macromolecular structure, revealed here by multitemperature crystallography, can elucidate allosteric mechanisms and open new doors for long-range control of protein function.
Calculation of Proteins' Total Side-Chain Torsional Entropy and Its Influence on Protein-Ligand Interactions
Despite the high density within a typical protein fold, the ensemble of sterically permissible side-chain repackings is vast. Here, we examine the extent of this variability that survives energetic biases due to van der Waals interactions, hydrogen bonding, salt bridges, and solvation. Monte Carlo simulations of an atomistic model exhibit thermal fluctuations among a diverse set of side-chain arrangements, even with the peptide backbone fixed in its crystallographic conformation. We have quantified the torsional entropy of this native-state ensemble, relative to that of a noninteracting reference system, for 12 small proteins. The reduction in entropy per rotatable bond due to each kind of interaction is remarkably consistent across this set of molecules. To assess the biophysical importance of these fluctuations, we have estimated side-chain entropy contributions to the binding affinity of several peptide ligands with calmodulin. Calculations for our fixed-backbone model correlate very well with experimentally determined binding entropies over a range spanning more than 80 kJ/(mol·308 K).
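To make the notion of torsional entropy relative to a noninteracting reference concrete, the sketch below estimates the entropy of a single dihedral angle from a histogram of sampled values, S = −R Σ p ln p, and subtracts the entropy of a uniform distribution over the same bins. This is a simplified, assumed illustration; it is not the Monte Carlo procedure of the paper above. The function names, the three-bin discretization, and the sample angles are hypothetical.

```python
import math
from collections import Counter

R = 8.314462618  # molar gas constant, J/(mol*K)


def torsional_entropy(angles_deg, n_bins=3):
    """Histogram estimate of the entropy of one dihedral angle, in J/(mol*K)."""
    width = 360.0 / n_bins
    # Map each angle to a bin index in [0, n_bins); the final modulo guards
    # against floating-point wrap-around at exactly 360 degrees.
    counts = Counter(int((a % 360.0) // width) % n_bins for a in angles_deg)
    total = len(angles_deg)
    return -R * sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_loss(angles_deg, n_bins=3):
    """Entropy relative to a uniform (noninteracting) reference; always <= 0."""
    return torsional_entropy(angles_deg, n_bins) - R * math.log(n_bins)


# Hypothetical chi-angle samples in degrees; NOT data from the paper.
samples = [60, 65, 180, 175, 300, 295, 62, 58]
print(f"S  = {torsional_entropy(samples):.3f} J/(mol*K)")
print(f"dS = {entropy_loss(samples):.3f} J/(mol*K)")
```

A histogram estimate like this depends on the bin count and on how many samples are available, so it should be read as a qualitative illustration rather than a quantitative method.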
Rationalization and Design of the Complementarity Determining Region Sequences in an Antibody-Antigen Recognition Interface
Protein-protein interactions are critical determinants in biological systems. Engineered proteins binding to specific areas on protein surfaces could lead to therapeutics or diagnostics for treating diseases in humans. But designing epitope-specific protein-protein interactions with computational atomistic interaction free energy remains a difficult challenge. Here we show that, with the antibody-VEGF (vascular endothelial growth factor) interaction as a model system, the experimentally observed amino acid preferences in the antibody-antigen interface can be rationalized with 3-dimensional distributions of interacting atoms derived from the database of protein structures. Machine learning models established on the rationalization can be generalized to design amino acid preferences in antibody-antigen interfaces, for which the experimental validations are tractable with current high throughput synthetic antibody display technologies. Leave-one-out cross validation on the benchmark system yielded the accuracy, precision, recall (sensitivity) and specificity of the overall binary predictions to be 0.69, 0.45, 0.63, and 0.71 respectively, and the overall Matthews correlation coefficient of the 20 amino acid types in the 24 interface CDR positions was 0.312. The structure-based computational antibody design methodology was further tested with other antibodies binding to VEGF. The results indicate that the methodology could provide alternatives to the current antibody technologies based on animal immune systems in engineering therapeutic and diagnostic antibodies against predetermined antigen epitopes.
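For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, specificity and the Matthews correlation coefficient follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts used in the example are hypothetical; they are not the counts behind the values reported in the abstract above.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and both
    labels are predicted at least once).
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The Matthews correlation coefficient uses all four cells, which is why it is often reported alongside accuracy when the positive and negative classes are unbalanced.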
New Algorithms for Predicting Conformational Polymorphism and Inferring Direct Couplings for Side Chains of Proteins
Protein crystals populate diverse conformational ensembles. Despite much evidence that
there is widespread conformational polymorphism in protein side chains, most of the x-ray
crystallography data are modelled by single conformations in the Protein Data Bank.
The ability to extract or to predict these conformational polymorphisms is of crucial importance,
as it facilitates deeper understanding of protein dynamics and functionality.
This dissertation describes a computational strategy capable of predicting side-chain polymorphisms.
The applied approach extends a particular class of algorithms for side-chain
prediction by modelling the side-chain dihedral angles more appropriately as continuous
rather than discrete variables. Employing a new inferential technique known as particle
belief propagation (PBP), we predict residue-specific distributions that encode information
about side-chain polymorphisms. The predicted polymorphisms are in relatively close
agreement with results from a state-of-the-art approach based on x-ray crystallography
data. This approach characterizes the conformational polymorphisms of side chains using
electron density information, and has successfully discovered previously unmodelled
conformations.
Furthermore, it is known that coupled fluctuations and concerted motions of residues
can reveal pathways of communication used for information propagation in a molecule
and hence, can help in understanding the "allostery" phenomenon in proteins. In order
to characterize the coupled motions, most existing methods infer structural dependencies
among a protein's residues. However, recent studies have highlighted the role of coupled
side-chain fluctuations alone in the allosteric behaviour of proteins, in contrast to a
common belief that the backbone motions play the main role in allostery. These studies
and the aforementioned recent discoveries about prevalent alternate side-chain conformations
(conformational polymorphism) accentuate the need to devise new computational
approaches that acknowledge side chains' roles. As well, these approaches must consider
the polymorphic nature of the side chains, and incorporate effects of this phenomenon
(polymorphism) in the study of information transmission and functional interactions of
residues in a molecule. Such frameworks can provide a more accurate understanding of the
allosteric behaviour.
Hence, as a topic related to the conformational polymorphism, this dissertation addresses
the problem of inferring directly coupled side chains, as well. First, we present a
novel approach to generate an ensemble of conformations and an efficient computational
method to extract direct couplings of side chains in allosteric proteins. These direct couplings
are used to provide sparse network representations of the coupled side chains. The
framework is based on a fairly new statistical method, named graphical lasso (GLASSO),
devised for sparse graph estimation. In the proposed GLASSO-based framework, the side-chain
conformational polymorphism is taken into account. It is shown that by studying
the intrinsic dynamics of an inactive structure alone, we are able to construct a network of
functionally crucial residues. Second, we show that the proposed method is capable of providing
a magnified view of the coupled and conformationally polymorphic side chains. This
model reveals couplings between the alternate conformations of a coupled residue pair. To
the best of our knowledge, this is the first computational method for extracting networks
of side chains' alternate conformations. Such networks help in providing a detailed image
of side-chain dynamics in functionally important and conformationally polymorphic sites,
such as binding and/or allosteric sites. This information may assist in new drug-design
alternatives.
Side-chain conformations are commonly represented by multivariate angular variables.
However, the GLASSO and other existing methods that can be applied to the aforementioned
inference task are not capable of handling multivariate angular data. This dissertation
further proposes a novel method to infer direct couplings from this type of data, and
shows that this method is useful for identifying functional regions and their interactions in
allosteric proteins. The proposed framework is a novel extension of canonical correlation
analysis (CCA), which we call "kernelized partial CCA" (or simply KPCCA). Using the
conformational information and fluctuations of the inactive structure alone for allosteric
proteins in the Ras and other Ras-like families, the KPCCA method identified allosterically
important residues not only as strongly coupled ones but also in densely connected
regions of the interaction graph formed by the inferred couplings. The results were in good
agreement with other empirical findings and outperformed those obtained by the
GLASSO-based framework. By studying distinct members of the Ras, Rho, and Rab sub-families,
we show further that KPCCA is capable of inferring common allosteric characteristics in
the small G protein super-family.
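The remark that methods built for ordinary real-valued data do not handle angular data can be illustrated with the simplest possible statistic, the mean. The sketch below contrasts the arithmetic mean of two dihedral angles with their circular mean, and also computes a circular variance. It illustrates only why angles need special treatment; it is not an implementation of KPCCA or GLASSO. The function names and the sample angles are hypothetical.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of a set of angles, returned in [0, 360] degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def circular_variance(angles_deg):
    """1 - mean resultant length: 0 for identical angles, up to 1 for spread."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(s, c)


# Two hypothetical dihedral angles that sit 20 degrees apart across the
# 0/360 boundary.
angles = [350.0, 10.0]

arithmetic = sum(angles) / len(angles)  # points the opposite way (180)
circular = circular_mean_deg(angles)    # close to 0, equivalently 360

print(f"arithmetic mean: {arithmetic:.1f} degrees")
print(f"circular mean:   {circular:.1f} degrees")
print(f"circular variance: {circular_variance(angles):.4f}")
```

The same wrap-around problem affects covariances and correlations computed directly on raw angle values, which is one reason angular variables are usually handled through their sine and cosine components or through kernels defined on the circle.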