Protein–DNA binding specificity predictions with structural models
Protein–DNA interactions play a central role in transcriptional regulation and other biological processes. Investigating the mechanism of binding affinity and specificity in protein–DNA complexes is thus an important goal. Here we develop a simple physical energy function, which uses electrostatics, solvation, hydrogen bonds and atom-packing terms to model direct readout, and sequence-specific DNA conformational energy to model indirect readout of DNA sequence by the bound protein. The predictive capability of the model is tested against another model based only on the knowledge of the consensus sequence and the number of contacts between amino acids and DNA bases. Both models are used to carry out predictions of protein–DNA binding affinities, which are then compared with experimental measurements. The nearly additive nature of protein–DNA interaction energies in our model allows us to construct position-specific weight matrices by computing base pair probabilities independently for each position in the binding site. Our approach is less data intensive than knowledge-based models of protein–DNA interactions, and is not limited to any specific family of transcription factors. However, native structures of protein–DNA complexes or their close homologs are required as input to the model. Use of homology modeling can significantly increase the extent of our approach, making it a useful tool for studying regulatory pathways in many organisms and cell types.
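The step from additive energies to a position-specific weight matrix can be illustrated with a minimal sketch. It assumes that each position contributes an independent energy term and that base probabilities follow a Boltzmann distribution at room temperature; the energy values, the function names, and the value of kT are illustrative assumptions, not taken from the paper.

```python
import math

BASES = "ACGT"

def energies_to_pwm(energy_matrix, kT=0.593):
    """Turn per-position base energies (kcal/mol) into base probabilities.

    energy_matrix is a list with one dict per binding-site position,
    mapping each base to its energy contribution. kT = 0.593 kcal/mol
    corresponds to roughly 298 K (an assumption of this sketch).
    """
    pwm = []
    for column in energy_matrix:
        # Boltzmann weight for each base, normalized within the column only.
        weights = {b: math.exp(-column[b] / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm

def site_energy(site, energy_matrix):
    """Additive energy of a site: the sum of its per-position terms."""
    return sum(column[base] for column, base in zip(energy_matrix, site))

# Hypothetical energies for a three-position site (not real data).
energy_matrix = [
    {"A": 0.0, "C": 1.2, "G": 1.5, "T": 0.8},
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 2.0},
    {"A": 0.5, "C": 0.5, "G": 0.5, "T": 0.5},
]
pwm = energies_to_pwm(energy_matrix)
for position, column in enumerate(pwm, start=1):
    print(position, {b: round(p, 3) for b, p in column.items()})
```

Because each column is normalized on its own, the matrix can be built one position at a time, which is only valid to the extent that the energies really are additive.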
Inherent limitations of probabilistic models for protein-DNA binding specificity
The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site, and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally, probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
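The non-linear relationship between affinity and binding probability can be made concrete with a small sketch. It assumes a standard two-state occupancy model in which a site with binding energy E is bound with probability 1 / (1 + exp((E - mu) / kT)), where mu is a chemical potential that rises with protein concentration. This model, the function names, and the numbers are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def occupancy(energy, mu, kT=1.0):
    """Two-state binding probability for a site (energies in units of kT)."""
    return 1.0 / (1.0 + math.exp((energy - mu) / kT))

def boltzmann_ratio(e_strong, e_weak, kT=1.0):
    """Ratio of site probabilities implied by a purely probabilistic model."""
    return math.exp(-(e_strong - e_weak) / kT)

# Two hypothetical sites: the best site and one that is 2 kT weaker.
e_best, e_next = 0.0, 2.0

# Ratio the probabilistic (PWM-style) model assigns, independent of concentration.
pwm_ratio = boltzmann_ratio(e_best, e_next)

# Low protein concentration: occupancy ratio tracks the Boltzmann ratio.
ratio_low = occupancy(e_best, mu=-5.0) / occupancy(e_next, mu=-5.0)

# High protein concentration: both sites approach saturation.
ratio_high = occupancy(e_best, mu=5.0) / occupancy(e_next, mu=5.0)

print(pwm_ratio, ratio_low, ratio_high)
```

At low concentration the two ratios nearly agree, while at high concentration the strongest sites saturate and the occupancy ratio collapses toward 1, which a fixed probability matrix cannot represent.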
From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions
©2009 Gao, Skolnick. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. doi:10.1371/journal.pcbi.1000341
DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that it performs satisfactorily on protein models whose Cα root-mean-square deviation from the native structure is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in a DNA-binding protein's search for its specific DNA targets.
A combined computational-experimental approach to define the structural origin of antibody recognition of sialyl-Tn, a tumor-associated carbohydrate antigen.
Anti-carbohydrate monoclonal antibodies (mAbs) hold great promise as cancer therapeutics and diagnostics. However, their specificity can be mixed, and detailed characterization is problematic, because antibody-glycan complexes are challenging to crystallize. Here, we developed a generalizable approach employing high-throughput techniques for characterizing the structure and specificity of such mAbs, and applied it to the mAb TKH2 developed against the tumor-associated carbohydrate antigen sialyl-Tn (STn). The mAb specificity was defined by apparent KD values determined by quantitative glycan microarray screening. Key residues in the antibody combining site were identified by site-directed mutagenesis, and the glycan-antigen contact surface was defined using saturation transfer difference NMR (STD-NMR). These features were then employed as metrics for selecting the optimal 3D-model of the antibody-glycan complex, out of thousands of plausible options generated by automated docking and molecular dynamics simulation. STn-specificity was further validated by computational screening of the selected antibody 3D-model against the human sialyl-Tn-glycome. This computational-experimental approach would allow rational design of potent antibodies targeting carbohydrates.
Predicting Transcription Factor Specificity with All-Atom Models
The binding of a transcription factor (TF) to a DNA operator site can
initiate or repress the expression of a gene. Computational prediction of sites
recognized by a TF has traditionally relied upon knowledge of several cognate
sites, rather than an ab initio approach. Here, we examine the possibility of
using structure-based energy calculations that require no knowledge of bound
sites but rather start with the structure of a protein-DNA complex. We study
the E. coli TF PurR, and explore the extent to which atomistic models of
protein-DNA complexes can be used to distinguish between cognate and
non-cognate DNA sites. Particular emphasis is placed on systematic evaluation
of this approach by comparing its performance with bioinformatic methods, by
testing it against random decoys and sites of homologous TFs. We also examine a
set of experimental mutations in both DNA and the protein. Using our explicit
estimates of energy, we show that the specificity for PurR is dominated by
direct protein-DNA interactions, and weakly influenced by bending of DNA.
Comment: 26 pages, 3 figures
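Testing an energy model against random decoys can be sketched as a Z-score calculation: the energy of a candidate site is compared with the distribution of energies over random sequences of the same length. The sketch below is a minimal illustration under stated assumptions. The mismatch-counting energy function stands in for an all-atom calculation, and the consensus sequence, function names, and parameters are hypothetical, not taken from the paper.

```python
import random
import statistics

BASES = "ACGT"

def toy_energy(site, consensus="ACGTACGTACGT", mismatch_penalty=1.0):
    """Placeholder energy: one penalty per mismatch to a made-up consensus.

    A real study would substitute a structure-based energy calculation here.
    """
    return sum(mismatch_penalty for s, c in zip(site, consensus) if s != c)

def decoy_z_score(site, energy_fn, n_decoys=1000, seed=0):
    """Z-score of a site's energy relative to random decoy sequences."""
    rng = random.Random(seed)  # fixed seed so the decoy set is reproducible
    decoy_energies = [
        energy_fn("".join(rng.choice(BASES) for _ in site))
        for _ in range(n_decoys)
    ]
    mean = statistics.mean(decoy_energies)
    stdev = statistics.stdev(decoy_energies)
    return (energy_fn(site) - mean) / stdev

# A site matching the hypothetical consensus, and one mismatched everywhere.
z_cognate = decoy_z_score("ACGTACGTACGT", toy_energy)
z_noncognate = decoy_z_score("CATGCATGCATG", toy_energy)
print(z_cognate, z_noncognate)
```

A strongly negative Z-score means the site scores far better than random sequence, which is the kind of separation between cognate and non-cognate sites that such an evaluation looks for.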
RosettaBackrub--a web server for flexible backbone protein structure modeling and design.
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein-protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
Functional interplay between NTP leaving group and base pair recognition during RNA polymerase II nucleotide incorporation revealed by methylene substitution.
RNA polymerase II (pol II) utilizes a complex interaction network to select and incorporate correct nucleoside triphosphate (NTP) substrates with high efficiency and fidelity. Our previous 'synthetic nucleic acid substitution' strategy has been successfully applied in dissecting the function of nucleic acid moieties in pol II transcription. However, how the triphosphate moiety of the substrate influences the rate of P-O bond cleavage and formation during nucleotide incorporation is still unclear. Here, by employing β,γ-bridging atom-'substituted' NTPs, we elucidate how the methylene substitution in the pyrophosphate leaving group affects cognate and non-cognate nucleotide incorporation. Intriguingly, the effect of the β,γ-methylene substitution on the non-cognate UTP/dT scaffold (∼3-fold decrease in kpol) is significantly different from that of the cognate ATP/dT scaffold (∼130-fold decrease in kpol). Removal of the wobble hydrogen bonds in U:dT recovers a strong response to methylene substitution of UTP. Our kinetic and modeling studies are consistent with a unique altered transition state for bond formation and cleavage for UTP/dT incorporation compared with ATP/dT incorporation. Collectively, our data reveal the functional interplay between the NTP triphosphate moiety and base pair hydrogen bonding recognition during nucleotide incorporation.
Characterization of Aptamer-Protein Complexes by X-ray Crystallography and Alternative Approaches
Aptamers are oligonucleotide ligands, either RNA or ssDNA, selected for high-affinity binding to molecular targets, such as small organic molecules, proteins or whole microorganisms. While reports of new aptamers are numerous, characterization of their specific interaction is often restricted to the affinity of binding (KD). Over the years, crystal structures of aptamer-protein complexes have only rarely become available. Here we describe some relevant technical issues in crystallizing aptamer-protein complexes and highlight some biochemical details of the molecular basis of selected aptamer-protein interactions. In addition, alternative experimental and computational approaches for studying aptamer-protein interactions are discussed.