
    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding
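For readers unfamiliar with the position weight matrix (PWM) baseline mentioned in this abstract, the following is a minimal sketch of PWM scoring only. It does not reproduce the paper's random forest, structural, or NPD models. The aligned sites, the pseudocount, the uniform background, and the function names `build_pwm`, `score`, and `best_hit` are all illustrative assumptions.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (toy data assumed)."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of the observed frequency to a uniform background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites, invented for illustration.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
toy_pwm = build_pwm(toy_sites)
top_score, top_start = best_hit(toy_pwm, "GGGCTATAATGCC")
```

Because a PWM scores each position independently, it cannot capture the positional dependencies or structural features that the abstract describes as complementary sources of information.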

    DNA nano-mechanics: how proteins deform the double helix

    It is a standard exercise in mechanical engineering to infer the external forces and torques on a body from its static shape and known elastic properties. Here we apply this kind of analysis to distorted double-helical DNA in complexes with proteins. We extract the local mean forces and torques acting on each base-pair of bound DNA from high-resolution complex structures. Our method relies on known elastic potentials and a careful choice of coordinates of the well-established rigid base-pair model of DNA. The results are robust with respect to parameter and conformation uncertainty. They reveal the complex nano-mechanical patterns of interaction between proteins and DNA. Being non-trivially and non-locally related to observed DNA conformations, base-pair forces and torques provide a new view on DNA-protein binding that complements structural analysis. Comment: accepted for publication in JCP; some minor changes in response to review; 18 pages, 5 figures + supplement: 4 pages, 3 figures.

    PhysBinder : improving the prediction of transcription factor binding sites by flexible inclusion of biophysical properties

    The most important mechanism in the regulation of transcription is the binding of a transcription factor (TF) to a DNA sequence called the TF binding site (TFBS). Most binding sites are short and degenerate, which makes predictions based on their primary sequence alone somewhat unreliable. We present a new web tool that implements a flexible and extensible algorithm for predicting TFBS. The algorithm makes use of both direct (the sequence) and several indirect readout features of protein-DNA complexes (biophysical properties such as bendability or the solvent-excluded surface of the DNA). This algorithm significantly outperforms state-of-the-art approaches for in silico identification of TFBS. Users can submit FASTA sequences for analysis in the PhysBinder integrative algorithm and choose from >60 different TF-binding models. The results of this analysis can be used to plan and steer wet-lab experiments. The PhysBinder web tool is freely available at http://bioit.dmbr.ugent.be/physbinder/index.php

    Anomalous DNA binding by E2 regulatory protein driven by spacer sequence TATA

    We have investigated the anomalously weak binding of human papillomavirus (HPV) regulatory protein E2 to a DNA target containing the spacer sequence TATA. Experiments in magnesium (Mg2+) and calcium (Ca2+) ion buffers revealed a marked reduction in cutting by DNase I at the CpG sequence in the protein-binding site 3′ to the TATA spacer sequence. Studies of the cation dependence of DNA-E2 affinities showed that upon E2 binding the TATA sequence releases approximately twice as many Mg2+ ions as the average of the other spacer sequences. Binding experiments for the TATA spacer relative to ATAT showed that in potassium ion (K+) the E2 affinity of the two sequences is nearly equal, but the relative dissociation constant (Kd) for TATA increases in the order K+ < Na+ < Ca2+ < Mg2+. Except for Mg2+, Kd for TATA relative to ATAT is independent of ion concentration, whereas for Mg2+ the affinity for TATA drops sharply as ion concentration increases. Thus, ions of increasing positive charge density increasingly distort the E2 binding site, weakening the affinity for protein. In the case of Mg2+, additional ions are bound to TATA that require displacement for protein binding. We suggest that the TATA sequence may bias the DNA structure towards a conformation that binds the protein relatively weakly.

    Predicting the effects of basepair mutations in DNA-protein complexes by thermodynamic integration

    Thermodynamically rigorous free energy methods in principle allow the exact computation of binding free energies in biological systems. Here, we use thermodynamic integration together with molecular dynamics simulations of a DNA-protein complex to compute relative binding free energies of a series of mutants of a protein-binding DNA operator sequence. A guanine-cytosine basepair that interacts strongly with the DNA-binding protein is mutated into adenine-thymine, cytosine-guanine, and thymine-adenine. It is shown that basepair mutations can be performed using a conservative protocol that gives error estimates of ∼10% of the change in free energy of binding. Despite the high CPU-time requirements, this work opens the exciting opportunity of being able to perform basepair scans to investigate protein-DNA binding specificity in great detail computationally.
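As a rough illustration of how thermodynamic integration yields a relative binding free energy, the sketch below integrates the ensemble average of dH/dλ over the coupling parameter λ with the trapezoid rule, once for the mutation in the complex and once in free DNA, and takes the difference. This two-leg thermodynamic cycle is the usual textbook setup and is assumed here; the abstract does not spell out the protocol. All numbers, the kcal/mol units, and the function names are invented for illustration.

```python
def trapezoid(xs, ys):
    """Trapezoid-rule integral of ys over xs (xs must be increasing)."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )

def relative_binding_free_energy(lambdas, dhdl_complex, dhdl_free):
    """ddG = dG(mutation in complex) - dG(mutation in free DNA)."""
    dg_complex = trapezoid(lambdas, dhdl_complex)
    dg_free = trapezoid(lambdas, dhdl_free)
    return dg_complex - dg_free

# Hypothetical <dH/dlambda> averages per lambda window, in kcal/mol (made up).
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dhdl_complex = [10.0, 8.0, 6.0, 4.0, 2.0]
dhdl_free = [8.0, 6.0, 4.0, 2.0, 0.0]

ddg = relative_binding_free_energy(lambdas, dhdl_complex, dhdl_free)
```

In practice each average comes from a separate molecular dynamics simulation at fixed λ, which is where the high CPU-time cost noted in the abstract arises.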

    Free energy contributions to direct readout of a DNA sequence

    The energetic contributions of individual DNA-contacting side chains to specific DNA recognition in the human papillomavirus 16 E2C-DNA complex are small (less than 1.0 kcal mol-1), independent of the physical and chemical nature of the interaction, and strictly additive. The sum of the individual contributions differs by 1.0 kcal mol-1 from the binding energy of the wild-type protein. This difference corresponds to the contribution from the deformability of the DNA, known as "indirect readout." Thus, we can dissect the energetic contribution to DNA binding into 90% direct and 10% indirect readout components. The lack of high energy interactions indicates the absence of "hot spots," such as those found in protein-protein interfaces. These results are compatible with a highly dynamic and "wet" protein-DNA interface, yet highly specific and tight, where individual interactions are constantly being formed and broken. Authors: Ferreiro, Diego; Dellarole, Mariano; Nadra, Alejandro Daniel; de Prat Gay, Gonzalo. Affiliation (all authors): Consejo Nacional de Investigaciones Científicas y Técnicas, Oficina de Coordinación Administrativa Parque Centenario, Instituto de Investigaciones Bioquímicas de Buenos Aires; Fundación Instituto Leloir, Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina.

    Structural and mutagenic analysis of the RM controller protein C.Esp1396I

    Bacterial restriction-modification (RM) systems comprise two complementary enzymatic activities that prevent the establishment of foreign DNA in a bacterial cell: DNA methylation and DNA restriction. These two activities are tightly regulated to prevent over-methylation or auto-restriction. Many Type II RM systems employ a controller (C) protein as a transcriptional regulator for the endonuclease gene (and in some cases, the methyltransferase gene also). All high-resolution structures of C-protein/DNA-protein complexes solved to date relate to C.Esp1396I, from which the interactions of specific amino acid residues with DNA bases and/or the phosphate backbone could be observed. Here we present both structural and DNA binding data for a series of mutations to the key DNA binding residues of C.Esp1396I. Our results indicate that mutations to the backbone binding residues (Y37, S52) had a lesser effect on DNA binding affinity than mutations to those residues that bind directly to the bases (T36, R46), and the contributions of each side chain to the binding energies are compared. High-resolution X-ray crystal structures of the mutant and native proteins showed that the fold of the proteins was unaffected by the mutations, but also revealed variation in the flexible loop conformations associated with DNA sequence recognition. Since the tyrosine residue Y37 contributes to DNA bending in the native complex, we have solved the structure of the Y37F mutant protein/DNA complex by X-ray crystallography to allow us to directly compare the structure of the DNA in the mutant and native complexes.

    Tuning the relative affinities for activating and repressing operators of a temporally regulated restriction-modification system

    Most type II restriction-modification (R-M) systems produce separate endonuclease (REase) and methyltransferase (MTase) proteins. After R-M genes enter a new cell, MTase activity must appear before REase or the host chromosome will be cleaved. Temporal control of these genes thus has life-or-death consequences. PvuII and some other R-M systems delay endonuclease expression by cotranscribing the REase gene with the upstream gene for an autogenous activator/repressor (C protein). C.PvuII was previously shown to have low levels early, but positive feedback later boosts transcription of the C and REase genes. The MTase is expressed without delay, and protects the host DNA. C.PvuII binds to two sites upstream of its gene: OL, associated with activation, and OR, associated with repression. Even when symmetry elements of each operator are made identical, C.PvuII binds preferentially to OL. In this study, the intra-operator spacers are shown to modulate relative C.PvuII affinity. In light of a recently reported C.Esp1396I-DNA co-crystal structure, in vitro and in vivo effects of altering OL and OR spacers were determined. The results suggest that the GACTnnnAGTC consensus is the primary determinant of C.PvuII binding affinity, with intra-operator spacers playing a fine-tuning role that affects mobility of this R-M system