288,928 research outputs found

    Evidence for Non-Random Hydrophobicity Structures in Protein Chains

    The question of whether proteins originate from random sequences of amino acids is addressed. A statistical analysis is performed in terms of blocked and random walk values formed by binary hydrophobic assignments of the amino acids along the protein chains. Theoretical expectations of these variables from random distributions of hydrophobicities are compared with those obtained from functional proteins. The results, which are based upon proteins in the SWISS-PROT database, convincingly show that the amino acid sequences in proteins differ from what is expected from random sequences in a statistically significant way. By performing Fourier transforms on the random walks one obtains additional evidence for non-randomness of the distributions. We have also analyzed results from a synthetic model containing only two amino-acid types, hydrophobic and hydrophilic. With reasonable criteria on good folding properties in terms of thermodynamical and kinetic behavior, sequences that fold well are isolated. Performing the same statistical analysis on the sequences that fold well indicates similar deviations from randomness as for the functional proteins. The deviations from randomness can be interpreted as originating from anticorrelations in terms of an Ising spin model for the hydrophobicities. Our results, which differ from previous investigations using other methods, might have impact on how permissive with respect to sequence specificity the protein folding process is -- only sequences with non-random hydrophobicity distributions fold well. Other distributions give rise to energy landscapes with poor folding properties and hence did not survive the evolution. Comment: 16 pages, 8 Postscript figures. Minor changes, references added
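
    The blocked and random walk variables mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the hydrophobic residue set, the function names, and the shuffle-based baseline are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from itertools import accumulate

# Assumption: illustrative hydrophobic set only; the abstract does not say
# which residues were classed as hydrophobic.
HYDROPHOBIC = set("LIVFMWC")


def to_spins(sequence):
    """Map each residue to +1 (hydrophobic) or -1 (hydrophilic)."""
    return [1 if aa in HYDROPHOBIC else -1 for aa in sequence.upper()]


def random_walk(spins):
    """Cumulative sum of spins with the mean removed, so the walk ends near 0."""
    mean = sum(spins) / len(spins)
    return list(accumulate(s - mean for s in spins))


def block_sums(spins, size):
    """Sum the spins in consecutive non-overlapping blocks of the given size."""
    return [sum(spins[i:i + size]) for i in range(0, len(spins) - size + 1, size)]


def mean_square_block_fluctuation(spins, size):
    """Mean squared deviation of block sums from their expected value."""
    blocks = block_sums(spins, size)
    if not blocks:
        raise ValueError("block size exceeds sequence length")
    mean = sum(spins) / len(spins)
    return sum((b - size * mean) ** 2 for b in blocks) / len(blocks)


def shuffled_baseline(spins, size, trials=200, seed=0):
    """Average block fluctuation over random shuffles of the same spins."""
    rng = random.Random(seed)
    work = list(spins)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(work)
        total += mean_square_block_fluctuation(work, size)
    return total / trials
```

    A sequence whose block fluctuation falls below the shuffled baseline shows the kind of anticorrelation the abstract describes; a strictly alternating hydrophobic/hydrophilic chain is the extreme case, with zero fluctuation for even block sizes.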

    SEARCHING FOR PALINDROMIC SEQUENCES IN PRIMARY STRUCTURE OF PROTEINS

    The SWISSPROT protein database was searched for palindromic sequences in the primary structure of polypeptides. The results indicate that palindromic words are present in protein sequences, and in considerable numbers. Half the length of the longest palindrome was 76 and, in accordance with expectations, the shorter the palindrome, the more instances of it were found. Pozna
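
    A search of this kind can be sketched with the standard expand-around-center technique. This is an illustrative sketch only; the abstract does not describe the algorithm actually used, and the function name and `min_length` parameter are hypothetical.

```python
def maximal_palindromes(sequence, min_length=4):
    """Return (start, substring) for the maximal palindrome at each center.

    Only palindromes of at least min_length residues are reported.
    """
    n = len(sequence)
    found = []
    # 2n - 1 centers: every residue (odd length) and every gap (even length).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow outwards while the flanking residues mirror each other.
        while left >= 0 and right < n and sequence[left] == sequence[right]:
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_length:
            found.append((start, sequence[start:end]))
    return found
```

    Counting the reported substrings by length over a whole database would give the length distribution the abstract refers to; the worst-case cost is quadratic in the sequence length, which is acceptable for typical protein lengths.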

    Intrinsic flexibility of B-DNA: the experimental TRX scale

    B-DNA flexibility, crucial for DNA–protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR 31P chemical shifts in solution. These measurements reflect the BI ↔ BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition

    Predicting the binding preference of transcription factors to individual DNA k-mers

    Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA–protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF–DNA recognition, and suggest a rational approach for future analyses of TF families. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online. Canadian Institutes of Health Research; Ontario Research Fund; National Institutes of Health (U.S.); National Human Genome Research Institute (U.S.

    Sequential NMR assignments of labile protons in DNA using two-dimensional nuclear-Overhauser-enhancement spectroscopy with three jump-and-return pulse sequences

    Two-dimensional nuclear Overhauser enhancement (NOESY) spectra of labile protons were recorded in H2O solutions of a protein and of a DNA duplex, using a modification of the standard NOESY experiment with all three 90° pulses replaced by jump-and-return sequences. For the protein as well as the DNA fragment the strategically important spectral regions could be recorded with good sensitivity and free of artifacts. Using this procedure, sequence-specific assignments were obtained for the imino protons, C2H of adenine, and C4NH2 of cytosine in a 23-base-pair DNA duplex which includes the 17-base-pair OR3 repressor binding site of bacteriophage λ. Based on comparison with previously published results on the isolated OR3 binding site, these data were used for a study of chain termination effects on the chemical shifts of imino proton resonances of DNA duplexes

    Predicting Transcription Factor Specificity with All-Atom Models

    The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein-DNA complex. We study the PurR E. coli TF, and explore the extent to which atomistic models of protein-DNA complexes can be used to distinguish between cognate and non-cognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods, and by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein-DNA interactions, and weakly influenced by bending of DNA. Comment: 26 pages, 3 figures

    An analysis of the Sargasso Sea resource and the consequences for database composition

    Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques

    DNA Sequence Determinants Controlling Affinity, Stability and Shape of DNA Complexes Bound by the Nucleoid Protein Fis.

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. The affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions