12 research outputs found

    Position dependencies in transcription factor binding sites

    Motivation: Most of the available tools for transcription factor binding site prediction are based on methods which assume no sequence dependence between the base positions of the binding site. Our primary objective was to investigate the statistical basis for a claim of either dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction. Results: Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We also analyzed transcription factor-DNA crystal structures for evidence of position dependence. We conclude that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor-DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function which assumes independence into a form which includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool. Availability: http://promoterplot.fmi.ch/cgi-bin/dep.html Contact: [email protected] Supplementary information: Supplementary data (1, 2, 3, 4, 5, 6, 7 and 8) are available at Bioinformatics online.
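
The abstract above does not name its three statistical tests, so the following is only a minimal sketch of one common way to test whether two positions in a set of aligned binding sites are dependent: a permutation test on the mutual information between the two columns. The function names, the choice of mutual information and the permutation count are assumptions made for illustration, not the authors' method.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns of bases."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def dependence_p_value(sites, i, j, n_perm=2000, seed=0):
    """Permutation p-value for dependence between positions i and j.

    Shuffling column j breaks any link to column i while keeping the
    base composition of both columns unchanged.
    """
    rng = random.Random(seed)
    col_i = [s[i] for s in sites]
    col_j = [s[j] for s in sites]
    observed = mutual_information(col_i, col_j)
    shuffled = col_j[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance guards against floating-point summation order
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Called as `dependence_p_value(sites, 0, 1)` on a list of equal-length site strings, it returns the observed mutual information and a p-value; a factor would be treated as showing dependency only when the p-value survives correction for the number of position pairs tested.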

    Quality estimation of multiple sequence alignments by Bayesian hypothesis testing

    Summary: In this work we present a web-based tool for estimating multiple alignment quality using Bayesian hypothesis testing. The proposed method is simple, easily implemented and fast, with linear complexity. We evaluated the method against a series of different alignments (a set of random and biologically derived alignments) and compared the results with tools based on classical statistical methods (such as sFFT and csFFT). Taking the correlation coefficient as an objective criterion of true quality, we found that Bayesian hypothesis testing performed better on average than the classical methods we tested. This approach may be used independently or as a component of any tool in computational biology which is based on the statistical estimation of alignment quality. Availability: http://www.fmi.ch/groups/functional.genomics/tool.htm Contact: [email protected] Supplementary information: Supplementary data are available from http://www.fmi.ch/groups/functional.genomics/tool-Supp.ht

    Transcription factor site dependencies in human, mouse and rat genomes

    Background: It is known that transcription factors frequently act together to regulate gene expression in eukaryotes. In this paper we describe a computational analysis of transcription factor site dependencies in human, mouse and rat genomes. Results: Our approach for quantifying the tendency of transcription factor binding sites to co-occur is based on a binding site scoring function which incorporates dependencies between positions, uses information about the structural class of each transcription factor (major/minor groove binder), and considers the possible implications of varying GC content of the sequences. Significant tendencies (dependencies) were detected by a non-parametric statistical methodology (permutation tests). The results were evaluated in several ways: reports from the literature (many of the significant dependencies between transcription factors have previously been confirmed experimentally); a check that dependencies between transcription factors are not biased by similarities in their DNA-binding sites; the finding that the number of dependent transcription factors belonging to the same functional and structural class is significantly higher than would be expected by chance; and supporting evidence from GO clustering of target genes. Based on dependencies between two transcription factor binding sites (second-order dependencies), it is possible to construct higher-order dependencies (networks). Moreover, results on transcription factor binding site dependencies can be used to predict groups of dependent transcription factors on a given promoter sequence.
Our results, as well as a scanning tool for predicting groups of dependent transcription factor binding sites, are available on the Internet. Conclusion: We show that the computational analysis of transcription factor site dependencies is a valuable complement to experimental approaches for discovering transcription regulatory interactions and networks. Scanning promoter sequences with dependent groups of transcription factor binding sites improves the quality of transcription factor predictions.
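
The abstract above states that co-occurrence tendencies were detected with permutation tests but gives no further detail. As a rough sketch under that limitation, the example below tests whether predicted sites for two factors share promoters more often than chance would allow. The per-promoter presence/absence encoding, the function name and the one-sided test statistic are assumptions for illustration; the published analysis also accounts for position dependencies, structural class and GC content, which this sketch ignores.

```python
import random


def cooccurrence_p_value(has_a, has_b, n_perm=5000, seed=0):
    """One-sided permutation test for co-occurrence of two factors.

    has_a[k] and has_b[k] say whether promoter k contains a predicted
    site for factor A or factor B. Shuffling has_b reassigns B's sites
    to random promoters while keeping the number of B sites fixed.
    """
    rng = random.Random(seed)

    def shared(xs, ys):
        # number of promoters that contain sites for both factors
        return sum(1 for x, y in zip(xs, ys) if x and y)

    observed = shared(has_a, has_b)
    shuffled = list(has_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if shared(has_a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Pairs of factors with small p-values after multiple-testing correction would form the second-order dependencies from which higher-order networks could then be assembled.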

    Computational Structural Analysis: Multiple Proteins Bound to DNA

    BACKGROUND: With increasing numbers of crystal structures of protein:DNA and protein:protein:DNA complexes publicly available, it is now possible to extract sufficient structural, physical-chemical and thermodynamic parameters to make general observations and predictions about their interactions. In particular, the properties of macromolecular assemblies of multiple proteins bound to DNA have not previously been investigated in detail. METHODOLOGY/PRINCIPAL FINDINGS: We have performed computational structural analyses on macromolecular assemblies of multiple proteins bound to DNA using a variety of different computational tools: PISA; PROMOTIF; X3DNA; ReadOut; DDNA and DCOMPLEX. Additionally, we have developed and employed an algorithm for approximate collision detection and overlapping volume estimation of two macromolecules. An implementation of this algorithm is available at http://promoterplot.fmi.ch/Collision1/. The results obtained are compared with structural, physical-chemical and thermodynamic parameters from protein:protein and single protein:DNA complexes. Many of the interface properties of multiple protein:DNA complexes were found to be very similar to those observed in binary protein:DNA and protein:protein complexes. However, the conformational change of the DNA upon protein binding is significantly higher when multiple proteins bind to it than when single proteins bind. Water-mediated contacts are less important (fewer are found) between the interfaces of components in ternary (protein:protein:DNA) complexes than in those of binary complexes (protein:protein and protein:DNA). The thermodynamic stability of ternary complexes is also higher than that of the binary interactions. Greater specificity and affinity of multiple proteins binding to DNA in comparison with binary protein-DNA interactions were observed.
However, protein-protein binding affinities are stronger in complexes without the presence of DNA. CONCLUSIONS/SIGNIFICANCE: Our results indicate that the interface properties (interface area; number of interface residues/atoms and hydrogen bonds; and the distribution of interface residues, hydrogen bonds, van der Waals contacts and secondary structure motifs) are independent of whether a protein is in a binary or ternary complex with DNA. However, changes in the shape of the DNA reduce the off-rate of the proteins, which greatly enhances the stability and specificity of ternary complexes compared to binary ones.
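
The abstract above mentions an algorithm for approximate collision detection and overlapping volume estimation but does not describe it. The sketch below shows one generic way such an estimate could be made, and should not be read as the published algorithm: each molecule is treated as a set of spheres `(x, y, z, radius)`, a bounding-box check rejects pairs that cannot collide, and Monte Carlo sampling estimates the shared volume. All names and the sphere representation are assumptions.

```python
import random


def bounding_box(atoms):
    """Axis-aligned box enclosing spheres given as (x, y, z, radius)."""
    lo = tuple(min(a[k] - a[3] for a in atoms) for k in range(3))
    hi = tuple(max(a[k] + a[3] for a in atoms) for k in range(3))
    return lo, hi


def inside(point, atoms):
    """True if the point lies within at least one sphere of the molecule."""
    return any(
        sum((point[k] - a[k]) ** 2 for k in range(3)) <= a[3] ** 2
        for a in atoms
    )


def overlap_volume(mol_a, mol_b, n_samples=20000, seed=0):
    """Monte Carlo estimate of the volume shared by two sphere sets."""
    (lo_a, hi_a), (lo_b, hi_b) = bounding_box(mol_a), bounding_box(mol_b)
    lo = [max(lo_a[k], lo_b[k]) for k in range(3)]
    hi = [min(hi_a[k], hi_b[k]) for k in range(3)]
    if any(lo[k] >= hi[k] for k in range(3)):
        return 0.0  # bounding boxes are disjoint, so no collision is possible
    box_volume = 1.0
    for k in range(3):
        box_volume *= hi[k] - lo[k]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # sample only inside the intersection of the two bounding boxes
        p = [rng.uniform(lo[k], hi[k]) for k in range(3)]
        if inside(p, mol_a) and inside(p, mol_b):
            hits += 1
    return box_volume * hits / n_samples
```

For real structures the linear scan in `inside` would be replaced by a spatial grid or tree, and the radii would come from a van der Waals table; both are left out here to keep the sketch short.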

    Computational analysis of promoters and DNA-protein interactions

    The investigation of promoter activity and DNA-protein interactions is very important for understanding many crucial cellular processes, including transcription, recombination and replication. Promoter activity and DNA-protein interactions can be studied in the lab (in vitro or in vivo) or using computational methods (in silico). Computational approaches for analysing promoters and DNA-protein interactions have become more powerful as more complete genome sequences, 3D structural data, and high-throughput data (such as ChIP-chip and expression data) have become available. Modern scientific research into promoters and DNA-protein interactions involves a high level of cooperation between computational and laboratory methods. This thesis covers several aspects of the computational analysis of promoters and DNA-protein interactions: analysis of transcription factor binding sites (investigating position dependencies in transcription factor binding sites); computational prediction of transcription factor binding sites (a new scanning method for the in silico prediction of transcription factor binding sites is described); computational analysis of crystal structures of DNA-protein interactions (multiple proteins bound to DNA); and computational predictions of transcription factor co-operations (investigating dependencies between transcription factors in human, mouse and rat genomes, and a new method of in silico prediction of cis-regulatory motifs and transcription start sites is described). In addition, this thesis reports how one statistical method for the analysis of transcription factor binding sites can be used for estimating the quality of multiple sequence alignments. The main finding reported in this thesis is that it is wrong to assume, a priori, that positions in transcription factor binding sites are all either independent or dependent on one another.
Position dependencies should be tested using rigorous statistical methods on a case-by-case basis. When dependencies are detected, they can be modelled in a very simple way, which does not require complex mathematical tools with many parameters or additional data. An example of such a model, including a web-based implementation of the algorithm, is reported in this thesis. It has also been shown that the conformational energy (indirect readout) of DNA in complexes with transcription factors which have dependent positions in their binding sites is significantly higher than in those with transcription factors which do not have dependent positions in their binding sites. The structural analysis of multiple protein-DNA interactions showed that the formation of interactions between multiple proteins and DNA results in a decrease in protein-protein affinity and an increase in protein-DNA affinity, with a net gain in overall stability of complexes where multiple proteins are bound to DNA. This effect is clearly important for modelling transcription factor co-operativity. In addition, the physical overlap of two factors does not simply relate to the region on the DNA where the binding site is found. Two factors may lie very close together without physically overlapping, because their side-chains can interlink with one another. In this way, two transcription factor binding sites can overlap substantially in sequence while, from a 3D perspective, both factors can still bind simultaneously. It may also be that one transcription factor binds to the minor groove and another to the major groove of DNA. That information is also useful for modelling transcription factor co-operativity. Moreover, this thesis reports the results from a computational prediction of dependencies (co-operativities) between transcription factors which usually act together in gene regulation in human, mouse and rat genomes.
It is shown that the computational analysis of transcription factor site dependencies is a valuable complement to experimental approaches for discovering transcription regulatory interactions and networks. Scanning promoter sequences with dependent groups of transcription factor binding sites improves the quality of transcription factor predictions. Furthermore, it has been demonstrated that modelling transcription factor co-operativities improves the quality of transcription start site predictions. For three genes (ctmp, gap-43 and ngfrap), in vivo validation of the predicted transcription start sites was performed. Finally, the Bayesian method for the detection of dependencies between positions in transcription factor binding sites can easily be converted into a method for estimating the quality of multiple sequence alignments. That method is simple, has linear complexity, is easy to implement, and performs better than other state-of-the-art methods which are more complex.
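
The thesis abstract above says that detected dependencies can be modelled very simply, without giving the model. One hypothetical form such a correction could take is sketched here: a standard log-odds position weight matrix score, plus a pairwise term for each pair of positions already judged dependent. The pairwise log-ratio term, the pseudocounts and every function name are assumptions for illustration and are not taken from the thesis.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_probs(sites, i, pseudo=1.0):
    """Base probabilities at position i, smoothed with a pseudocount."""
    counts = Counter(s[i] for s in sites)
    total = len(sites) + 4 * pseudo
    return {b: (counts[b] + pseudo) / total for b in BASES}


def pair_probs(sites, i, j, pseudo=0.25):
    """Joint base-pair probabilities at positions i and j.

    A pseudocount of 0.25 per pair keeps the marginals of this table
    equal to column_probs with its default pseudocount of 1.0.
    """
    counts = Counter((s[i], s[j]) for s in sites)
    total = len(sites) + 16 * pseudo
    return {
        (a, b): (counts[(a, b)] + pseudo) / total
        for a in BASES
        for b in BASES
    }


def score(seq, sites, dependent_pairs=(), background=0.25):
    """Log-odds score of seq, with an optional dependency correction."""
    cols = [column_probs(sites, i) for i in range(len(seq))]
    # independence-based score: sum of per-position log-odds
    total = sum(
        math.log2(cols[i][seq[i]] / background) for i in range(len(seq))
    )
    # correction: replace the product of marginals by the joint probability
    for i, j in dependent_pairs:
        joint = pair_probs(sites, i, j)
        marginal = cols[i][seq[i]] * cols[j][seq[j]]
        total += math.log2(joint[(seq[i], seq[j])] / marginal)
    return total
```

With `dependent_pairs` left empty the function reduces to the independence-based score, which matches the recommendation to omit the correction when no significant dependency is found.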