66 research outputs found

    A Novel Alignment-Free Method for Comparing Transcription Factor Binding Site Motifs

    BACKGROUND: Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFM) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or cluster motifs according to their binding preference. The majority of current methods for motif comparison involve a similarity metric for column-to-column comparison and a method to find the optimal position alignment between the two compared motifs. In some applications, alignment-free methods might be preferred; however, few such methods with high accuracy have been described. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe a novel alignment-free method for quantifying the similarity of motifs using their PFMs by converting PFMs into k-mer vectors. The motifs could then be compared by measuring the similarity among their corresponding k-mer vectors. CONCLUSIONS/SIGNIFICANCE: We demonstrate that our method in general achieves similar performance or outperforms the existing methods for clustering motifs according to their binding preference and identifying similar motifs of transcription factors of the same family
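
The abstract above does not spell out how a PFM becomes a k-mer vector, so the following is only a minimal sketch of one plausible reading. It assumes that each k-mer is weighted by the summed probability of emitting it from every length-k window of the column-normalized PFM, and that vectors are compared by cosine similarity. The function names (`normalize`, `kmer_vector`, `cosine`) are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"

def normalize(pfm):
    """Convert per-position counts (rows in A, C, G, T order) to probabilities."""
    return [[count / sum(column) for count in column] for column in pfm]

def kmer_vector(pfm, k):
    """Map a PFM to a dict of k-mer weights (assumed weighting, see lead-in)."""
    probs = normalize(pfm)
    vector = {}
    for kmer in product(range(4), repeat=k):
        total = 0.0
        # Sum the emission probability of this k-mer over every window.
        for start in range(len(probs) - k + 1):
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= probs[start + offset][base]
            total += p
        vector["".join(BASES[b] for b in kmer)] = total
    return vector

def cosine(u, v):
    """Cosine similarity between two k-mer vectors with identical keys."""
    dot = sum(u[key] * v[key] for key in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Because no window of one motif is ever matched against a window of the other, two motifs that differ only by a positional shift still share most of their k-mer weight, which is the property an alignment-free comparison relies on.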

    Computational analysis of LexA regulons in Cyanobacteria

    BACKGROUND: The transcription factor LexA plays an important role in the SOS response in Escherichia coli and many other bacterial species studied. Although the lexA gene is encoded in almost every bacterial group with a wide range of evolutionary distances, its precise functions in each group/species are largely unknown. More recently, it has been shown that lexA genes in two cyanobacterial genomes, Nostoc sp. PCC 7120 and Synechocystis sp. PCC 6803, might have distinct functions other than the regulation of the SOS response. To gain a general understanding of the functions of LexA and its evolution in cyanobacteria, we conducted the current study. RESULTS: Our analysis indicates that six of 33 sequenced cyanobacterial genomes do not harbor a lexA gene although they all encode the key SOS response genes, suggesting that LexA is not an indispensable transcription factor in these cyanobacteria, and that their SOS responses might be regulated by different mechanisms. Our phylogenetic analysis suggests that lexA was lost during the course of evolution in these six cyanobacterial genomes. For the 26 cyanobacterial genomes that encode a lexA gene, we have predicted their LexA-binding sites and regulons using an efficient binding site/regulon prediction algorithm that we developed previously. Our results show that LexA in most of these 26 genomes might still function as the transcriptional regulator of the SOS response genes as seen in E. coli and other organisms. Interestingly, putative LexA-binding sites were also found in some genomes for some key genes involved in a variety of other biological processes including photosynthesis, drug resistance, etc., suggesting that there is crosstalk between the SOS response and these biological processes. In particular, LexA in both Synechocystis sp. PCC6803 and Gloeobacter violaceus PCC7421 has largely diverged from those in other cyanobacteria at the sequence level. It is likely that LexA is no longer a regulator of the SOS response in Synechocystis sp. PCC6803. CONCLUSIONS: In most cyanobacterial genomes that we analyzed, LexA appears to function as the transcriptional regulator of the key SOS response genes. There are possible couplings between the SOS response and other biological processes. In some cyanobacteria, LexA has adapted distinct functions, and might no longer be a regulator of the SOS response system. In some other cyanobacteria, lexA appears to have been lost during the course of evolution. The loss of lexA in these genomes might lead to the degradation of its binding sites.

    Computational prediction of Pho regulons in cyanobacteria

    BACKGROUND: Phosphorus is an essential element for all life forms. However, it is limiting in most ecological environments that cyanobacteria inhabit. Elucidation of the phosphorus assimilation pathways in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. However, a systematic study of the Pho regulon, the core of the phosphorus assimilation pathway in a cyanobacterium, is hitherto lacking. RESULTS: We have predicted and analyzed the Pho regulons in 19 sequenced cyanobacterial genomes using a highly effective scanning algorithm that we have previously developed. Our results show that different cyanobacterial species/ecotypes may encode diverse sets of genes responsible for the utilization of various sources of phosphorus, ranging from inorganic phosphate and phosphodiesters to phosphonates. Unlike in E. coli, some cyanobacterial genes that are directly involved in phosphorus assimilation seem not to be under the regulation of the regulator SphR (orthologue of PhoB in E. coli) in some species/ecotypes. On the other hand, SphR binding sites are found for genes known to play important roles in other biological processes. These genes might serve as bridging points to coordinate the phosphorus assimilation and other biological processes. More interestingly, in three cyanobacterial genomes where no sphR gene is encoded, our results show that there is virtually no functional SphR binding site, suggesting that transcription regulators probably play an important role in retaining their binding sites. CONCLUSION: The Pho regulons in cyanobacteria are highly diversified to accommodate their respective living environments. The phosphorus assimilation pathways in cyanobacteria are probably tightly coupled to a number of other important biological processes. The loss of a regulator may lead to the rapid loss of its binding sites in a genome.

    Comparative genomics analysis of NtcA regulons in cyanobacteria: regulation of nitrogen assimilation and its coupling to photosynthesis

    We have developed a new method for prediction of cis-regulatory binding sites and applied it to predicting NtcA regulated genes in cyanobacteria. The algorithm rigorously utilizes concurrence information of multiple binding sites in the upstream region of a gene and that in the upstream regions of its orthologues in related genomes. A probabilistic model was developed for the evaluation of prediction reliability so that the prediction false positive rate could be well controlled. Using this method, we have predicted multiple new members of the NtcA regulons in nine sequenced cyanobacterial genomes, and showed that the false positive rates of the predictions have been reduced by an average of 40-fold compared to the conventional methods. A detailed analysis of the predictions in each genome showed that a significant portion of our predictions are consistent with previously published results about individual genes. Intriguingly, NtcA promoters are found for many genes involved in various stages of photosynthesis. Although photosynthesis is known to be tightly coordinated with nitrogen assimilation, very little is known about the underlying mechanism. We postulate for the first time that these genes serve as the regulatory points to orchestrate these two important processes in a cyanobacterial cell.

    Prediction of functional modules based on comparative genome analysis and Gene Ontology application

    We present a computational method for the prediction of functional modules encoded in microbial genomes. In this work, we have also developed a formal measure to quantify the degree of consistency between the predicted and the known modules, and have carried out statistical significance analysis of consistency measures. We first evaluate the functional relationship between two genes from three different perspectives: phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three different sources of information in the framework of Bayesian inference, and we use the combined information to measure the strength of gene functional relationship. Finally, we apply a threshold-based method to predict functional modules. By applying this method to Escherichia coli K12, we have predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E. coli. The application results have demonstrated that our approach is highly promising for the prediction of functional modules encoded in a microbial genome.
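
The combine-then-threshold pipeline described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the three evidence sources are treated as conditionally independent, each supplies a likelihood ratio for "functionally related" versus "unrelated", and modules are taken to be connected components of the thresholded gene graph. The names `posterior` and `modules` and the gene labels in the usage note are hypothetical.

```python
def posterior(prior, likelihood_ratios):
    """Posterior probability of a functional link, assuming independent evidence."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # Bayes' rule in odds form
    return odds / (1.0 + odds)

def modules(pair_scores, threshold):
    """Group genes into connected components over pairs scoring >= threshold."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for (gene_a, gene_b), score in pair_scores.items():
        root_a, root_b = find(gene_a), find(gene_b)
        if score >= threshold:
            parent[root_a] = root_b

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    # Singletons are not reported as modules in this sketch.
    return [members for members in groups.values() if len(members) > 1]
```

For example, `modules({("g1", "g2"): posterior(0.01, [10, 10, 10]), ("g2", "g3"): 0.1}, 0.5)` groups `g1` with `g2` and leaves `g3` out, because only the first pair clears the threshold.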

    FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets.

    Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time consuming or have sub-optimal accuracy for processing large-scale datasets. In this article, we present a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets using Fisher's combined method and an OpenMP parallel design. Large-scale validations on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets but also highly accurate when compared with multiple popular methods. By using FisherMP, we successfully detected combinatorial motifs of CTCF, YY1, MAZ, STAT3 and USF2 in chromosome X, suggesting that they are functional co-players in gene regulation and chromosomal organization. Integrative and statistical analyses of these TF-binding peaks clearly demonstrate that they are not only highly coordinated with each other, but that they are also correlated with histone modifications. FisherMP can be applied for integrative analysis of binding motifs and for predicting cis-regulatory modules from a large number of ChIP-seq datasets.
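
For readers unfamiliar with Fisher's combined method, here is a minimal sketch of the standard test. How FisherMP applies it to motif candidates is not described in the abstract, so this shows only the general statistic. For k independent p-values, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the tail probability has a closed form. The function name `fisher_combined` is hypothetical.

```python
from math import exp, log

def fisher_combined(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return exp(-half_stat) * total
```

A single p-value is returned unchanged, and two p-values p1 and p2 combine to p1·p2·(1 − ln(p1·p2)), which gives a quick sanity check on the closed form.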

    SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents

    BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences identified by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a motif in a database of motifs by seeking its similar motifs, merge similar TFBS motifs possibly identified by the same TF, separate irrelevant motifs, or filter out spurious motifs. Therefore, a novel metric is required to capture slight differences between irrelevant motifs and highlight the similarity between motifs of the same group in all these applications. While several metrics for motif similarity have already been proposed, their performance is still far from satisfactory for these applications. METHODS: In this paper we propose a novel metric, named SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of a motif and a column of another motif. When defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, and multiply the likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method having a function for affine gap penalty, for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics for their capability of clustering motifs from the same group and retrieving motifs from a database on three datasets. RESULTS: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty equal to 1, gap extension penalty equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics combined with their best alignments for matching motifs in database searches, and clustering the same TF's sub-motifs or distinguishing relevant ones from a miscellaneous group of motifs. CONCLUSIONS: We have developed a novel motif similarity metric that can more accurately match motifs in database searches, and more effectively cluster similar motifs and differentiate irrelevant motifs than do the other seven metrics we are aware of.
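
The SPIC score weights each column comparison by an information content term. The exact SPIC formula is not given in the abstract, so the sketch below shows only the standard information content of a single motif column, computed as the relative entropy against a background distribution. The uniform background and the function name `column_information_content` are assumptions made for illustration.

```python
from math import log2

def column_information_content(counts, background=(0.25, 0.25, 0.25, 0.25)):
    """Information content in bits of one PFM column (counts in A, C, G, T order)."""
    total = sum(counts)
    bits = 0.0
    for count, q in zip(counts, background):
        if count > 0:  # a zero count contributes nothing (0 * log 0 = 0)
            p = count / total
            bits += p * log2(p / q)
    return bits
```

With a uniform background, a fully conserved column scores 2 bits and a column with equal counts scores 0 bits, so weighting by this quantity lets conserved positions dominate a motif comparison.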

    Computational inference and experimental validation of the nitrogen assimilation regulatory network in cyanobacterium Synechococcus sp. WH 8102

    Deciphering the regulatory networks encoded in the genome of an organism represents one of the most interesting and challenging tasks in the post-genome sequencing era. As an example of this problem, we have predicted a detailed model for the nitrogen assimilation network in cyanobacterium Synechococcus sp. WH 8102 (WH8102) using a computational protocol based on comparative genomics analysis and mining experimental data from related organisms that are relatively well studied. This computational model is in excellent agreement with the microarray gene expression data collected under ammonium-rich versus nitrate-rich growth conditions, suggesting that our computational protocol is capable of predicting biological pathways/networks with high accuracy. We then refined the computational model using the microarray data, and proposed a new model for the nitrogen assimilation network in WH8102. An intriguing discovery from this study is that nitrogen assimilation affects the expression of many genes involved in photosynthesis, suggesting a tight coordination between nitrogen assimilation and photosynthesis processes. Moreover, for some of these genes, this coordination is probably mediated by NtcA through the canonical NtcA promoters in their regulatory regions

    Novel Changes in Discoidal High Density Lipoprotein Morphology: A Molecular Dynamics Study

    ApoA-I is a uniquely flexible lipid-scavenging protein capable of incorporating phospholipids into stable particles. Here we report molecular dynamics simulations on a series of progressively smaller discoidal high density lipoprotein particles produced by incremental removal of palmitoyloleoylphosphatidylcholine via four different pathways. The starting model contained 160 palmitoyloleoylphosphatidylcholines and a belt of two antiparallel amphipathic helical lipid-associating domains of apolipoprotein (apo) A-I. The results are particularly compelling. After a few nanoseconds of molecular dynamics simulation, independent of the starting particle and method of size reduction, all simulated double belts of the four lipidated apoA-I particles have helical domains that impressively approximate the x-ray crystal structure of lipid-free apoA-I, particularly between residues 88 and 186. These results provide atomic resolution models for two of the particles produced by in vitro reconstitution of nascent high density lipoprotein particles. These particles, measuring 95 Å and 78 Å by nondenaturing gradient gel electrophoresis, correspond in composition and in size/shape (by negative stain electron microscopy) to the simulated particles with molar ratios of 100:2 and 50:2, respectively. The lipids of the 100:2 particle family form minimal surfaces at their monolayer-monolayer interface, whereas the 50:2 particle family displays a lipid pocket capable of binding a dynamic range of phospholipid molecules.