71 research outputs found

    Barcodes for genomes and applications

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Each genome has a stable distribution of the combined frequency for each <it>k</it>-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1<k<6. The collection of these <it>k</it>-mer frequency distributions is unique to each genome and termed the genome's <it>barcode</it>.</p> <p>Results</p> <p>We found that for each genome, the majority of its short sequence fragments have highly similar barcodes while sequence fragments with different barcodes typically correspond to genes that are horizontally transferred or highly expressed. This observation has led to new and more effective ways for addressing two challenging problems: metagenome binning problem and identification of horizontally transferred genes. Our barcode-based metagenome binning algorithm substantially improves the state of the art in terms of both binning accuracies and the scope of applicability. Other attractive properties of genomes barcodes include (a) the barcodes have different and identifiable characteristics for different classes of genomes like prokaryotes, eukaryotes, mitochondria and plastids, and (b) barcodes similarities are generally proportional to the genomes' phylogenetic closeness.</p> <p>Conclusion</p> <p>These and other properties of genomes barcodes make them a new and effective tool for studying numerous genome and metagenome analysis problems.</p

    Insertion Sequences show diverse recent activities in Cyanobacteria and Archaea

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Mobile genetic elements (MGEs) play an essential role in genome rearrangement and evolution, and are widely used as an important genetic tool.</p> <p>Results</p> <p>In this article, we present genetic maps of recently active <it>Insertion Sequence </it>(IS) elements, the simplest form of MGEs, for all sequenced cyanobacteria and archaea, predicted based on the previously identified ~1,500 IS elements. Our predicted IS maps are consistent with the NCBI annotations of the IS elements. By linking the predicted IS elements to various characteristics of the organisms under study and the organism's living conditions, we found that (a) the activities of IS elements heavily depend on the environments where the host organisms live; (b) the number of recently active IS elements in a genome tends to increase with the genome size; (c) the flanking regions of the recently active IS elements are significantly enriched with genes encoding DNA binding factors, transporters and enzymes; and (d) IS movements show no tendency to disrupt operonic structures.</p> <p>Conclusion</p> <p>This is the first genome-scale maps of IS elements with detailed structural information on the sequence level. These genetic maps of recently active IS elements and the several interesting observations would help to improve our understanding of how IS elements proliferate and how they are involved in the evolution of the host genomes.</p

    Computational prediction of Pho regulons in cyanobacteria

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Phosphorus is an essential element for all life forms. However, it is limiting in most ecological environments where cyanobacteria inhabit. Elucidation of the phosphorus assimilation pathways in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. However, a systematic study of the Pho regulon, the core of the phosphorus assimilation pathway in a cyanobacterium, is hitherto lacking.</p> <p>Results</p> <p>We have predicted and analyzed the Pho regulons in 19 sequenced cyanobacterial genomes using a highly effective scanning algorithm that we have previously developed. Our results show that different cyanobacterial species/ecotypes may encode diverse sets of genes responsible for the utilization of various sources of phosphorus, ranging from inorganic phosphate, phosphodiester, to phosphonates. Unlike in <it>E. coli</it>, some cyanobacterial genes that are directly involved in phosphorus assimilation seem to not be under the regulation of the regulator SphR (orthologue of PhoB in <it>E coli</it>.) in some species/ecotypes. On the other hand, SphR binding sites are found for genes known to play important roles in other biological processes. These genes might serve as bridging points to coordinate the phosphorus assimilation and other biological processes. More interestingly, in three cyanobacterial genomes where no <it>sphR </it>gene is encoded, our results show that there is virtually no functional SphR binding site, suggesting that transcription regulators probably play an important role in retaining their binding sites.</p> <p>Conclusion</p> <p>The Pho regulons in cyanobacteria are highly diversified to accommodate to their respective living environments. The phosphorus assimilation pathways in cyanobacteria are probably tightly coupled to a number of other important biological processes. The loss of a regulator may lead to the rapid loss of its binding sites in a genome.</p

    Comparative genomics analysis of NtcA regulons in cyanobacteria: regulation of nitrogen assimilation and its coupling to photosynthesis

    Get PDF
    We have developed a new method for prediction of cis-regulatory binding sites and applied it to predicting NtcA regulated genes in cyanobacteria. The algorithm rigorously utilizes concurrence information of multiple binding sites in the upstream region of a gene and that in the upstream regions of its orthologues in related genomes. A probabilistic model was developed for the evaluation of prediction reliability so that the prediction false positive rate could be well controlled. Using this method, we have predicted multiple new members of the NtcA regulons in nine sequenced cyanobacterial genomes, and showed that the false positive rates of the predictions have been reduced on an average of 40-fold compared to the conventional methods. A detailed analysis of the predictions in each genome showed that a significant portion of our predictions are consistent with previously published results about individual genes. Intriguingly, NtcA promoters are found for many genes involved in various stages of photosynthesis. Although photosynthesis is known to be tightly coordinated with nitrogen assimilation, very little is known about the underlying mechanism. We postulate for the fist time that these genes serve as the regulatory points to orchestrate these two important processes in a cyanobacterial cell

    Prediction of functional modules based on comparative genome analysis and Gene Ontology application

    Get PDF
    We present a computational method for the prediction of functional modules encoded in microbial genomes. In this work, we have also developed a formal measure to quantify the degree of consistency between the predicted and the known modules, and have carried out statistical significance analysis of consistency measures. We first evaluate the functional relationship between two genes from three different perspectives—phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three different sources of information in the framework of Bayesian inference, and we use the combined information to measure the strength of gene functional relationship. Finally, we apply a threshold-based method to predict functional modules. By applying this method to Escherichia coli K12, we have predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E.coli. The application results have demonstrated that our approach is highly promising for the prediction of functional modules encoded in a microbial genome

    A Comparative Analysis of Gene-Expression Data of Multiple Cancer Types

    Get PDF
    A comparative study of public gene-expression data of seven types of cancers (breast, colon, kidney, lung, pancreatic, prostate and stomach cancers) was conducted with the aim of deriving marker genes, along with associated pathways, that are either common to multiple types of cancers or specific to individual cancers. The analysis results indicate that (a) each of the seven cancer types can be distinguished from its corresponding control tissue based on the expression patterns of a small number of genes, e.g., 2, 3 or 4; (b) the expression patterns of some genes can distinguish multiple cancer types from their corresponding control tissues, potentially serving as general markers for all or some groups of cancers; (c) the proteins encoded by some of these genes are predicted to be blood secretory, thus providing potential cancer markers in blood; (d) the numbers of differentially expressed genes across different cancer types in comparison with their control tissues correlate well with the five-year survival rates associated with the individual cancers; and (e) some metabolic and signaling pathways are abnormally activated or deactivated across all cancer types, while other pathways are more specific to certain cancers or groups of cancers. The novel findings of this study offer considerable insight into these seven cancer types and have the potential to provide exciting new directions for diagnostic and therapeutic development

    DOOR: a database for prokaryotic operons

    Get PDF
    We present a database DOOR (Database for prOkaryotic OpeRons) containing computationally predicted operons of all the sequenced prokaryotic genomes. All the operons in DOOR are predicted using our own prediction program, which was ranked to be the best among 14 operon prediction programs by a recent independent review. Currently, the DOOR database contains operons for 675 prokaryotic genomes, and supports a number of search capabilities to facilitate easy access and utilization of the information stored in it. Querying the database: the database provides a search capability for a user to find desired operons and associated information through multiple querying methods.Searching for similar operons: the database provides a search capability for a user to find operons that have similar composition and structure to a query operon.Prediction of cis-regulatory motifs: the database provides a capability for motif identification in the promoter regions of a user-specified group of possibly coregulated operons, using motif-finding tools.Operons for RNA genes: the database includes operons for RNA genes.OperonWiki: the database provides a wiki page (OperonWiki) to facilitate interactions between users and the developer of the database. We believe that DOOR provides a useful resource to many biologists working on bacteria and archaea, which can be accessed at http://csbl1.bmb.uga.edu/OperonDB

    Computational prediction of the osmoregulation network in Synechococcus sp. WH8102

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Osmotic stress is caused by sudden changes in the impermeable solute concentration around a cell, which induces instantaneous water flow in or out of the cell to balance the concentration. Very little is known about the detailed response mechanism to osmotic stress in marine <it>Synechococcus</it>, one of the major oxygenic phototrophic cyanobacterial genera that contribute greatly to the global CO<sub>2 </sub>fixation.</p> <p>Results</p> <p>We present here a computational study of the osmoregulation network in response to hyperosmotic stress of <it>Synechococcus sp </it>strain <it>WH8102 </it>using comparative genome analyses and computational prediction. In this study, we identified the key transporters, synthetases, signal sensor proteins and transcriptional regulator proteins, and found experimentally that of these proteins, 15 genes showed significantly changed expression levels under a mild hyperosmotic stress.</p> <p>Conclusions</p> <p>From the predicted network model, we have made a number of interesting observations about <it>WH8102</it>. Specifically, we found that (i) the organism likely uses glycine betaine as the major osmolyte, and others such as glucosylglycerol, glucosylglycerate, trehalose, sucrose and arginine as the minor osmolytes, making it efficient and adaptable to its changing environment; and (ii) σ<sup>38</sup>, one of the seven types of σ factors, probably serves as a global regulator coordinating the osmoregulation network and the other relevant networks.</p

    Computational inference and experimental validation of the nitrogen assimilation regulatory network in cyanobacterium Synechococcus sp. WH 8102

    Get PDF
    Deciphering the regulatory networks encoded in the genome of an organism represents one of the most interesting and challenging tasks in the post-genome sequencing era. As an example of this problem, we have predicted a detailed model for the nitrogen assimilation network in cyanobacterium Synechococcus sp. WH 8102 (WH8102) using a computational protocol based on comparative genomics analysis and mining experimental data from related organisms that are relatively well studied. This computational model is in excellent agreement with the microarray gene expression data collected under ammonium-rich versus nitrate-rich growth conditions, suggesting that our computational protocol is capable of predicting biological pathways/networks with high accuracy. We then refined the computational model using the microarray data, and proposed a new model for the nitrogen assimilation network in WH8102. An intriguing discovery from this study is that nitrogen assimilation affects the expression of many genes involved in photosynthesis, suggesting a tight coordination between nitrogen assimilation and photosynthesis processes. Moreover, for some of these genes, this coordination is probably mediated by NtcA through the canonical NtcA promoters in their regulatory regions

    Operon prediction using both genome-specific and general genomic information

    Get PDF
    We have carried out a systematic analysis of the contribution of a set of selected features that include three new features to the accuracy of operon prediction. Our analyses have led to a number of new insights about operon prediction, including that (i) different features have different levels of discerning power when used on adjacent gene pairs with different ranges of intergenic distance, (ii) certain features are universally useful for operon prediction while others are more genome-specific and (iii) the prediction reliability of operons is dependent on intergenic distances. Based on these new insights, our newly developed operon-prediction program achieves more accurate operon prediction than the previous ones, and it uses features that are most readily available from genomic sequences. Our prediction results indicate that our (non-linear) decision tree-based classifier can predict operons in a prokaryotic genome very accurately when a substantial number of operons in the genome are already known. For example, the prediction accuracy of our program can reach 90.2 and 93.7% on Bacillus subtilis and Escherichia coli genomes, respectively. When no such information is available, our (linear) logistic function-based classifier can reach the prediction accuracy at 84.6 and 83.3% for E.coli and B.subtilis, respectively
    corecore