85 research outputs found

    Making connections between novel transcription factors and their DNA motifs

    The key components of a transcriptional regulatory network are the connections between trans-acting transcription factors and cis-acting DNA-binding sites. In spite of several decades of intense research, only a fraction of the estimated ∼300 transcription factors in Escherichia coli have been linked to some of their binding sites in the genome. In this paper, we present a computational method to connect novel transcription factors and DNA motifs in E. coli. Our method uses three types of mutually independent information, two of which are gleaned by comparative analysis of multiple genomes and the third one derived from similarities of transcription-factor-DNA-binding-site interactions. The different types of information are combined to calculate the probability of a given transcription-factor-DNA-motif pair being a true pair. Tested on a study set of transcription factors and their DNA motifs, our method has a prediction accuracy of 59% for the top predictions and 85% for the top three predictions. When applied to 99 novel transcription factors and 70 novel DNA motifs, our method predicted 64 transcription-factor-DNA-motif pairs. Supporting evidence for some of the predicted pairs is presented. Functional annotations are made for 23 novel transcription factors based on the predicted transcription-factor-DNA-motif connections.
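
    The abstract above says that three mutually independent types of information are combined into a probability that a transcription-factor-DNA-motif pair is true, but it does not state the combination rule. As a minimal sketch only, assuming a naive-Bayes-style update in which each evidence type contributes a likelihood ratio, the calculation could look like the following. The function name, the prior, and the likelihood ratios are all hypothetical illustration values, not figures from the paper.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence into a posterior probability.

    Assumption (not from the abstract): each evidence type is summarised
    as a likelihood ratio P(evidence | true pair) / P(evidence | false pair),
    and the evidence types are conditionally independent.
    """
    prior_odds = prior / (1.0 - prior)
    # Independence lets the likelihood ratios simply multiply.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a 1% prior that a random pair is true, and three
# evidence types that each favour the pair to a different degree.
example_prior = 0.01
example_ratios = [5.0, 4.0, 10.0]
example_posterior = posterior_probability(example_prior, example_ratios)
print(f"posterior = {example_posterior:.4f}")
```

    Working in odds rather than probabilities keeps the update a plain product, which is why the independence of the evidence types matters for a scheme of this kind.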

    SPOCS User Guide

    SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs, and in addition, html files that provide a visualization of the ortholog/paralog relationships to which gene/protein expression metadata may be overlaid

    PhyloScan: identification of transcription factor binding sites using cross-species evidence

    BACKGROUND: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database. METHODS: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets. RESULTS: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli. CONCLUSION: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region

    The tricarboxylic acid cycle in Shewanella oneidensis is independent of Fur and RyhB control

    Background: It is well established in E. coli and Vibrio cholerae that strains harboring mutations in the ferric uptake regulator gene (fur) are unable to utilize tricarboxylic acid (TCA) compounds, due to the down-regulation of key TCA cycle enzymes, such as AcnA and SdhABCD. This down-regulation is mediated by a Fur-regulated small regulatory RNA named RyhB. It is unclear whether the TCA cycle is also regulated by Fur and RyhB in the gamma-proteobacterium S. oneidensis. Results: In the present study, we showed that a fur deletion mutant of S. oneidensis could utilize TCA compounds. Consistently, expression of the TCA cycle genes acnA and sdhA was not down-regulated in the mutant. To explore this observation further, we identified a ryhB gene in Shewanella species and experimentally demonstrated its expression. Further experiments suggested that RyhB was up-regulated in the fur mutant, but that AcnA and SdhA were not controlled by RyhB. Conclusions: These cumulative results delineate an important difference in the Fur-RyhB regulatory circuit between S. oneidensis and other gamma-proteobacteria. This work represents a step forward in understanding the unique regulation in S. oneidensis.

    The Gibbs Centroid Sampler

    The Gibbs Centroid Sampler is a software package designed for locating conserved elements in biopolymer sequences. The Gibbs Centroid Sampler reports a centroid alignment, i.e. an alignment that has the minimum total distance to the set of samples chosen from the a posteriori probability distribution of transcription factor binding-site alignments. In so doing, it garners information from the full ensemble of solutions, rather than only the single most probable point that is the target of many motif-finding algorithms, including its predecessor, the Gibbs Recursive Sampler. Centroid estimators have been shown to yield substantial improvements, in both sensitivity and positive predictive values, to the prediction of RNA secondary structure and motif finding. The Gibbs Centroid Sampler, along with interactive tutorials, an online user manual, and information on downloading the software, is available at: http://bayesweb.wadsworth.org/gibbs/gibbs.html
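
    The centroid idea described above can be illustrated with a small sketch. It assumes, purely for illustration, that each sampled alignment is encoded as a binary vector (1 where a site starts at a position, 0 elsewhere) and that distance is Hamming distance. Under those assumptions the vector with minimum total distance to the samples is the per-position majority vote. The encoding, function names, and sample data are hypothetical and are not taken from the Gibbs Centroid Sampler itself.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def total_distance(candidate, samples):
    """Sum of Hamming distances from a candidate to every sample."""
    return sum(hamming(candidate, s) for s in samples)


def centroid(samples):
    """Per-position majority vote over binary sample vectors.

    For Hamming distance each position contributes independently to the
    total, so choosing the majority value at every position minimises it.
    """
    n = len(samples)
    length = len(samples[0])
    return [1 if 2 * sum(s[i] for s in samples) > n else 0
            for i in range(length)]


# Hypothetical draws from a posterior over site positions in one sequence.
samples = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

best = centroid(samples)
print(best, total_distance(best, samples))
```

    Unlike a single most probable sample, the result here depends on how often each site appears across the whole set of draws, which is the sense in which a centroid uses the full ensemble.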

    Factors affecting the bacterial community composition and heterotrophic production of Columbia River estuarine turbidity maxima

    © The Author(s), 2017. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in MicrobiologyOpen 6 (2017): e00522, doi:10.1002/mbo3.522. Estuarine turbidity maxima (ETM) function as hotspots of microbial activity and diversity in estuaries, yet, little is known about the temporal and spatial variability in ETM bacterial community composition. To determine which environmental factors affect ETM bacterial populations in the Columbia River estuary, we analyzed ETM bacterial community composition (Sanger sequencing and amplicon pyrosequencing of 16S rRNA gene) and bulk heterotrophic production (3H-leucine incorporation rates). We collected water 20 times to cover five ETM events and obtained 42 samples characterized by different salinities, turbidities, seasons, coastal regimes (upwelling vs. downwelling), locations, and particle size. Spring and summer populations were distinct. All May samples had similar bacterial community composition despite having different salinities (1–24 PSU), but summer non-ETM bacteria separated into marine, freshwater, and brackish assemblages. Summer ETM bacterial communities varied depending on coastal upwelling or downwelling conditions and on the sampling site location with respect to tidal intrusion during the previous neap tide. In contrast to ETM, whole (>0.2 μm) and free-living (0.2–3 μm) assemblages of non-ETM waters were similar to each other, indicating that particle-attached (>3 μm) non-ETM bacteria do not develop a distinct community. Brackish water type (ETM or non-ETM) is thus a major factor affecting particle-attached bacterial communities. Heterotrophic production was higher in particle-attached than free-living fractions in all brackish waters collected throughout the water column during the rise to decline of turbidity through an ETM event (i.e., ETM-impacted waters). However, free-living communities showed higher productivity prior to or after an ETM event (i.e., non-ETM-impacted waters). This study has thus found that Columbia River ETM bacterial communities vary based on seasons, salinity, sampling location, and particle size, with the existence of three particle types characterized by different bacterial communities in ETM, ETM-impacted, and non-ETM-impacted brackish waters. Taxonomic analysis suggests that ETM key biological function is to remineralize organic matter. National Science Foundation Grant Number: OCE-042460

    VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

    Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.

    Comparative systems biology across an evolutionary gradient within the Shewanella genus

    Author Posting. © The Authors, 2009. This is the author's version of the work. It is posted here by permission of National Academy of Sciences for personal use, not for redistribution. The definitive version was published in Proceedings of the National Academy of Sciences 106 (2009): 15909-15914, doi:10.1073/pnas.0902000106. To what extent genotypic differences translate to phenotypic variation remains a poorly understood issue of paramount importance for several cornerstone concepts of microbiology including the species definition. Here, we take advantage of the completed genomic sequences, expressed proteomic profiles, and physiological studies of ten closely related Shewanella strains and species to provide quantitative insights into this issue. Our analyses revealed that, despite extensive horizontal gene transfer within these genomes, the genotypic and phenotypic similarities among the organisms were generally predictable from their evolutionary relatedness. The power of the predictions depended on the degree of ecological specialization of the organisms evaluated. Using the gradient of evolutionary relatedness formed by these genomes, we were able to partly isolate the effect of ecology from that of evolutionary divergence and rank the different cellular functions in terms of their rates of evolution. Our ranking also revealed that whole-cell protein expression differences among these organisms when grown under identical conditions were relatively larger than differences at the genome level, suggesting that similarity in gene regulation and expression should constitute another important parameter for (new) species description. Collectively, our results provide important new information towards beginning a systems-level understanding of bacterial species and genera. The authors have been supported by the DOE through the Shewanella Federation consortium and the Proteomics Application project. The MSU work relevant to speciation was also supported by NSF (DEB 0516252)