90 research outputs found

    A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database

    Get PDF
    BACKGROUND: We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community. DISCUSSION: The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, containing 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built such that the gene sequence distances are consistent with the known organismal relationships and in so doing, maximizing the likelihood for the clusters to represent truly orthologous genes. The PhIGs website contains tools that allow the study of genes within their phylogenetic framework through keyword searches on annotations, such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website also allows users to view the relative physical positions of homologous genes in specified sets of genomes. SUMMARY: Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and corresponding website address this problem for the scientific community. Our goal is to expand the content as more genomes are sequenced and use this framework to incorporate more analyses

    Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli

    Get PDF
    Most Escherichia coli transcription factors have paralogs, but these usually arose by horizontal gene transfer rather than by duplication within the E. coli lineage, as previously believed

    FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

    Get PDF
    Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximumlikelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the ‘‘CAT’ ’ approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments

    Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes

    Get PDF
    Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought

    RegPrecise web services interface: programmatic access to the transcriptional regulatory interactions in bacteria reconstructed by comparative genomics.

    Get PDF
    Web services application programming interface (API) was developed to provide a programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualize regulogs, sets of genes controlled by orthologous regulators in several closely related bacterial genomes, that were reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides an efficient access to the RegPrecise data by a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements

    Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

    Get PDF
    Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree

    Systematic mapping of two component response regulators to gene targets in a model sulfate reducing bacterium

    Get PDF
    BackgroundTwo component regulatory systems are the primary form of signal transduction in bacteria. Although genomic binding sites have been determined for several eukaryotic and bacterial transcription factors, comprehensive identification of gene targets of two component response regulators remains challenging due to the lack of knowledge of the signals required for their activation. We focused our study on Desulfovibrio vulgaris Hildenborough, a sulfate reducing bacterium that encodes unusually diverse and largely uncharacterized two component signal transduction systems.ResultsWe report the first systematic mapping of the genes regulated by all transcriptionally acting response regulators in a single bacterium. Our results enabled functional predictions for several response regulators and include key processes of carbon, nitrogen and energy metabolism, cell motility and biofilm formation, and responses to stresses such as nitrite, low potassium and phosphate starvation. Our study also led to the prediction of new genes and regulatory networks, which found corroboration in a compendium of transcriptome data available for D. vulgaris. For several regulators we predicted and experimentally verified the binding site motifs, most of which were discovered as part of this study.ConclusionsThe gene targets identified for the response regulators allowed strong functional predictions to be made for the corresponding two component systems. By tracking the D. vulgaris regulators and their motifs outside the Desulfovibrio spp. we provide testable hypotheses regarding the functions of orthologous regulators in other organisms. The in vitro array based method optimized here is generally applicable for the study of such systems in all organisms

    Mapping the Two-component Regulatory Networks in Desulfovibrio vulgaris

    Full text link
    D. vulgaris Hildenborough has 72 response regulators. The Desulfovibrio are sulfate reducing bacteria that are important in the sulfur and carbon cycles in anoxic habitats. Its large number of two componenent systems are probably critical to its ability to sense and respond to its environment. Our goal is to map these RRs to the genes they regulate using a DNA-affinity-purification-chip (DAP-chip) protocol. First target determined usuing EMSA. A positive target was determined for as many RRs as possible using EMSA. Targets were selected based on gene proximity, regulon predictions and/or predicted sigma54 dependent promoters. qPCR was used to ensure that the target was enriched from sheared genomic DNA before proceeding to the DAP-chip

    Snapshot of iron response in Shewanella oneidensis by gene network reconstruction

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Iron homeostasis of <it>Shewanella oneidensis</it>, a γ-proteobacterium possessing high iron content, is regulated by a global transcription factor Fur. However, knowledge is incomplete about other biological pathways that respond to changes in iron concentration, as well as details of the responses. In this work, we integrate physiological, transcriptomics and genetic approaches to delineate the iron response of <it>S. oneidensis</it>.</p> <p>Results</p> <p>We show that the iron response in <it>S. oneidensis </it>is a rapid process. Temporal gene expression profiles were examined for iron depletion and repletion, and a gene co-expression network was reconstructed. Modules of iron acquisition systems, anaerobic energy metabolism and protein degradation were the most noteworthy in the gene network. Bioinformatics analyses suggested that genes in each of the modules might be regulated by DNA-binding proteins Fur, CRP and RpoH, respectively. Closer inspection of these modules revealed a transcriptional regulator (SO2426) involved in iron acquisition and ten transcriptional factors involved in anaerobic energy metabolism. Selected genes in the network were analyzed by genetic studies. Disruption of genes encoding a putative alcaligin biosynthesis protein (SO3032) and a gene previously implicated in protein degradation (SO2017) led to severe growth deficiency under iron depletion conditions. Disruption of a novel transcriptional factor (SO1415) caused deficiency in both anaerobic iron reduction and growth with thiosulfate or TMAO as an electronic acceptor, suggesting that SO1415 is required for specific branches of anaerobic energy metabolism pathways.</p> <p>Conclusion</p> <p>Using a reconstructed gene network, we identified major biological pathways that were differentially expressed during iron depletion and repletion. Genetic studies not only demonstrated the importance of iron acquisition and protein degradation for iron depletion, but also characterized a novel transcriptional factor (SO1415) with a role in anaerobic energy metabolism.</p
    corecore