90 research outputs found

A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database

Author: Boore Jeffrey L
Dehal Paramvir S
Publication venue: BioMed Central
Publication date: 25/08/2005

BACKGROUND: We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, they are known to introduce errors through the artifactual association of slowly evolving paralogs and the lack of annotation for more rapidly evolving ones. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address the patterns and mechanisms of genome evolution. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and correlation of this information with gene annotations, expression information, and genomic context is an important resource for the scientific community. DISCUSSION: The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, comprising 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built so that the gene sequence distances are consistent with the known organismal relationships, thereby maximizing the likelihood that the clusters represent truly orthologous genes. The PhIGs website contains tools that allow the study of genes within their phylogenetic framework through keyword searches on annotations, such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website also allows users to view the relative physical positions of homologous genes in specified sets of genomes. SUMMARY: Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and corresponding website address this problem for the scientific community. Our goal is to expand the content as more genomes are sequenced and to use this framework to incorporate more analyses.
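
The abstract mentions determining orthology and paralogy relationships from gene trees but does not spell out the procedure. As a minimal sketch, assuming a rooted, binary gene tree whose leaves are labelled with their species, the widely used species-overlap rule can label each internal node: if the two child subtrees share a species, the node is a duplication; otherwise it is a speciation. Two genes are then called orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication. This generic rule is swapped in for illustration and is not the PhIGs clustering algorithm; the `Node` class and all gene and species names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Node of a rooted binary gene tree (hypothetical structure for this sketch)."""
    name: str
    species: Optional[str] = None  # set only on leaves
    children: list["Node"] = field(default_factory=list)


def species_set(node: Node) -> frozenset:
    """All species represented by the leaves at or below this node."""
    if not node.children:
        return frozenset([node.species])
    result = frozenset()
    for child in node.children:
        result |= species_set(child)
    return result


def label_events(node: Node, events: Optional[dict] = None) -> dict:
    """Species-overlap rule: a node is a duplication if its two subtrees share a species."""
    if events is None:
        events = {}
    if node.children:
        left, right = node.children
        shared = species_set(left) & species_set(right)
        events[node.name] = "duplication" if shared else "speciation"
        for child in node.children:
            label_events(child, events)
    return events


def path_to(node: Node, gene: str) -> list:
    """Nodes on the path from `node` down to the leaf named `gene` (empty if absent)."""
    if not node.children:
        return [node] if node.name == gene else []
    for child in node.children:
        sub = path_to(child, gene)
        if sub:
            return [node] + sub
    return []


def relationship(root: Node, gene1: str, gene2: str) -> str:
    """'orthologs' if the last common ancestor is a speciation node, else 'paralogs'."""
    lca = None
    for a, b in zip(path_to(root, gene1), path_to(root, gene2)):
        if a is not b:
            break
        lca = a
    events = label_events(root)
    return "orthologs" if events[lca.name] == "speciation" else "paralogs"


if __name__ == "__main__":
    # One ancestral duplication (root), then a human/mouse speciation in each copy.
    tree = Node("root", children=[
        Node("copyA", children=[Node("human_A", "human"), Node("mouse_A", "mouse")]),
        Node("copyB", children=[Node("human_B", "human"), Node("mouse_B", "mouse")]),
    ])
    print(relationship(tree, "human_A", "mouse_A"))  # orthologs
    print(relationship(tree, "human_A", "mouse_B"))  # paralogs
```

In the example, human_A and mouse_B are homologs from different species, yet they are paralogs because they trace back to the duplication at the root; similarity-based clustering alone would not make that distinction.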

Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli

Author: Arkin Adam P
Dehal Paramvir S
Price Morgan N
Publication venue: BioMed Central
Publication date: 01/01/2007

Most Escherichia coli transcription factors have paralogs, but these usually arose by horizontal gene transfer rather than by duplication within the E. coli lineage, as previously believed.

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

Author: Arkin Adam P.
Dehal Paramvir S.
Price Morgan N.
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that increase its accuracy without sacrificing scalability. Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours, using 5.8 gigabytes of memory. Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments.
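
A nearest-neighbor interchange acts on a single internal edge: removing that edge leaves four subtrees, and the current split (w,x)|(y,z) has exactly two NNI alternatives, (w,y)|(x,z) and (w,z)|(x,y). The sketch below illustrates a minimum-evolution NNI, assuming a plain table of pairwise distances between the four subtrees and scoring each quartet topology by its balanced minimum-evolution (Pauplin) length. FastTree itself works from profiles and, in version 2, also scores NNIs by likelihood, which this sketch does not attempt; the function names and example distances are hypothetical.

```python
# Split around an internal edge: ((w, x), (y, z)) means w and x are on one side.
Split = tuple[tuple[str, str], tuple[str, str]]


def quartet_length(dist: dict, split: Split) -> float:
    """Balanced minimum-evolution length of a quartet topology.

    Pairs on the same side of the internal edge are two edges apart (weight 1/2);
    pairs on opposite sides are three edges apart (weight 1/4).
    """
    (w, x), (y, z) = split

    def d(a: str, b: str) -> float:
        return dist[frozenset((a, b))]

    same_side = d(w, x) + d(y, z)
    across = d(w, y) + d(w, z) + d(x, y) + d(x, z)
    return 0.5 * same_side + 0.25 * across


def nni_alternatives(split: Split) -> list:
    """The current split plus its two nearest-neighbor interchanges."""
    (w, x), (y, z) = split
    return [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]


def best_nni(dist: dict, split: Split) -> Split:
    """Keep whichever of the three topologies has the smallest length."""
    return min(nni_alternatives(split), key=lambda s: quartet_length(dist, s))


if __name__ == "__main__":
    # Hypothetical additive distances for the true split (A,B)|(C,D):
    # every pendant edge has length 1 and the internal edge has length 2.
    dist = {frozenset(pair): value for pair, value in [
        (("A", "B"), 2.0), (("C", "D"), 2.0),
        (("A", "C"), 4.0), (("A", "D"), 4.0), (("B", "C"), 4.0), (("B", "D"), 4.0),
    ]}
    start = (("A", "C"), ("B", "D"))  # a wrong starting topology
    print(best_nni(dist, start))      # (('A', 'B'), ('C', 'D'))
```

With additive distances the balanced length of the true quartet equals its total branch length (here 4 pendant units plus 2 internal units, so 6), while each wrong topology scores higher (7), so a single NNI recovers the correct split.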

Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes

Author: Adam P Arkin
Morgan N Price
Paramvir S Dehal
Uwe Ohler
Publication venue: Public Library of Science
Publication date: 01/09/2007

Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought.
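
A bidirectional best hit is a pair of genes, one from each genome, that are each other's top-scoring match. The minimal sketch below makes the criterion concrete, assuming the BLAST results have already been parsed into per-query dictionaries of bit scores (the gene identifiers and scores are hypothetical). It is included to show what a BBH is, not as a recommended way to call orthologs, since the abstract argues that BBHs of TFs are often not evolutionary orthologs.

```python
def best_hits(scores: dict) -> dict:
    """Map each query to its highest-scoring subject (ties go to the first listed)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical bit scores: a1 and a2 are paralogs in genome A that both hit b1 best.
    a_vs_b = {"a1": {"b1": 300.0, "b2": 120.0}, "a2": {"b1": 250.0, "b2": 90.0}}
    b_vs_a = {"b1": {"a1": 310.0, "a2": 260.0}, "b2": {"a1": 118.0, "a2": 95.0}}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

In this example a2 and b2 end up with no BBH partner even though both have homologs in the other genome, which is one reason a gene tree can be more informative than BBHs.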

Evolutionary Genomics of Life in (and from) the Sea

Author: Boore Jeffrey L.
Dehal Paramvir
Fuerstenberg Susan I.
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 09/01/2006

High-throughput genome sequencing centers that were originally built for the Human Genome Project (Lander et al., 2001; Venter et al., 2001) have now become an engine for comparative genomics. The six largest centers alone are now producing over 150 billion nucleotides per year, more than 50 times the amount of DNA in the human genome, and nearly all of this is directed at projects that promise great insights into the pattern and processes of evolution. Unfortunately, these data are being produced at a pace far exceeding the capacity of the scientific community to provide insightful analysis, and few scientists with training and experience in evolutionary biology have played prominent roles to date. One consequence is that poor-quality analyses are typical; for example, orthology among genes is generally determined by simple measures of sequence similarity, even though this approach was discredited by molecular evolutionary biologists decades ago. Here we discuss how genomes are chosen for sequencing and how the scientific community can have input. We describe the PhIGs database and web tools (Dehal and Boore 2005a; http://PhIGs.org), which provide phylogenetic analysis of all gene families for all completely sequenced genomes, and the associated 'Synteny Viewer', which allows comparisons of the relative positions of orthologous genes. This is the best tool available for inferring gene function across multiple genomes. We also describe how we have used the PhIGs methods with the whole genome sequences of a tunicate, fish, mouse, and human to conclusively demonstrate that two rounds of whole genome duplication occurred at the base of vertebrates (Dehal and Boore 2005b). This evidence is found in the large-scale structure of the positions of paralogous genes that arose from duplications inferred by evolutionary analysis to have occurred at the base of vertebrates.

RegPrecise web services interface: programmatic access to the transcriptional regulatory interactions in bacteria reconstructed by comparative genomics.

Author: Arkin Adam P
Brettin Thomas S
Dehal Paramvir S
Dubchak Inna
Novichkov Pavel S
Novichkova Elena S
Rodionov Dmitry A
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs (sets of genes controlled by orthologous regulators in several closely related bacterial genomes) reconstructed by comparative genomics. The current release, RegPrecise 2.0, includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.

FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

Author: N. Price Morgan
P. Arkin Adam
S. Dehal Paramvir
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 01/01/2009

Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with $N$ sequences, $L$ sites, and $a$ different characters, a distance matrix requires $O(N^2)$ space and $O(N^2 L)$ time, but FastTree requires just $O(NLa + N\sqrt{N})$ memory and $O(N\sqrt{N}\log(N)\,La)$ time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree
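
The central idea, storing profiles instead of a distance matrix, can be sketched in a few lines. In this simplified version (an assumption, not FastTree's code), a profile holds one nucleotide-frequency vector per alignment column, the profile of a joined node is the weighted average of its children's profiles, and the distance between two profiles is the average per-site probability that characters drawn from each differ. The Jukes-Cantor correction mentioned in the abstract is applied on top. Gaps, protein alphabets, and the further adjustments the full method makes to profile distances are omitted, and all function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def leaf_profile(seq: str) -> list:
    """One frequency vector per site for an ungapped nucleotide sequence."""
    return [[1.0 if ch == base else 0.0 for base in ALPHABET] for ch in seq.upper()]


def join_profiles(p: list, q: list, weight: float = 0.5) -> list:
    """Profile of a new internal node: weighted average of its two children."""
    return [[weight * x + (1.0 - weight) * y for x, y in zip(pi, qi)]
            for pi, qi in zip(p, q)]


def profile_distance(p: list, q: list) -> float:
    """Mean over sites of the probability that characters drawn from p and q differ."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += 1.0 - sum(x * y for x, y in zip(pi, qi))
    return total / len(p)


def jukes_cantor(p_distance: float) -> float:
    """Jukes-Cantor correction for multiple substitutions (requires p_distance < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p_distance / 3.0)


if __name__ == "__main__":
    a = leaf_profile("ACGTACGTAC")
    b = leaf_profile("ACGTACGTAA")   # differs from a at the last site only
    ab = join_profiles(a, b)          # internal node; no N x N matrix is stored
    print(profile_distance(a, b))     # 0.1
    print(profile_distance(a, ab))    # 0.05
    print(round(jukes_cantor(profile_distance(a, b)), 4))  # 0.1073
```

Each profile costs $O(La)$ memory, so keeping one per node gives the $O(NLa)$ term in the memory bound above, instead of the $O(N^2)$ cost of a full distance matrix.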

Systematic mapping of two component response regulators to gene targets in a model sulfate reducing bacterium

Author: Arkin Adam P
Dehal Paramvir S
Luning Eric G
Mukhopadhyay Aindrila
Price Morgan N
Rajeev Lara
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Two-component regulatory systems are the primary form of signal transduction in bacteria. Although genomic binding sites have been determined for several eukaryotic and bacterial transcription factors, comprehensive identification of gene targets of two-component response regulators remains challenging due to the lack of knowledge of the signals required for their activation. We focused our study on Desulfovibrio vulgaris Hildenborough, a sulfate-reducing bacterium that encodes unusually diverse and largely uncharacterized two-component signal transduction systems. Results: We report the first systematic mapping of the genes regulated by all transcriptionally acting response regulators in a single bacterium. Our results enabled functional predictions for several response regulators, covering key processes of carbon, nitrogen and energy metabolism, cell motility and biofilm formation, and responses to stresses such as nitrite, low potassium and phosphate starvation. Our study also led to the prediction of new genes and regulatory networks, which were corroborated by a compendium of transcriptome data available for D. vulgaris. For several regulators we predicted and experimentally verified the binding site motifs, most of which were discovered as part of this study. Conclusions: The gene targets identified for the response regulators allowed strong functional predictions to be made for the corresponding two-component systems. By tracking the D. vulgaris regulators and their motifs outside the Desulfovibrio spp., we provide testable hypotheses regarding the functions of orthologous regulators in other organisms. The in vitro array-based method optimized here is generally applicable to the study of such systems in all organisms.
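
The abstract reports predicting binding-site motifs and tracking them in other genomes without describing the method. As a generic illustration (a swapped-in technique, not necessarily the authors' pipeline), a log-odds position weight matrix built from aligned sites can be slid along a sequence to flag candidate sites. The example sites, pseudocount, uniform background, and score threshold are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list, pseudocount: float = 0.5, background: float = 0.25) -> list:
    """Log2-odds weight for each base at each position of equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({base: math.log2((column.count(base) + pseudocount) / total / background)
                    for base in BASES})
    return pwm


def scan(seq: str, pwm: list, threshold: float) -> list:
    """(start, score) for every window scoring at or above threshold; seq must be A/C/G/T only."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


if __name__ == "__main__":
    sites = ["TTGACA", "TTGACA", "TTGACT"]   # hypothetical aligned binding sites
    pwm = build_pwm(sites)
    print(scan("GGGTTGACAGGG", pwm, threshold=5.0))  # [(3, 8.43)]
```

The pseudocount keeps unseen bases from receiving a weight of negative infinity, so a single mismatch lowers a window's score rather than ruling it out entirely.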

Mapping the Two-component Regulatory Networks in Desulfovibrio vulgaris

Author: Dehal Paramvir
Joachimiak Marcin
Luning Eric
Mukhopadhyay Aindrila
Rajeev Lara
Publication venue: Office of Scientific and Technical Information (OSTI)

D. vulgaris Hildenborough has 72 response regulators (RRs). The Desulfovibrio are sulfate-reducing bacteria that are important in the sulfur and carbon cycles of anoxic habitats. The large number of two-component systems in D. vulgaris is probably critical to its ability to sense and respond to its environment. Our goal is to map these RRs to the genes they regulate using a DNA-affinity-purification-chip (DAP-chip) protocol. An initial positive target was determined by EMSA (electrophoretic mobility shift assay) for as many RRs as possible. Targets were selected based on gene proximity, regulon predictions and/or predicted sigma54-dependent promoters. qPCR was used to confirm that the target was enriched from sheared genomic DNA before proceeding to the DAP-chip.
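
The abstract says qPCR was used to confirm enrichment but not how enrichment was quantified. One common approach, sketched here under stated assumptions, converts threshold-cycle (Ct) differences into fold changes, assuming ideal amplification efficiency (the product doubles every cycle) and, for the ΔΔCt form, a non-target reference locus measured in both the enriched and the control sample. The Ct values and function names are hypothetical.

```python
def fold_enrichment(ct_enriched: float, ct_control: float, efficiency: float = 1.0) -> float:
    """Fold change of one locus between an enriched and a control sample.

    Each cycle multiplies the product by (1 + efficiency), so a locus that reaches
    threshold n cycles earlier started with (1 + efficiency) ** n times more template.
    """
    return (1.0 + efficiency) ** (ct_control - ct_enriched)


def delta_delta_ct_enrichment(ct_target_enriched: float, ct_ref_enriched: float,
                              ct_target_control: float, ct_ref_control: float) -> float:
    """Target enrichment normalized to a non-target reference locus (2 ** -ddCt)."""
    d_ct_enriched = ct_target_enriched - ct_ref_enriched
    d_ct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_ct_enriched - d_ct_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier after enrichment.
    print(fold_enrichment(22.0, 25.0))                          # 8.0
    print(delta_delta_ct_enrichment(22.0, 26.0, 25.0, 26.0))    # 8.0
```

Normalizing to a reference locus guards against differences in the total amount of DNA loaded, which the single-locus ratio cannot distinguish from true enrichment.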

Snapshot of iron response in Shewanella oneidensis by gene network reconstruction

Author: Arkin Adam P
Dehal Paramvir
Harris Daniel P
Jacobsen Janet
Joachimiak Marcin
Luo Feng
Palumbo Anthony V
Wu Liyou
Xiong Wenlu
Yang Yunfeng
Yang Zamin
Zhou Jizhong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Iron homeostasis of Shewanella oneidensis, a γ-proteobacterium possessing high iron content, is regulated by the global transcription factor Fur. However, knowledge is incomplete about other biological pathways that respond to changes in iron concentration, as well as details of the responses. In this work, we integrate physiological, transcriptomic and genetic approaches to delineate the iron response of S. oneidensis. Results: We show that the iron response in S. oneidensis is a rapid process. Temporal gene expression profiles were examined for iron depletion and repletion, and a gene co-expression network was reconstructed. Modules of iron acquisition systems, anaerobic energy metabolism and protein degradation were the most noteworthy in the gene network. Bioinformatics analyses suggested that genes in each of these modules might be regulated by the DNA-binding proteins Fur, CRP and RpoH, respectively. Closer inspection of these modules revealed a transcriptional regulator (SO2426) involved in iron acquisition and ten transcription factors involved in anaerobic energy metabolism. Selected genes in the network were analyzed by genetic studies. Disruption of genes encoding a putative alcaligin biosynthesis protein (SO3032) and a gene previously implicated in protein degradation (SO2017) led to severe growth deficiency under iron depletion conditions. Disruption of a novel transcription factor (SO1415) caused deficiency in both anaerobic iron reduction and growth with thiosulfate or TMAO as an electron acceptor, suggesting that SO1415 is required for specific branches of anaerobic energy metabolism pathways. Conclusion: Using a reconstructed gene network, we identified major biological pathways that were differentially expressed during iron depletion and repletion. Genetic studies not only demonstrated the importance of iron acquisition and protein degradation for iron depletion, but also characterized a novel transcription factor (SO1415) with a role in anaerobic energy metabolism.
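
The abstract does not describe how the co-expression network was built or how modules were defined. A minimal sketch, assuming a simple fixed cutoff on the absolute Pearson correlation between temporal expression profiles and treating connected components as modules, shows the general shape of such an analysis. Published methods typically choose the cutoff more carefully, and the gene names, values, and threshold here are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles: dict, threshold: float = 0.9) -> list:
    """Connect gene pairs whose |r| meets the (hypothetical) cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


def modules(genes, edges) -> list:
    """Connected components of the thresholded network, each as a sorted list."""
    neighbors = {g: set() for g in genes}
    for g1, g2, _ in edges:
        neighbors[g1].add(g2)
        neighbors[g2].add(g1)
    seen, components = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(neighbors[gene] - component)
        seen |= component
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Hypothetical expression time courses (five time points per gene).
    profiles = {
        "geneA": [1, 2, 3, 4, 5],
        "geneB": [2, 4, 6, 8, 10],
        "geneC": [5, 4, 3, 2, 1],
        "geneD": [1, 3, 2, 5, 4],
    }
    edges = coexpression_edges(profiles)
    print(edges)  # [('geneA', 'geneB', 1.0), ('geneA', 'geneC', -1.0), ('geneB', 'geneC', -1.0)]
    print(modules(profiles, edges))  # [['geneA', 'geneB', 'geneC'], ['geneD']]
```

Using the absolute correlation lets anti-correlated genes (such as one induced and one repressed on iron depletion) fall into the same module; keeping only positive correlations is an equally reasonable choice that would yield different modules.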
