8 research outputs found

    Gene Context Analysis in the Integrated Microbial Genomes (IMG) Data Management System

    Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context, and it relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov

    Phylogenetic detection of conserved gene clusters in microbial genomes

    BACKGROUND: Microbial genomes contain an abundance of genes with conserved proximity forming clusters on the chromosome. However, the conservation can be a result of many factors, such as vertical inheritance or functional selection. Thus, identification of conserved gene clusters that are under functional selection provides an effective channel for gene annotation, microarray screening, and pathway reconstruction. The problem of devising a robust method to identify these conserved gene clusters and to evaluate the significance of the conservation in multiple genomes has a number of implications for comparative, evolutionary and functional genomics as well as synthetic biology. RESULTS: In this paper we describe a new method for detecting conserved gene clusters that incorporates the information captured by a genome phylogenetic tree. We show that our method can overcome the common problem of overestimation of significance due to the bias in the genome database and thereby achieve better accuracy when detecting functionally connected gene clusters. Our results can be accessed at the GeneChords database. CONCLUSION: The methodology described in this paper gives a scalable framework for discovering conserved gene clusters in microbial genomes. It serves as a platform for many other functional genomic analyses in microorganisms, such as operon prediction, regulatory site prediction, functional annotation of genes, and the study of the evolutionary origin and development of gene clusters.

    Inferring functional modules of protein families with probabilistic topic models

    BACKGROUND: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. RESULTS: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. CONCLUSIONS: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.

    Relationship between operon preference and functional properties of persistent genes in bacterial genomes

    BACKGROUND: Genes in bacteria may be organised into operons, leading to strict co-expression of the genes that participate in the same operon. However, comparisons between different bacterial genomes have shown that much of the operon structure is dynamic on an evolutionary time scale. This indicates that there are opposing effects influencing the tendency for operon formation, and these effects may be reflected in properties like evolutionary rate, complex formation, metabolic pathways and gene fusion. RESULTS: We have used multi-species protein-protein comparisons to generate a high-quality set of genes that are persistent in bacterial genomes (i.e. they have close to universal distribution). We have analysed these genes with respect to operon participation and important functional properties, including evolutionary rate and protein-protein interactions. CONCLUSIONS: Genes for ribosomal proteins show a very slow rate of evolution. This is consistent with a strong tendency for the genes to participate in operons and for their proteins to be involved in essential and well-defined complexes. Persistent genes for non-ribosomal proteins can be separated into two classes according to their tendency to participate in operons. Those with a strong tendency for operon participation make proteins with fewer interaction partners that seem to participate in relatively static complexes and possibly linear pathways. Genes with a weak tendency for operon participation tend to produce proteins with more interaction partners, but possibly in more dynamic complexes and convergent pathways. Genes that are not regulated through operons are therefore more evolutionarily constrained than the corresponding operon-associated genes and will on average evolve more slowly.

    Sequence Analysis of the Bacterial Protein Elongation Factor P

    In 1975, Elongation Factor P (EF-P) protein was first discovered in the bacterium Escherichia coli. EF-P is believed to facilitate the translation of proteins by stimulating peptide bond synthesis for a number of different aminoacyl-tRNA molecules in conjunction with the 70S ribosome peptidyl transferase. Known eukaryotic homologs of EF-P, the eukaryotic translation initiation factor 5A (eIF-5A) proteins, exist but show very low sequence conservation. Nevertheless, because of the high sequence similarity among bacterial EF-Ps and their low sequence similarity to eIF-5A, there is interest in the pharmaceutical industry in developing a novel antibacterial drug that inhibits EF-P. Of 322 completely sequenced bacterial genomes stored in GenBank, only one organism lacked an EF-P protein. Interestingly, sixty-six genomes were discovered to carry a duplicate copy of efp. The EF-P sequences were then used to construct a protein phylogenetic tree, which provided evidence of horizontal and vertical gene transfer as well as gene duplication. To lend support to these findings, EF-P GC content, codon usage, and nucleotide and amino acid sequences were analyzed with positive and negative controls. The adjacent 10 kb upstream and downstream regions of efp were also retrieved to determine whether gene order is conserved in distantly related species. While gene order was not preserved in all species, two interesting trends were seen in some of the distantly related species. The EF-P gene was conserved beside the acetyl-CoA carboxylase genes accB and accC in certain organisms. In addition, some efp sequences were flanked by two insertion sequence elements. Evidence of gene duplication and horizontal transfers of regions was also observed in the upstream and downstream regions of efp. In combination, the phylogenetic analysis, sequence analyses, and gene order conservation confirmed evidence of the complex history of the efp genes, which showed incongruencies relative to the universal phylogenetic tree.
    To determine how efp is regulated, the upstream regions of efp were used to try to predict motifs in silico. While statistically significant motifs were discovered in the upstream regions of the orthologous efp genes, no conclusive similarities to known binding sites, such as sigma factor binding sites or regulatory protein binding sites, were observed. This work may facilitate and enhance the understanding of the regulation, conservation, and role of EF-P in protein translation.

    Mining a Chinese hyperthermophilic metagenome

    Philosophiae Doctor - PhD. Metagenomic sequencing of environmental samples provides direct access to genomic information of organisms within the respective environments. This sequence information represents a significant resource for the identification and subsequent characterization of potentially novel genes, or known genes with acquired novel characteristics. Within this context, thermophilic environments are of particular interest due to their potential for deriving novel thermostable enzymes with biotechnological and industrial applications. In this work, metagenomic library construction, random sequencing and sequence analysis strategies were employed to enhance identification and characterisation of potentially novel genes from a thermophilic soil sample. High molecular weight metagenomic DNA was extracted from two Chinese hydrothermal soil samples. This was used as source material for the construction of four genomic DNA libraries. The combined libraries were estimated to contain in the order of 1.3 million genes, providing a rich resource for gene identification. Approximately 70 kbp of sequence data was generated from one of the libraries as a resource for sequence-based analysis. Initial BLAST analysis predicted the presence of 53 ORFs/partial ORFs. The BLAST similarity scores for the investigated ORFs were sufficiently high (>40%) to infer homology with database proteins while also being indicative of novel sequence variants of these database matches. In an attempt to enhance the potential for deriving more full length ORFs, a novel strategy based on WGA technology was employed. This resulted in the recovery of the near complete sequence of partial ORF5, directly from the WGA DNA of the environmental sample. While the full length ORF5 could not be recovered, the feasibility of this novel approach for enhanced metagenomic sequence recovery was proved in principle.
    The implementation of multiple in silico strategies resulted in the identification of two ORFs, classified as homologs of the DUF29 and Usp protein families respectively. The functional inference obtained from the integrated in silico predictions was furthermore highly suggestive of a putative nucleotide binding/interaction role for both ORFs. A putative novel DNA polymerase gene (denoted TC11pol) was identified from the sequence data. Expression and characterization of the full length TC11pol did not, however, result in detectable polymerase activity. The implementation of a homology modeling approach proved successful for deriving a structural model of the polymerase that was used for: (i) deriving functional inferences of the potential activities of the polymerase and (ii) deriving a 5’ exonuclease deletion mutant for functional analysis. Expression and subsequent functional characterization of the putative 5’exo- TC11pol mutant resulted in detectable polymerase and 3’-5’ exonuclease activity at 37 and 45 °C, following a heat denaturation step at 55 °C for 1 hour. It was therefore concluded that the putative 5’exo- TC11pol mutant was functionally equivalent to the Klenow fragment of E. coli, while exhibiting increased thermostability. South Afric

    Investigation of STM3071 as a potential regulator of cobalt transport in Salmonella enterica

    Using bioinformatics we have identified stm3071 as a possible regulator of anaerobically induced genes involved in metal homeostasis (Price-Carter et al., 2001), and the aim of this study is to determine the function of stm3071 and define the conditions that induce its expression. Cobalt is required for incorporation into cobalamin (vitamin B12), which is important during S. Typhimurium infection. Vitamin B12 is synthesised de novo under anaerobic conditions and is required for metabolism of 1,2-propanediol and ethanolamine, which act as sources of carbon and nitrogen when Salmonella is in the gut (Raux et al., 1996; Thiennimitr et al., 2011). Therefore, sensing Co2+ from the environment and maintaining Co2+ homeostasis, to avoid metal-mediated toxicity, is required for vitamin B12 biosynthesis. Using λ-red based mutagenesis we have constructed a deletion mutant in order to investigate the function of stm3071. We examined the effect of the mutation on the utilisation of 1,2-propanediol under anaerobic conditions and on the ability to produce vitamin B12. We have also tested the effect of the mutation on tolerance to cobalt both aerobically and anaerobically. In order to monitor conditions in which Pstm3071 is switched on, a Pstm3071::lacZ transcriptional fusion was constructed in plasmid pRS415. Levels of β-galactosidase activity were measured in the presence of cobalt in both Δstm3071 and SL1344 (wild type strain) under anaerobic conditions. Anaerobic growth experiments and B12 assays showed that stm3071 is not essential for growth or synthesis of vitamin B12. In addition, cobalt tolerance in both aerobic and anaerobic conditions was unaffected. However, as measured by β-galactosidase assay, our data suggest that Pstm3071 expression is induced in the presence of cobalt in the deletion mutant. In contrast, we observed no difference in expression of Pstm3071 in the presence or absence of cobalt in SL1344.