Ebolavirus comparative genomics
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Author affiliations:
Jun, Se Ran. Oak Ridge National Laboratory, United States; University of Tennessee, United States
Leuze, Michael R. Oak Ridge National Laboratory, United States
Nookaew, Intawat. Oak Ridge National Laboratory, United States
Uberbacher, Edward C. Oak Ridge National Laboratory, United States
Land, Miriam. Oak Ridge National Laboratory, United States
Zhang, Qian. Oak Ridge National Laboratory, United States; University of Tennessee, United States
Wanchai, Visanu. Oak Ridge National Laboratory, United States
Chai, Juanjuan. Oak Ridge National Laboratory, United States
Nielsen, Morten. Technical University of Denmark, Denmark; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - La Plata, Instituto de Investigaciones Biotecnológicas, Argentina
Trolle, Thomas. Technical University of Denmark, Denmark
Lund, Ole. Technical University of Denmark, Denmark
Buzard, Gregory S. Booz Allen Hamilton, United States
Pedersen, Thomas D. Technical University of Denmark, Denmark; Assays, Denmark
Wassenaar, Trudy M. Molecular Microbiology and Genomics Consultants, Germany
Ussery, David W. Oak Ridge National Laboratory, United States; University of Tennessee, United States; Technical University of Denmark, Denmark
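The oligomer frequency analysis mentioned in the abstract above is not specified further in this listing. As a rough illustration only, the sketch below computes normalized k-mer (oligomer) frequency profiles and a Euclidean distance between them. The function names, the choice of k = 3, the distance measure, and the toy sequences are all assumptions made for illustration and are not taken from the paper. Sequences are assumed to be written in the DNA alphabet (A, C, G, T).

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Return normalized frequencies of all 4**k k-mers over A, C, G, T.

    Windows containing any other character (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def profile_distance(p, q):
    """Euclidean distance between two equal-length frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy sequences (hypothetical, not real viral genomes).
genome_a = "ATGGATTCTCGTCCTCAGAAAGTCTGGATGACGCC"
genome_b = "ATGGATTCACGTCCTCAGAAAGTCTGGATGACACC"
genome_c = "GGGCCCGGGCCCGGGCCCGGGCCCGGGCCCGGGCC"

d_ab = profile_distance(kmer_profile(genome_a), kmer_profile(genome_b))
d_ac = profile_distance(kmer_profile(genome_a), kmer_profile(genome_c))
print(d_ab, d_ac)
```

Genomes with similar oligomer usage give a small distance, so a matrix of such distances could in principle be used to group genomes; the grouping method actually used by the authors may differ.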
Shewanella knowledgebase: integration of the experimental data and computational predictions suggests a biological role for transcription of intergenic regions
Shewanellae are facultative γ-proteobacteria whose remarkable respiratory versatility has resulted in interest in their utility for bioremediation of heavy metals and radionuclides and for energy generation in microbial fuel cells. Extensive experimental efforts over the last several years and the availability of 21 sequenced Shewanella genomes have made it possible to collect and integrate a wealth of information on the genus into one public resource, providing new avenues for making biological discoveries and for developing a system-level understanding of cellular processes. The Shewanella knowledgebase was established in 2005 to provide a framework for integrated genome-based studies on Shewanella ecophysiology. The present version of the knowledgebase provides access to a diverse set of experimental and genomic data, along with tools for curation of genome annotations and for visualization and integration of genomic data with experimental data. As a demonstration of the utility of this resource, we examined a single microarray data set from Shewanella oneidensis MR-1 for new insights into regulatory processes. The integrated analysis of the data predicted a new type of bacterial transcriptional regulation involving co-transcription of the intergenic region with the downstream gene, and suggested a biological role for this co-transcription: it likely prevents a regulator of the upstream gene from binding to its binding site located in the intergenic region.
Phenotype Fingerprinting Suggests the Involvement of Single-Genotype Consortia in Degradation of Aromatic Compounds by Rhodopseudomonas palustris
Anaerobic degradation of complex organic compounds by microorganisms is crucial for development of innovative biotechnologies for bioethanol production and for efficient degradation of environmental pollutants. In natural environments, the degradation is usually accomplished by syntrophic consortia composed of different bacterial species. This strategy allows consortium organisms to reduce the effort required to maintain redox homeostasis at each syntrophic level. Cellular mechanisms that maintain redox homeostasis during the degradation of aromatic compounds by one organism are not fully understood. Here we present a hypothesis that the metabolically versatile phototrophic bacterium Rhodopseudomonas palustris forms its own syntrophic consortia when it grows anaerobically on p-coumarate or benzoate as a sole carbon source. We have revealed the consortia from large-scale measurements of mRNA and protein expression under p-coumarate, benzoate and succinate degrading conditions using a novel computational approach referred to as phenotype fingerprinting. In this approach, marker genes for known R. palustris phenotypes are employed to determine the relative expression levels of genes and proteins under aromatic-degrading versus non-aromatic-degrading conditions. Subpopulations of the consortia are inferred from the expression of phenotypes and known metabolic modes of R. palustris growth. We find that p-coumarate degrading conditions may lead to at least three R. palustris subpopulations utilizing p-coumarate, benzoate, and CO2 and H2. Benzoate degrading conditions may also produce at least three subpopulations utilizing benzoate, CO2 and H2, and N2 and formate. Communication among syntrophs and inter-syntrophic dynamics in each consortium are indicated by up-regulation of transporters and genes involved in curli formation and chemotaxis. The N2-fixing subpopulation in the benzoate degrading consortium has preferential activation of the vanadium nitrogenase over the molybdenum nitrogenase. This subpopulation in the consortium was confirmed in an independent experiment by consumption of dissolved nitrogen gas under the benzoate degrading conditions.
Conserved synteny at the protein family level reveals genes underlying Shewanella species’ cold tolerance and predicts their novel phenotypes
© The Authors 2009. This article is distributed under the terms of the Creative Commons Attribution Noncommercial License. The definitive version was published in Functional & Integrative Genomics 10 (2010): 97-110, doi:10.1007/s10142-009-0142-y.
Bacteria of the genus Shewanella can thrive in different environments and demonstrate significant variability in their metabolic and ecophysiological capabilities, including cold and salt tolerance. Genomic characteristics underlying this variability across species are largely unknown. In this study, we address the problem by a comparison of the physiological, metabolic, and genomic characteristics of 19 sequenced Shewanella species. We have employed two novel approaches based on association of a phenotypic trait with the number of the trait-specific protein families (Pfam domains) and on the conservation of synteny (order in the genome) of the trait-related genes. Our first approach is top-down and involves experimental evaluation and quantification of the species’ cold tolerance followed by identification of the correlated Pfam domains and genes with a conserved synteny. The second, a bottom-up approach, predicts novel phenotypes of the species by calculating profiles of each Pfam domain among their genomes, followed by pairwise correlation of the profiles and network clustering. Using the first approach, we find a link between cold and salt tolerance of the species and the presence in the genome of a Na+/H+ antiporter gene cluster. Other cold-tolerance-related genes include peptidases, chemotaxis sensory transducer proteins, a cysteine exporter, and helicases. Using the bottom-up approach, we find several novel phenotypes in the newly sequenced Shewanella species, including degradation of aromatic compounds by an aerobic hybrid pathway in Shewanella woodyi, degradation of ethanolamine by Shewanella benthica, and propanediol degradation by Shewanella putrefaciens CN32 and Shewanella sp. W3-18-1.
This research was supported by the U.S. Department of Energy (DOE) Office of Biological and Environmental Research under the Genomics: GTL Program via the Shewanella Federation consortium.
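The bottom-up approach in the abstract above (per-domain profiles across genomes, pairwise correlation, then network clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption: the profiles are taken to be domain copy counts per genome, Pearson correlation is used as the similarity measure, the 0.9 threshold is arbitrary, and connected components stand in for whatever network clustering the authors used. The domain names are placeholders, not real Pfam identifiers.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_domains(profiles, threshold=0.9):
    """Group domains whose profiles correlate at or above the threshold.

    profiles maps a domain name to its list of counts, one per genome.
    Clusters are the connected components of the correlation network.
    """
    neighbors = {d: set() for d in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in profiles:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one connected component
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(neighbors[cur] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical copy counts of three placeholder domains in four genomes.
toy_profiles = {
    "PF_A": [0, 1, 2, 3],
    "PF_B": [0, 2, 4, 6],
    "PF_C": [3, 0, 1, 0],
}
toy_clusters = cluster_domains(toy_profiles)
print(toy_clusters)
```

Domains that land in the same cluster rise and fall together across genomes, which is the kind of signal that could suggest a shared pathway or phenotype worth checking by hand.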
Gene Prediction by Pattern Recognition and Homology Search
This paper presents an algorithm for combining pattern recognition-based exon prediction and database homology search in gene model construction. The goal is to use homologous genes or partial genes existing in the database as reference models while constructing (multiple) gene models from exon candidates predicted by pattern recognition methods. A unified framework for gene modeling is used for genes ranging from situations with strong homology to no homology in the database. To maximally use the homology information available, the algorithm applies homology on three levels: (1) exon candidate evaluation, (2) gene-segment construction with a reference model, and (3) (complete) gene modeling. Preliminary testing has been done on the algorithm. Test results show that (a) perfect gene modeling can be expected when the initial exon predictions are reasonably good and a strong homology exists in the database; (b) homology (not necessarily strong) in general helps improve the accuracy of gene modeling; (c) multiple gene modeling becomes feasible when homology exists in the database for the involved genes.
Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags
Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models, within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.
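One idea in the abstract above, that ESTs matching several predicted exons indicate which exons belong to the same gene, can be sketched in a few lines. This is a deliberately simplified illustration, not the published algorithm: it only groups exons that share an EST (directly or through a chain of shared ESTs) and ignores the biological rules, embedded genes, and boundary refinement the paper describes. The function name, the input format, and the exon and EST identifiers are hypothetical.

```python
def group_exons_by_est(exon_ests):
    """Group exons that are linked by shared ESTs.

    exon_ests maps an exon id to the set of EST ids matching that exon.
    Returns a list of exon groups; exons without ESTs form singleton groups.
    """
    parent = {exon: exon for exon in exon_ests}

    def find(exon):
        # Union-find root lookup with path halving.
        while parent[exon] != exon:
            parent[exon] = parent[parent[exon]]
            exon = parent[exon]
        return exon

    first_exon_for_est = {}
    for exon, ests in exon_ests.items():
        for est in ests:
            if est in first_exon_for_est:
                # Same EST seen on an earlier exon: merge the two groups.
                parent[find(exon)] = find(first_exon_for_est[est])
            else:
                first_exon_for_est[est] = exon

    groups = {}
    for exon in exon_ests:
        groups.setdefault(find(exon), []).append(exon)
    return list(groups.values())


# Hypothetical exon-to-EST matches.
toy_matches = {
    "exon1": {"estA"},
    "exon2": {"estA", "estB"},
    "exon3": {"estB"},
    "exon4": {"estC"},
    "exon5": set(),
}
toy_groups = group_exons_by_est(toy_matches)
print(toy_groups)
```

In this toy input, exon1 through exon3 are chained together by estA and estB, while exon4 and exon5 remain on their own; a fuller implementation would then apply rules to each group to decide gene boundaries.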
Gene and translation initiation site prediction in metagenomic sequences
Motivation: Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. Results: We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, the ability to identify sequences that use alternate genetic codes, and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.