106 research outputs found

    Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

    Background: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. Results: The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent, service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. Conclusions: We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

    Genomic encyclopedia of sugar utilization pathways in the Shewanella genus

    Background: Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection of known carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantial challenge due to frequent variations in components of these pathways. To address a practically and fundamentally important challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from its genomic sequence, we combined a subsystems-based comparative genomic approach with experimental validation of selected bioinformatic predictions by a combination of biochemical, genetic and physiological experiments. Results: We applied this integrated approach to systematically map carbohydrate utilization pathways in 19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes ~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinct pathways with a mosaic distribution across Shewanella species providing insights into their ecophysiology and adaptive evolution. Phenotypic assays revealed a remarkable consistency between predicted and observed phenotype, an ability to utilize an individual sugar as a sole source of carbon and energy, over the entire matrix of tested strains and sugars. Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifested at various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replacements and alternative biochemical routes to a different organization of transcription regulatory networks. Conclusions: The reconstructed sugar catabolome in Shewanella spp. includes 62 novel isofunctional families of enzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functional organization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our current version of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of this approach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate for the efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data.

    The RAST Server: Rapid Annotations using Subsystems Technology

    Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

    Computing and applying atomic regulons to understand gene expression and regulation

    The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.01819/full#supplementary-material
    Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR-predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and found that, while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high-quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large-scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.
    JF acknowledges funding from [SFRH/BD/70824/2010] of the FCT (Portuguese Foundation for Science and Technology) PhD program. CH and PW were supported by the National Science Foundation under grant number EFRI-MIKS-1137089. RT was supported by the Genomic Science Program (GSP), Office of Biological and Environmental Research (OBER), U.S. Department of Energy (DOE), and his work is a contribution of the Pacific Northwest National Laboratory (PNNL) Foundational Scientific Focus Area. This work was partially supported by an award from the National Science Foundation to MD, AB, NT, and RO (NSF ABI-0850546). This work was also supported by the United States National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services [Contract No. HHSN272201400027C].

    The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

    The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.