
The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes

Author: Agindotan
Aittamaa
Angiuoli
Arnaud
B. N. Adhikari
Brown
C. A. Levesque
C. R. Buell
Chen
Chen
Choi
Cuomo
da Silva
Darling
Dean
Duan
E. C. Neeno-Eckwall
Fessehaie
Finn
Gajendran
Gish
Glasner
Goecks
Guldener
Haas
Hedeler
J. E. Leach
J. P. Hamilton
Kamoun
Kamper
Keon
Kubota
Leamon
Li
Margulies
Metzker
Mulder
Mungall
Munroe
N. T. Perna
N. Tisserat
Neumann
Pertea
Rozen
Salanoubat
Shendure
Simpson
Soanes
Stein
Tettelin
Tripathy
Tyler
Wang
Wise
Wootton
Yin
Zhang
Publication venue: Oxford University Press

The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze the annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the ASAP (A Systematic Annotation Package) database, which curates genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes, built from publicly available single-pass expressed sequence tags (ESTs). Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use of genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets; it contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera.

Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

Author: AK Roos
C Rodriguez-Penagos
David Pot
E Boutet
G Wu
Guy Plunkett
HM Muller
JD Glasner
Jeremy D Glasner
JM Greene
Joel Fedorko
John M Greene
Jon Whitmore
M Demerec
M Krallinger
M Riley
Matthew Shaker
Mila Ramos-Santacruz
Nicole T Perna
P Stothard
Panna Shetty
R Hoffman
RD Fleischmann
RK Aziz
S Gama-Castro
S Kim
Sam Zaremba
Thomas Hampton
Y-C Fang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to research in this field, and Natural Language Processing (NLP), in particular Information Extraction (IE) technology, can significantly aid this process. Description: We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text-mining application. Conclusion: Our Text Mining application is available online on the ERIC website (http://www.ericbrc.org/portal/eric/articles). The information retrieval interface displays a list of recently published enteropathogen literature abstracts and also provides a search interface for custom queries by keyword, date range, etc. When an abstract is selected, it and the entities and relations extracted from it are retrieved from a relational database and marked up to highlight those entities and relations.
The abstract also provides links from extracted genes and gene products to the ERIC Annotations database, giving access to comprehensive genomic annotations and adding value to both the text-mining and annotation systems.

Directory of Open Access Journals

Digital Repository @ Iowa State University (ISU)
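The ERIC abstract above reports its extraction quality as an F-measure. As a reminder of what that figure summarizes, here is a minimal sketch of the standard definition: precision and recall computed from match counts against a gold standard, combined as a weighted harmonic mean. The abstract does not say which F variant or averaging scheme was used, so this assumes the balanced F1 (`beta=1`); the function names and the counts in the usage line are hypothetical, not taken from the paper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Precision and recall from raw match counts (0.0 when a denominator is empty)."""
    predicted = true_pos + false_pos   # everything the extractor reported
    actual = true_pos + false_neg      # everything in the gold-standard annotation
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical entity counts for one blind test set (not figures from the paper).
p, r = precision_recall(true_pos=90, false_pos=10, false_neg=10)
print(round(f_measure(p, r), 3))  # prints 0.9
```

Because the harmonic mean is dominated by the smaller of the two values, a high F-measure requires both few spurious extractions and few missed ones.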

Construction and characterization of an expressed sequenced tag library for the mosquito vector Armigeres subalbatus

Author: Aliota Matthew T
Bartholomay Lyric C
Chen Cheng-Chen
Cho Wen-Long
Christensen Bruce M
Fuchs Jeremy F
Hsiao Kwang-Jen
Huang Chiung-Yen
Kou Hang-Yen
Liu Tze-Tze
Mayhew George F
Perna Nicole T
Rocheleau Thomas A
Tsai Shih-Feng
Tsao I-Yu
Yang Ueng-Cheng
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The mosquito Armigeres subalbatus mounts a distinctively robust innate immune response when infected with the nematode Brugia malayi, a causative agent of lymphatic filariasis. To mine the transcriptome for new insight into the cascade of events that takes place in this mosquito in response to infection, 6 cDNA libraries were generated from tissues of adult female mosquitoes subjected to immune-response activation treatments that lead to well-characterized responses, and from aging, naïve mosquitoes. Expressed sequence tags (ESTs) from each library were produced, annotated, and subjected to comparative analyses. Results: The six libraries were constructed and used to generate 44,940 ESTs, of which 38,079 passed quality filters and were included in the annotation project and subsequent analyses. These sequences were collapsed into 8,020 unique sequence clusters or singletons. EST clusters were annotated and curated manually within the ASAP (A Systematic Annotation Package for Community Analysis of Genomes) web portal according to BLAST comparisons to GenBank and to the Anopheles gambiae and Drosophila melanogaster genome projects. Conclusion: The resulting dataset is the first of its kind for this mosquito vector. It provides a basis for future studies of the cascade of events that occurs in mosquito vectors in response to infection, thereby offering insight into vector competence and innate immunity.

Directory of Open Access Journals

National Health Research Institutes

Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of Escherichia coli O157:H7 and Yersinia pestis

Author: Banta Lois M.
Baumler David J.
Cabot Eric L.
Glasner Jeremy D.
Hung Kai F.
Perna Nicole T.
Schwarz Jodi A.
Publication venue: The Keep
Publication date: 01/04/2012

Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples related to bacterial pathogenesis. Students first examine alignments of genomes of Escherichia coli O157:H7 strains isolated from three food-poisoning outbreaks using the multiple-genome alignment tool Mauve. Students investigate conservation of virulence factors using the Mauve viewer and by browsing annotations available at the A Systematic Annotation Package for Community Analysis of Genomes database. In the second module, students use an alignment of five Yersinia pestis genomes to analyze single-nucleotide polymorphisms of three genes to classify strains into biovar groups. Students are then given sequences of bacterial DNA amplified from the teeth of corpses from the first and second pandemics of the bubonic plague and asked to classify these new samples. Learning-assessment results reveal student improvement in self-efficacy and content knowledge, as well as students’ ability to use BLAST to identify genomic islands and conduct analyses of virulence factors from E. coli O157:H7 or Y. pestis. Each of these educational modules offers educators new ready-to-implement resources for integrating comparative genomic topics into their curricula.

Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation

Author: Brinkman Fiona S. L.
Cheng Dean
Ching Wai-Kay Ho
Hancock Robert E. W.
Huang Shaoshan
Lo Raymond
Sui Shannan J. Ho
Ung Korine S.E.
Winsor Geoffrey L.
Publication venue: Oxford University Press
Publication date: 01/01/2004

Using the Pseudomonas aeruginosa Genome Project as a test case, we have developed a database and submission system to facilitate a community-based approach to continually updated genome annotation (http://www.pseudomonas.com). Researchers submit proposed annotation updates through one of three web-based forms; each submission is reviewed and, if accepted, entered into both the database and a log file of updates, with author acknowledgement. In addition, a coordinator continually reviews the literature for suitable updates, as we have found such reviews to be the most efficient. Both the annotations database and the updates-log database support Boolean searches, with the ability to sort results and to download all data or search results as tab-delimited files. To complement this peer-reviewed genome annotation, we also provide a linked GBrowse view that displays alternate annotations. Additional tools and analyses are also integrated, including PseudoCyc and knockout mutant information. We propose that this database system, with its focus on flexible queries of the data and on access to both peer-reviewed and alternate annotation information, may be a suitable model for other genome projects wishing to use a continually updated, community-based annotation approach. The source code is freely available under the GNU General Public Licence.

Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops

Author: Bourgait I
Chiapello H
El Karoui M
Gendrault-Jacquemard A
Heuclin G
Petit M-A
Sourivong F
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Public databases now contain a multitude of complete bacterial genomes, including several genomes of the same species. These data offer new opportunities to address questions about bacterial genome evolution, a task that requires reliable, fine-grained comparisons of closely related genomes. Recent analyses using pairwise whole-genome alignments have shown that bacterial genomes can be segmented into a common conserved backbone and strain-specific sequences called loops. RESULTS: Here, we generalize this approach and propose a strategy for systematic, unbiased genome segmentation based on multiple genome alignments. Segmentation analyses applied to 13 bacterial species confirmed that the approach can discern the 'mosaic' organization of bacterial genomes. Segmentation results are available through a Web interface that permits functional analysis, extraction and visualization of the backbone/loops structure of the documented genomes. To illustrate the potential of this approach, we performed a precise analysis of the mosaic organization of three E. coli strains and a functional characterization of their loops. CONCLUSION: The segmentation results, including the backbone/loops structure of 13 bacterial species genomes, are new and available for use by the scientific community at the URL:

Directory of Open Access Journals

Edinburgh Research Explorer

HAL Descartes

ProdInra

Hal-Diderot
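The backbone/loops idea in the segmentation abstract above can be illustrated with a small sketch. It assumes the multiple genome alignment has already been computed and is given as blocks mapping each genome name to the interval it contributes; a block shared by every genome counts as backbone, and whatever lies between merged backbone runs on a given genome is labeled a strain-specific loop. This is not the authors' published pipeline: the block format, function names, and toy coordinates are hypothetical, and genomes are treated as linear even though most bacterial chromosomes are circular.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]        # half-open [start, end) coordinates on one genome
Block = Dict[str, Interval]       # genome name -> aligned interval in that genome (hypothetical format)


def backbone_intervals(genome: str, blocks: List[Block], genomes: Set[str]) -> List[Interval]:
    """Intervals on `genome` covered by alignment blocks shared by every genome."""
    return [block[genome] for block in blocks if genomes.issubset(block)]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def segment(genome_length: int, backbone: List[Interval]) -> List[Tuple[str, int, int]]:
    """Label every position of a linear genome as backbone or strain-specific loop."""
    segments: List[Tuple[str, int, int]] = []
    cursor = 0
    for start, end in merge(backbone):
        if start > cursor:                      # gap before this backbone run is a loop
            segments.append(("loop", cursor, start))
        segments.append(("backbone", start, end))
        cursor = end
    if cursor < genome_length:                  # trailing strain-specific sequence
        segments.append(("loop", cursor, genome_length))
    return segments


# Toy alignment of three hypothetical strains A, B and C.
genomes = {"A", "B", "C"}
blocks = [
    {"A": (0, 30), "B": (5, 35), "C": (0, 30)},      # shared by all three: backbone
    {"A": (50, 70), "B": (60, 80)},                  # absent from C: not backbone
    {"A": (70, 90), "B": (85, 105), "C": (40, 60)},  # shared by all three: backbone
]
print(segment(100, backbone_intervals("A", blocks, genomes)))
# [('backbone', 0, 30), ('loop', 30, 70), ('backbone', 70, 90), ('loop', 90, 100)]
```

Note that the block shared only by A and B still ends up inside a loop of A: with a strict "present in all genomes" rule, sequence shared by a subset of strains is not backbone.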

Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome

Author: dos Reis M.
Savva Renos
Wernisch Lorenz
Publication venue: Oxford University Press (OUP)
Publication date: 01/12/2003

Escherichia coli has long been regarded as a model organism in the study of codon usage bias (CUB). However, most studies of this topic in this organism have been computational or, when experimental, restricted to small datasets; genes with low CUB in particular have received little attention. In this work, correspondence analysis on codon usage is used to classify E. coli genes into three groups, and the relationship between these groups and expression levels from microarray experiments is studied. The groups are: group 1, highly biased genes; group 2, moderately biased genes; and group 3, AT-rich genes with low CUB. Surprisingly, group 3 genes show a negative correlation between codon bias and expression level, i.e. genes with extremely low codon adaptation index (CAI) values are highly expressed, while group 2 genes show the lowest average expression levels and group 1 genes show the usual expected positive correlation between CAI and expression. This trend holds across all functional gene groups, seemingly contradicting the E. coli–yeast paradigm on CUB. It is argued that these findings are still compatible with the mutation–selection balance hypothesis of codon usage and that E. coli genes form a dynamic system shaped by these factors.

Birkbeck Institutional Research Online
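The dos Reis et al. abstract above relies on the codon adaptation index (CAI). In its usual (Sharp and Li) formulation, each codon gets a relative adaptiveness w, its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The sketch below follows that definition under stated assumptions: the two-family codon table and the reference counts are hypothetical placeholders (a real analysis would use the full genetic code, excluding Met, Trp and stop codons), and the 0.5 pseudocount for codons unseen in the reference set is an assumed convention. The abstract does not describe how its CAI values were computed.

```python
import math
from typing import Dict, Sequence

# Hypothetical synonymous-codon families; a real analysis would use the full
# standard genetic code, excluding Met (ATG), Trp (TGG) and stop codons.
FAMILIES: Dict[str, Sequence[str]] = {
    "Lys": ("AAA", "AAG"),
    "Phe": ("TTT", "TTC"),
}


def relative_adaptiveness(ref_counts: Dict[str, int],
                          families: Dict[str, Sequence[str]],
                          pseudocount: float = 0.5) -> Dict[str, float]:
    """w = count of a codon / count of the most used synonym in its family."""
    w: Dict[str, float] = {}
    for codons in families.values():
        # Assumed convention: unseen codons get a small pseudocount so log(w) stays finite.
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for codon, count in counts.items():
            w[codon] = count / top
    return w


def cai(sequence: str, w: Dict[str, float]) -> float:
    """Geometric mean of w over the in-frame codons of `sequence` that have a w value."""
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]   # ATG, TGG, stops etc. are skipped
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


reference = {"AAA": 80, "AAG": 20, "TTT": 30, "TTC": 70}  # hypothetical counts
w = relative_adaptiveness(reference, FAMILIES)
print(round(cai("AAAAAGTTCATG", w), 3))  # prints 0.63
```

Working in log space keeps the geometric mean numerically stable for long genes. A gene built only from each family's preferred codons scores 1.0, and rarely used codons pull the score toward 0, which is why low-CAI genes are conventionally expected to be weakly expressed.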

AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

Author: Bessières P.
Bossy R.
Bryson K.
Chaillou S.
Gibrat J.-F.
Hoebeke M.
Loux V.
Maguin E.
Nicolas P.
Penaud S.
van de Guchte M.
Publication date: 01/07/2006

We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation, from a time when only a few draft contigs are available to when a final finished assembly is produced. The overall architecture is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate genomic and protein sequences, respectively. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.

UCL Discovery

yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes

Author: Brendel Volker
Schlueter Shannon D
Wilkerson Matthew D
Publication venue: BioMed Central
Publication date: 19/07/2006

Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for World Wide Web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at