Abstract

Background

New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation has not kept pace with this progress. Thus, the accurate annotation of these genomes has become a bottleneck in knowledge acquisition.

Results

We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered that had not been predicted in S. flexneri 2a str. 301 by any other annotation approach. The transcripts of four of these novel ORFs were confirmed by RT-PCR. Additionally, most of the novel ORFs were overlapping genes, some even nested within the coding regions of other known genes.

Conclusions

Our findings demonstrate that current Shigella genome annotation methods are imperfect and need to be improved. Beyond validating predicted genes at the protein level, proteogenomic tools can revise annotation errors and discover novel ORFs. The complementary dataset could provide more targets for functional studies of Shigella.

Candong Wei

Liguo Liu

Lina Zhao

Qi Jin

Wenchuan Leng

A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF

BMC Genomics

doi:10.1186/1471-2164-12-528

Cite this article as: Zhao et al.: A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF.
