4,213 research outputs found

REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

Author: Giess Adam
Jonckheere Veronique
Menschaert Gerben
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and >70% of all (variants of) annotated protein-coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
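
The abstract's first step, evaluating all possible ORFs in a genome, can be illustrated with a minimal Python sketch. This is not REPARATION's implementation: the start-codon set, the `min_codons` cutoff, and the function names `candidate_orfs` and `reverse_complement` are assumptions made for illustration, and the sketch does not model the Ribo-seq evidence or the growth-curve thresholds the abstract describes.

```python
from typing import List, Tuple

# Assumption: ATG, GTG and TTG as candidate bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase/lowercase DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_orfs(seq: str, min_codons: int = 10) -> List[Tuple[int, int]]:
    """List every start-to-stop ORF on the given strand.

    Returns half-open (start, end) coordinates, where `end` is just past
    the stop codon. Every in-frame start codon upstream of a stop yields
    its own candidate, so start-site variants sharing one stop are kept.
    `min_codons` counts codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # start codons seen since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                for start in open_starts:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((start, i + 3))
                open_starts = []
            elif codon in START_CODONS:
                open_starts.append(i)
    return sorted(orfs)


if __name__ == "__main__":
    toy = "ATGAAAATGCCCTAA"  # toy sequence, not real genome data
    print(candidate_orfs(toy, min_codons=1))
```

To cover the opposite strand, the same function can be called on `reverse_complement(seq)`; the coordinates it returns are then relative to the reverse-complemented sequence.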

Ghent University Academic Bibliography

EVOLUTION AND DYNAMICS OF TRANSCRIPTIONAL REGULATION IN BACTERIA

Author: Li Shan
NC DOCKS at The University of North Carolina at Charlotte
Publication venue
Publication date: 01/01/2013

Although transcription is one of the most important biological functions of cells, our understanding of its regulation is still limited. In this dissertation, we have studied transcriptional regulation in prokaryotes in three aspects. First, we investigated the extent to which cis-regulatory elements are conserved during the course of evolution, using the LexA regulons in cyanobacteria as an example. We found that in most cyanobacterial genomes analyzed, LexA appears to function as the transcriptional regulator of the key SOS response genes. The loss of lexA in some genomes might lead to the degradation of its binding sites. Second, directional RNA-seq techniques have recently become the workhorse for transcriptome profiling in prokaryotes; however, it is a challenging task to accurately assemble highly labile prokaryotic transcriptomes for further analyses. To fill this gap, we have developed a hidden Markov model-based transcriptome assembler that outperforms the state-of-the-art assemblers. Using our tool, we characterized alternative operon structures in E. coli K12 under various growth conditions and growth phases, and found that they are more complex and dynamic than previously anticipated. Lastly, we determined antisense and non-coding transcription patterns in E. coli K12 under various growth conditions and time points. We found that a large portion of genes have antisense transcription in a condition-dependent manner. Most antisense transcripts are initiated at and restricted to the 5′-end of the gene on the sense strand, and their expression levels are correlated with those of the genes on the sense strand, suggesting that these antisense transcripts might play an important role in transcriptional regulation.

The University of North Carolina at Greensboro

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

Author: Ahrén Dag
Darzentas Nikos
Goldovsky Leon
Kaipa Pallavi
Karp Peter D.
Kunin Victor
López-Bigas Núria
Moore-Kochlacs Caroline
Ouzounis Christos A.
Tsoka Sophia
Publication venue: Oxford University Press
Publication date: 01/01/2005

The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of the number of pathways present in these organisms and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.

CiteSeerX

Lund University Publications

PubMed Central

King's Research Portal

Genome sequence of the Lebeckia ambigua-nodulating 'Burkholderia sprentiae' strain WSM5005T

Author: Ardley Julie
Bruce David
Chen I-Min
De Meyer Sofie
Detter Chris
Goodwin Lynne
Han Cliff
Han James
Howieson John
Huntemann Marcel
Ivanova Natalia
Kyrpides Nikos
Lu Megan
Markowitz Victor
Mavromatis Konstantinos
Melino Vanessa
Mikhailova Natalia
O'Hara Graham
Ovchinnikova Galina
Pagani Ioanna
Pati Amrita
Peters Lin
Pitluck Sam
Reeve Wayne
Tian Rui
Szeto Ernest
Tapia Roxanne
Terpolilli Jason
Tiwari Ravi
Wei Chia-Lin
Woyke Tanja
Yates Ron
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

"Burkholderia sprentiae" strain WSM5005T is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated in Australia from an effective N2-fixing root nodule of Lebeckia ambigua collected in Klawer, Western Cape of South Africa, in October 2007. Here we describe the features of "Burkholderia sprentiae" strain WSM5005T, together with the genome sequence and its annotation. The 7,761,063 bp high-quality-draft genome is arranged in 8 scaffolds of 236 contigs, contains 7,147 protein-coding genes and 76 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.

Ghent University Academic Bibliography

Genomic data mining for the computational prediction of small non-coding RNA genes

Author: Tran Thao Thanh Thi
Publication venue: Georgia Institute of Technology
Publication date: 20/01/2009

The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative-based methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative-based methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to identify ncRNAs automatically on a genome-wide scale makes it possible to incorporate ncRNA prediction into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies. Ph.D. Committee Chair: Dr. G. Tong Zhou; Committee Member: Dr. Arthur Koblasz; Committee Member: Dr. Eberhard Voit; Committee Member: Dr. Xiaoli Ma; Committee Member: Dr. Ying X

Scholarly Materials And Research @ Georgia Tech

A computational genomics pipeline for prokaryotic sequencing projects

Author: Altschul
Andrew B. Conley
Andrey O. Kislyuk
Aziz
Bendtsen
Bentley
Besemer
Boeckmann
Brian H. Harcourt
Chen
Darling
Delcher
Dhwani Govil
Eid
Fleischmann
Gerlach
Holmes
Hotopp
I. King Jordan
Jay C. Humphrey
Jolley
Kathleen M. Tatti
Kislyuk
Krogh
Kroll
Kuo
Lapierre
Lee S. Katz
Leonard W. Mayer
Lowe
MacCallum
Maiden
Margulies
Maria L. Tondella
Markowitz
Matthew S. Hagen
Meyers
Miller
Mulder
Parkhill
Perrin
Pop
Pushkala Jayaraman
Quinlan
Raydel D. Mair
Rissman
Rosenstein
Schoen
Scott A. Sammons
Seshadri
Shendure
Sommer
Sonia Agrawal
Stewart
Tettelin
Uniprot Consortium
Viswateja Nelakuditi
Yang
Zerbino
Publication venue: Oxford University Press

Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data.

Crossref

PubMed Central

Genome sequence of the Ornithopus/Lupinus-nodulating Bradyrhizobium sp. strain WSM471

Author: Ardley Julie
Bruce David
Chen I-Min
De Meyer Sofie
Detter Chris
Goodwin Lynne
Han Cliff
Han James
Howieson John
Huntemann Marcel
Ivanova Natalia
Kyrpides Nikos C
Lu Megan
Markowitz Victor
Mavromatis Konstantinos
Melino Vanessa
Ninawi Mohamed
O'Hara Graham
Pagani Ioanna
Pati Amrita
Reeve Wayne Gerald
Tapia Roxanne
Terpolilli Jason
Tian Rui
Tiwari Ravi
Wei Chia-Lin
Woyke Tanja
Yates Ronald
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Bradyrhizobium sp. strain WSM471 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen (N2)-fixing root nodule formed on the annual legume Ornithopus pinnatus (Miller) Druce growing at Oyster Harbour, Albany district, Western Australia, in 1982. This strain is in commercial production as an inoculant for Lupinus and Ornithopus. Here we describe the features of Bradyrhizobium sp. strain WSM471, together with genome sequence information and annotation. The 7,784,016 bp high-quality-draft genome is arranged in 1 scaffold of 2 contigs, contains 7,372 protein-coding genes and 58 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.

Ghent University Academic Bibliography

Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

Author: A Ben-Bassat
A de Groot
A Eshghi
A Frank
A Keller
A Sittka
AB Robinson
AL Delcher
AW Francis
B Ma
B Polevoda
BK Erickson
C Ansong
C Flinta
C Giglione
CC Chao
CG Miller
Charles Ansong
CM Alpuche Aranda
D Chelius
DA Siegele
DH Haft
DM Horn
DN Perkins
EL Sonnhammer
F Frottin
F Meyer
FN Chang
Fred Heffron
GD Findlay
GE Crooks
GE Merrihew
H Lu
H Nielsen
H Yoon
Hyunjin Yoon
J Armengaud
J Deiwick
JA Karty
JC Wright
JD Bendtsen
JD Gary
JD Jaffe
JD Peterson
Jessica L Martin
JJ L'Italien
JK Eng
JN Adkins
Joshua N Adkins
JR Yates
JV Olsen
JW Tobias
K Rutherford
KH Choo
KJ Auberry
L Shi
L Stein
M Liebeke
M Mann
M Paetzel
MA Lauber
Marcus Jones
Matthew E Monroe
Meagan C Burnet
Michael McClelland
MP Washburn
N Figueroa-Bossi
N Gupta
N Jaitly
NE Castellana
Nikola Tolić
P Nielsen
PH Hirel
PM Anderson
Pratap Venepally
R Aebersold
R Craig
Richard D Smith
RJ Arnold
RL Levine
S Griffiths-Jones
Samuel H Payne
Samuel O Purvine
Scott N Peterson
SF Altschul
SH Payne
Steffen Porwollik
T Jarvik
TM Lowe
V Heurgue-Hamard
W Kim
WE Running
Y Shen
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (>15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function. Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights, and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California