70 research outputs found

A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF

Author: A de Groot
A Fasano
A Kumar
A Palleja
AJ Link
AL Delcher
C Ansong
C Wei
C Wei
Candong Wei
D Fermin
D Xia
DE Kalume
E Alix
E Lerat
F Yang
GA de Souza
GA Reeves
GD Findlay
H Li
J Lamontagne
JA Vizcaino
JD Jaffe
JD Jaffe
K Al-Hasani
K Baerenfaller
KL Kotloff
L Delaye
Liguo Liu
Lina Zhao
M Aivaliotis
M Behrens
M Ibrahim
MW Silby
N Gupta
P Nielsen
Q Jin
Qi Jin
RA VanBogelen
RG Sawers
S Gallien
S Renuse
SC Rison
SH Payne
W Kim
Wenchuan Leng
Y Ishino
ZI Johnson
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition.
Results: We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str. 301 by any other annotation approach. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes.
Conclusions: Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Synchronization of cytoplasmic and transferred mitochondrial ribosomal protein gene expression in land plants is linked to Telo-box motif enrichment

Abstract
Background: Chloroplasts and mitochondria evolved from the endosymbionts of once free-living eubacteria, and they transferred most of their genes to the host nuclear genome during evolution. The mechanisms used by plants to coordinate the expression of such transferred genes, as well as other genes in the host nuclear genome, are still poorly understood.
Results: In this paper, we use nuclear-encoded chloroplast (cpRPGs), as well as mitochondrial (mtRPGs) and cytoplasmic (euRPGs) ribosomal protein genes to study the coordination of gene expression between organelles and the host. Results show that the mtRPGs, but not the cpRPGs, exhibit strongly synchronized expression with euRPGs in all investigated land plants and that this phenomenon is linked to the presence of a telo-box DNA motif in the promoter regions of mtRPGs and euRPGs. This motif is also enriched in the promoter regions of genes involved in DNA replication. Sequence analysis further indicates that mtRPGs, in contrast to cpRPGs, acquired the telo-box from the host nuclear genome.
Conclusions: Based on our results, we propose a model of plant nuclear genome evolution where coordination of activities in mitochondria and chloroplast and other cellular functions, including cell cycle, might have served as a strong selection pressure for the differential acquisition of the telo-box between mtRPGs and cpRPGs. This research also highlights the significance of physiological needs in shaping transcriptional regulatory evolution.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Unpredictability of metabolism—the key role of metabolomics science in combination with next-generation genome sequencing

Next-generation sequencing provides technologies that sequence whole prokaryotic and eukaryotic genomes in days and support genome-wide association studies, chromatin immunoprecipitation followed by sequencing, and RNA sequencing for transcriptome studies. An exponentially growing volume of sequence data can be anticipated, yet functional interpretation does not keep pace with the amount of data produced. In principle, these data contain all the secrets of living systems, the genotype–phenotype relationship. Firstly, it is possible to derive the structure and connectivity of the metabolic network from the genotype of an organism in the form of the stoichiometric matrix N. This is, however, static information. Strategies for genome-scale measurement, modelling and prediction of dynamic metabolic networks need to be applied. Consequently, metabolomics science (the quantitative measurement of metabolism in conjunction with metabolic modelling) is a key discipline for the functional interpretation of whole genomes and especially for testing the numerical predictions of metabolism based on genome-scale metabolic network models. In this context, a systematic equation is derived based on metabolomics covariance data and the genome-scale stoichiometric matrix which describes the genotype–phenotype relationship.
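
As a rough illustration of why the stoichiometric matrix N is static information on its own, the sketch below builds N for a hypothetical three-reaction toy network (not taken from the paper) and applies the standard mass balance dx/dt = N v. The network, the flux values and the function name are assumptions made for this example; the abstract does not state the paper's own equation.

```python
# Hypothetical toy network (an assumption, not from the paper):
#   R1: (uptake) -> A
#   R2: A -> B
#   R3: B -> (export)
METABOLITES = ["A", "B"]
REACTIONS = ["R1", "R2", "R3"]

# Rows are metabolites, columns are reactions.
# Entry N[i][j] is the net amount of metabolite i produced by reaction j.
N = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]


def concentration_change(stoich, fluxes):
    """Return dx/dt = N v for a flux vector v (standard mass balance)."""
    if any(len(row) != len(fluxes) for row in stoich):
        raise ValueError("flux vector length must match the number of reactions")
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, fluxes)) for row in stoich]


if __name__ == "__main__":
    # Balanced fluxes: no metabolite accumulates (steady state).
    print(concentration_change(N, [2, 2, 2]))
    # R2 and R3 run slower than uptake: A accumulates.
    print(concentration_change(N, [2, 1, 1]))
```

The same N is compatible with many flux vectors, which is why measured data are needed to say anything about the dynamic state of the network.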

Crossref

Springer - Publisher Connector

PubMed Central

Shotgun proteomics of the barley seed proteome

Author: A Gorg
A Koller
A Takahashi
AI Nesvizhskii
AL Capriotti
AL Capriotti
B Beecher
B Zhang
BC Bonsager
C Abdallah
C Finnie
C Finnie
C Han
H Liu
HE Kristoffersen
HF Darlington
IJ Stulemeijer
J Wang
K Baerenfaller
K Oracz
K Witzel
KA Neilson
KJ Morton
L Dure
L Perrocheau
L Rajjou
M Boren
M Shah
MC Romero-Rodriguez
MJ Allison
MJ Giroux
MP Gomes
O Martinez de Ilarduya
O Ostergaard
O Ostergaard
P Greenwell
P Sanchez de la Hoz
P Sourdille
PR Shewry
R Flengsrud
R Hynek
Ramamurthy Mahalingam
S Cai
S Gorjanovic
S Kaspar
S Komatsu
S Laugesen
T Komatsuda
TC Lu
W Weiss
W Weiss
WC Burger
Y Yang
Z Du
Z-L Hu
Publication venue: Springer Science and Business Media LLC

Crossref

TESTLoc: protein subcellular localization prediction from EST data

Author: A Chacinska
A Kumar
A Pierleoni
A Reinhardt
AG Hatzigeorgiou
BF Lang
C Guda
C Guda
C Iseli
CS Yu
CS Yu
D Sarda
Gertraud Burger
H Bannai
H Shatkay
HM Yuan
HN Lin
HW Platta
I Small
J Assfalg
J Li
J Liu
J Parkinson
JD Wasmuth
K Baerenfaller
KC Chou
KC Chou
KJ Park
L Barbe
LB Koski
M Boden
MG Claros
MS Boguski
MS Scott
O Emanuelsson
P Rice
R Casadio
R Kaundal
R Lascaris
R Nair
R Nair
R Nair
RE Fan
S Briesemeister
S Hua
SF Altschul
T Blum
TM Devlin
W Li
WK Huh
Y Huang
Y Lee
Yao-Qing Shen
YQ Shen
YQ Shen
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.
Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%).
Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html
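
To make the "amino acid composition" feature concrete, here is a minimal sketch of how a partial peptide could be turned into a fixed-length vector that a classifier such as an SVM can consume. The function name, the residue ordering and the handling of ambiguous residues are assumptions for illustration; this is not TESTLoc's actual feature code.

```python
# The 20 standard amino acids, in a fixed (alphabetical) order so that
# every peptide maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard amino acid in a peptide.

    Non-standard or ambiguous symbols (e.g. 'X', '*') are ignored, which
    is an assumption made for this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in peptide.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # A short, made-up EST-peptide fragment.
    features = amino_acid_composition("MKTLLXAAG")
    for aa, frac in zip(AMINO_ACIDS, features):
        if frac > 0:
            print(aa, round(frac, 3))
```

Because the vector length does not depend on peptide length, the same representation works for full-length proteins and for the partial sequences derived from ESTs.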

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies

Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
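
The target:decoy idea mentioned above can be sketched in a few lines. The example assumes the simple estimator FDR ≈ (decoy hits) / (target hits) above a score threshold, which is one common convention; the abstract does not say which estimator the authors used, and the PSM scores below are invented for illustration.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys/targets among PSMs scoring >= threshold.

    `psms` is a list of (score, is_decoy) pairs. The decoys/targets ratio
    is an assumed, simple estimator, not necessarily the paper's.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def accepted_targets(psms, max_fdr):
    """Return the largest number of target PSMs accepted at FDR <= max_fdr."""
    best = 0
    for threshold in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            n_targets = sum(
                1 for score, is_decoy in psms if score >= threshold and not is_decoy
            )
            best = max(best, n_targets)
    return best


if __name__ == "__main__":
    # Invented (score, is_decoy) pairs; higher score means a better match.
    psms = [(9.0, False), (8.0, False), (7.0, False),
            (6.0, True), (5.0, False), (4.0, True)]
    print(fdr_at_threshold(psms, 5.0))
    print(accepted_targets(psms, 0.25))
    print(accepted_targets(psms, 0.10))
```

Under this estimator, anything that shifts the balance of decoy and target hits above the threshold changes how many PSMs pass a fixed FDR cutoff, which is the kind of database-dependent effect the abstract describes.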

Queen's University Belfast Research Portal

Crossref

PubMed Central

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

Targeted reprogramming of H3K27me3 resets epigenetic memory in plant paternal chromatin

Author: A Dobin
A Horstman
A Houben
A Inoue
B Glöckle
B Langmead
B Sun
CE Niederhuth
D Jiang
D Reinberg
D Twell
D Wang
DW Galbraith
F Borges
F Hofmann
F Laprell
F Lu
F Lu
F Ramírez
F Zenk
GL Min
H Li
H Wollmann
H Yang
H Zheng
I Khanday
I Makarevitch
I Martínez-Fernández
J Brind’Amour
J Chen
J Kang
J Moreno-Romero
J Reimand
JP Calarco
JT Robinson
K Baerenfaller
K Maehara
K Nozue
K Zhang
KR Kaneshiro
L Brownfield
LE Moritz
LJ Zhu
M Bayer
M Borg
M Borg
M Borg
M De Lucas
M Gehring
M Ingouff
M Sachs
M Vermeulen
M Xu
MA Schon
MF Belmonte
MI Love
N Reverón-Gómez
NL Bray
P Crevillén
P Voigt
P Zhao
P Zhao
PJ Murphy
R Narsai
RE Braun
RH Dowen
RT Coleman
S Boscá
S Picelli
S Zheng
S-F Wu
SA Johnson-Brousseau
SS Hammoud
T Daley
T Kawashima
T Okada
T Slotte
TM Tabuchi
U Brykczynska
U Grossniklaus
VK Schoft
W Reik
W She
W Yan
WE Friedman
Y Hamamura
Y Ikeda
Y Jacob
Y Jacob
Y Sano
Y Zhang
Y Zhou
Z Gu
Z Tao
Z-P Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2020

Epigenetic marks are reprogrammed in the gametes to reset genomic potential in the next generation. In mammals, paternal chromatin is extensively reprogrammed through the global erasure of DNA methylation and the exchange of histones with protamines (1,2). Precisely how the paternal epigenome is reprogrammed in flowering plants has remained unclear since DNA is not demethylated and histones are retained in sperm (3,4). Here, we describe a multi-layered mechanism by which H3K27me3 is globally lost from histone-based sperm chromatin in Arabidopsis. This mechanism involves the silencing of H3K27me3 writers, activity of H3K27me3 erasers and deposition of a sperm-specific histone, H3.10 (ref. 5), which we show is immune to lysine 27 methylation. The loss of H3K27me3 facilitates the transcription of genes essential for spermatogenesis and pre-configures sperm with a chromatin state that forecasts gene expression in the next generation. Thus, plants have evolved a specific mechanism to simultaneously differentiate male gametes and reprogram the paternal epigenome.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Edinburgh Research Explorer

Investigating the validity of current network analysis on static conglomerate networks by protein network stratification

Author: A Bossi
A Ma'ayan
AC Gavin
AH Tong
AK Ramani
AL Barabasi
AL Hopkins
B Zybailov
C Alfarano
C von Mering
D Eisenberg
D Greenbaum
D Swarbreck
G Balazsi
G Joshi-Tope
G Palla
H Huang
H Jeong
H Ma
H Rutschow
H Yu
H Yu
H Yu
H Zhang
HW Ma
HY Chuang
IW Taylor
J Cui
JD Han
JF Rual
K Baerenfaller
K Yang
KY Yip
LH Hartwell
Long J Lu
M Arita
M Ashburner
M Girvan
M Hamacher
M Miyamoto
M Zhang
ME Newman
Minlu Zhang
MJ Herrgard
MP Samanta
N Bertin
N Guelzim
N Lemke
NM Luscombe
NN Batada
NN Batada
P Braun
P Qiu
PV Missiuro
R Guimera
R Kelley
R Milo
R Sharan
RJ Prill
S Li
S Peri
S Wuchty
SA Teichmann
SE Calvano
T Ideker
T Kislinger
TZ Berardini
U de Lichtenberg
U Stelzl
WH Lin
Y Xia
YR Cho
Z Wang
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: A molecular network perspective forms the foundation of systems biology. A common practice in analyzing protein-protein interaction (PPI) networks is to perform network analysis on a conglomerate network that is an assembly of all available binary interactions in a given organism from diverse data sources. Recent studies on network dynamics suggested that this approach might have ignored the dynamic nature of context-dependent molecular systems.
Results: In this study, we employed a network stratification strategy to investigate the validity of the current network analysis on conglomerate PPI networks. Using the genome-scale tissue- and condition-specific proteomics data in Arabidopsis thaliana, we present here the first systematic investigation into this question. We stratified a conglomerate A. thaliana PPI network into three levels of context-dependent subnetworks. We then focused on the three most commonly conducted types of network analysis, i.e., topological, functional and modular analyses, and compared the results from these network analyses on the conglomerate network and five stratified context-dependent subnetworks corresponding to specific tissues.
Conclusions: We found that the results based on the conglomerate PPI network are often significantly different from those of context-dependent subnetworks corresponding to specific tissues or conditions. This conclusion depends neither on relatively arbitrary cutoffs (such as those defining network hubs or bottlenecks), nor on specific network clustering algorithms for module extraction, nor on the possible high false positive rates of binary interactions in PPI networks. We also found that our conclusions are likely to be valid in human PPI networks. Furthermore, network stratification may help resolve many controversies in current research of systems biology.
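
The contrast between a conglomerate network and its context-dependent subnetworks can be illustrated with a small sketch. It assumes one plausible stratification rule, keeping an interaction only when both partners are detected in the same tissue; the abstract does not spell out the rule the authors used, and the proteins, tissues and function names below are hypothetical.

```python
from collections import Counter


def stratify(edges, expressed):
    """Keep only interactions whose two partners are both in `expressed`.

    This both-partners-detected rule is an assumption for illustration.
    """
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


def degrees(edges):
    """Count how many interactions each protein takes part in."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    # Hypothetical conglomerate PPI network and tissue proteomes.
    conglomerate = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
    expressed_in = {
        "root": {"P1", "P2"},
        "leaf": {"P2", "P3", "P4"},
    }

    print("conglomerate:", dict(degrees(conglomerate)))
    for tissue, proteins in expressed_in.items():
        sub = stratify(conglomerate, proteins)
        print(tissue, sub, dict(degrees(sub)))
```

In this toy case P1 looks like a hub in the conglomerate network but has at most one partner in any single tissue, which mirrors the kind of discrepancy the study reports for topological analyses.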

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central