25 research outputs found

OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif

Author: Drawid Amar
Gupta Nupur
Gélinas Céline
Nagaraj Vijayalakshmi H
Sengupta Anirvan M
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: DNA sequence binding motifs for several important transcription factors happen to be self-overlapping. Many current regulatory site identification methods do not explicitly take overlapping sites into account. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site with respect to the transcription start site (TSS) in an integrated probabilistic framework while identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experimental results.

Results: We have developed a tool based on a Hidden Markov Model (HMM) that identifies the binding locations of transcription factors with a preference for self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities on unaligned sequences containing known sites and by estimating transition probabilities to reflect the site density in all promoters in a genome. While identifying sites, it adjusts parameters to model how site density changes with distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. For binding sites of the transcription factor NF-κB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity measured in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that NF-κB binding sites predicted by our method are likely to be functional.

Conclusion: Our method deals specifically with identifying locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, considering OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that we allow transition probabilities to change with location relative to the TSS. OHMM could be used to predict physical occupancy, and it provides guidance for the proper design of gel-shift experiments. Based upon our predictions, new insights into NF-κB function and regulation and possible new biological roles of NF-κB were uncovered.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An FPT Approach for Predicting Protein Localization from Yeast Genomic Data

Author: A Bairoch
A Drawid
A Krogh
A Kumar
A Pierleoni
A Reinhardt
A Shiratori
Chunhe Li
D Boyd
D Frishman
DM Engelman
Erkang Wang
FCP Holstege
G Giaever
G von Heijne
H Nielsen
HW Mewes
I Nachman
I Paulsen
J Han
J Wang
Jin Wang
JL DeRisi
JW Han
K Nakai
L Stryer
LJ Lu
LV Zhang
M Andrade
M Brown
M Gerstein
M Riffle
N Friedman
N Lin
P Klein
P Spellman
P Verhasselt
PE Hodges
R Agarwal
R Agrawal
R Jansen
R Tatusov
Shin-Han Shiu
T Ito
T Stephenson
V Alexandrov
WK Huh
WL Huang
Xidi Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Coincidentally, data mining methods have been developed and refined in order to handle this experimental windfall, thus allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regression under various statistical measures. Our results show that mFPT gave better performance than other approaches in predicting protein localization. Meanwhile, setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently than the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions of the location of 17 proteins that currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.

Public Library of Science (PLOS)

Crossref

PubMed Central

Changchun Institute of Applied Chemistry, Chinese Academy Of Sciences

Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

Author: A Drawid
A Lagreid
A Tanay
AC Gavin
AJ Enright
B Schwikowski
CJ Roberts
EM Marcotte
GD Bader
HJ Bussemaker
HW Mewes
I Cherel
J Ihmels
Jianghui Xiong
Kunyi Luo
LF Wu
M Ashburner
M Deng
M Pellegrini
MB Eisen
MC von
MP Brown
OG Troyanskaya
P Jorgensen
P Uetz
PT Spellman
R Kohavi
R Overbeek
SF Altschul
Shanguang Chen
Simon Rayner
T Ito
TR Hazbun
TR Hughes
U Karaoz
WK Huh
WR Pearson
X Zhou
Y Chen
Y Ho
Yinghui Li
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creating an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to add to our evidence data almost any experimental data set related to protein function that incorporates the Gene Ontology, in order to seek relationships between the different sets.

RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation.

CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

One-Dimensional Model Systems: Theoretical Survey

Author: Baker G. A.
Bari R. A.
Baxter R. J.
Blume M.
Blöte H. W. J.
Bonner J. C.
Brower R. C.
Capel H. W.
Chui S. T.
de Jongh L. J.
deNeef T.
Drawid M.
Gupta N.
Hohenberg P. C.
Hone D. W.
Imry Y.
Jill C. Bonner
Johnson J. D.
Kosterlitz J. M.
Lai C. K.
Landau D. P.
Luther A.
Nagle J. F.
Reynolds P. J.
Richards P. M.
Rogiers J.
Shiba H.
Stanley H. E.
Sur A.
Sutherland B.
Suzuki M.
Takahashi M.
Todani T.
Tonegawa T.
Publication venue: DigitalCommons@URI
Publication date: 01/03/1978

In the early 1960s one-dimensional model systems were regarded as amusing toys with the advantage of being far more easily solvable than their "real" three-dimensional counterparts. Now essentially 1-D (quasi-1-D) magnets can be "tailor-made" in the laboratory. Even more popular is the field of organic conductors like TTF⋅TCNQ, which are naturally quasi-1-D. Currently solitons and related solutions of non-linear, dispersive 1-D differential equations are ubiquitous in physics, including the area of 1-D magnetism. These developments are discussed in the Introduction. The rest of this paper is concerned with model Hamiltonians, model comparisons, critical singularities in 1-D (quasi-1-D) systems, accuracy of numerical techniques in comparison with exact solutions, brief accounts of dilute and disordered 1-D systems, and 1-D spin dynamics. Finally, a comment is made on a variety of interesting isomorphisms between 1-D magnets and phenomena in several other areas of physics, for example 2-D ferroelectrics, field-theoretic models, and realistic fluids. Comparison of theory and experiment has been the subject of several excellent reviews and is therefore not discussed here.

Crossref

DigitalCommons@URI

A method to improve protein subcellular localization prediction by integrating various biological data sources

Author: A Bairoch
A Drawid
A Reinhardt
C Kuo-Chen
CS Yu
Doheon Lee
E Camon
H Nielsen
H Wen-Lin
Huang Ying
I Lee
J Cedano
J Guo
K Lee
K Nakai
KC Chou
KJ Park
M Reczko
O Emanuelsson
P Horton
S Hagit
S Hua
S Michelle
Thai Quang Tung
WK Huh
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods for efficiently predicting protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs remarkably; and many proteins are located in multiple locations. Thus it is necessary to explore new features and appropriate classification methods to improve prediction performance.

Results: In this paper we propose a new prediction method which combines two key ideas: 1) Information about neighbouring proteins in a probabilistic gene network is integrated to enrich the prediction features. 2) Fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located in multiple sites. An experiment was conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed.

Conclusion: Our results suggest that the neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.
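The fuzzy k-NN step named in the Results can be illustrated with a small sketch. This is the generic Keller-style fuzzy k-nearest-neighbour rule, offered only as an illustration and not as the authors' implementation; the function name `fuzzy_knn`, the toy feature vectors, the fuzzifier `m` and the 0.5 membership cut-off are all assumptions made for this example.

```python
import math

def fuzzy_knn(train_points, train_memberships, query, k=3, m=2.0, eps=1e-9):
    """Return one membership value per class for `query`.

    train_points:      list of feature vectors (hypothetical features)
    train_memberships: list of per-class memberships, one row per point;
                       a protein found in two locations may have two 1s
    m:                 fuzzifier (> 1); m = 2 gives inverse-square weights
    """
    # Distance from the query to every training point, nearest first.
    dists = sorted((math.dist(p, query), i) for i, p in enumerate(train_points))
    neighbours = dists[:k]

    n_classes = len(train_memberships[0])
    exponent = 2.0 / (m - 1.0)
    scores = [0.0] * n_classes
    total_weight = 0.0
    for d, i in neighbours:
        # Closer neighbours get larger weights; eps avoids division by zero.
        w = 1.0 / (max(d, eps) ** exponent)
        total_weight += w
        for c in range(n_classes):
            scores[c] += w * train_memberships[i][c]
    return [s / total_weight for s in scores]

# Toy data: two locations, the last protein is annotated with both.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
memberships = [[1, 0], [1, 0], [0, 1], [1, 1]]
u = fuzzy_knn(points, memberships, (4.8, 5.6), k=3)
# Report every location whose membership passes an assumed 0.5 cut-off.
predicted = [c for c, value in enumerate(u) if value >= 0.5]
print(u, predicted)
```

Because memberships are computed per class rather than forced into a single label, a query can exceed the cut-off for several locations at once, which is what makes the rule suitable for multi-site proteins.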

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrative Analysis of the Mitochondrial Proteome in Yeast

Author: Achleitner
Altschul
Andreoli
Christian Kozany
Christophe Andreoli
Curt Scharfe
David G Camp
DeRisi
DeSouza
DiMauro
Dimmer
Drawid
Eng
Erin O'Shea
Ferguson
Foury
Gavin
Ghaemmaghami
Glick
Hans Zischka
Heather M Mottaz
Ho
Holger Prokisch
Huh
Ito
Kim K Hixson
Kumar
Lars M Steinmetz
Lascaris
Lior David
Lipton
Marc
Marina A Gritsenko
Marius Ueffing
Matthew E Monroe
Mewes
Nakai
Ohlmeier
Patterson
Peter J Oefner
Pflieger
Richard D Smith
Ronald J Moore
Ronald W Davis
Scharfe
Shen
Sickmann
Smith
Steinmetz
Thomas Meitinger
Uetz
von Mering
Wallace
Washburn
Wenzhong Xiao
Westermann
Wu
Zelek S Herman
Zischka
Publication venue: Public Library of Science
Publication date: 01/01/2004

In this study yeast mitochondria were used as a model system to apply, evaluate, and integrate different genomic approaches to define the proteins of an organelle. Liquid chromatography mass spectrometry applied to purified mitochondria identified 546 proteins. By expression analysis and comparison to other proteome studies, we demonstrate that the proteomic approach identifies primarily highly abundant proteins. By expanding our evaluation to other types of genomic approaches, including systematic deletion phenotype screening, expression profiling, subcellular localization studies, protein interaction analyses, and computational predictions, we show that an integration of approaches moves beyond the limitations of any single approach. We report the success of each approach by benchmarking it against a reference set of known mitochondrial proteins, and predict approximately 700 proteins associated with the mitochondrial organelle from the integration of 22 datasets. We show that a combination of complementary approaches like deletion phenotype screening and mass spectrometry can identify over 75% of the known mitochondrial proteome. These findings have implications for choosing optimal genome-wide approaches for the study of other cellular systems, including organelles and pathways in various species. Furthermore, our systematic identification of genes involved in mitochondrial function and biogenesis in yeast expands the candidate genes available for mapping Mendelian and complex mitochondrial disorders in humans

Public Library of Science (PLOS)

Integrative Identification of Arabidopsis Mitochondrial Proteome and Its Function Exploitation through Protein Interaction Network

Author: A Drawid
A Höglund
A Kumar
A Reinhardt
A Vianello
AA Vashisht
AH Millar
BJ Haas
BW Rhee SY
C Bai
C Guda
C Jonak
CG Bartoli
CG Kurland
CM Lee
D Skowyra
DA Bota
David Moore
E Delannoy
E Jambrina
E Mazzucotelli
EE Patton
EH Kruft V
EM Marcotte
EO Karlberg
F Rébeillé
GW Tian
H Bannai
H Fölsch
H Prokisch
HN Chua
I Small
ID Small
IM Moller
J Balk
J Bardel
J Cui
J Huang
J Kilian
JA Kreps
Jian Cui
Jinghua Liu
JK Zhu
JL Heazlewood
K Ishizaki
K Meierhoff
KP O'Brien
L Li
LJ Lu
M Teige
M Unseld
MG Claros
O Emanuelsson
O Van Aken
OA Koroleva
P Horton
P Pavlidis
R Nair
RA Irizarry
S Hua
S Killcoyne
S Li
S Ma
S Maere
S Mahajan
S Mili
SG Andersson
T Sing
Tieliu Shi
V Gueguen
VK Mootha
Vladimir Uversky
W Werhahn
WK Huh
X Gong
Y Gavel
YD Cai
Yuhua Li
Z Liu
Z Yuan
Publication venue: Public Library of Science
Publication date: 31/01/2011

Mitochondria are major players in the production of energy and host several key reactions involved in basic metabolism and the biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown, even for the model plant Arabidopsis. We report a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called a Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and a 2.53% false positive rate (FPR). Together with the experimentally confirmed proteins, 2,585 mitochondrial proteins (named CoreMitoP) were identified. We explored the proteins with unknown functions based on a protein-protein interaction network (PIN) and annotated novel functions for 26.65% of the CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses such as salt, drought, cold, and wounding. Candidate mitochondrial proteins involved in those physiological activities provide useful targets for further investigation. The assigned functions also provide comprehensive information for the Arabidopsis mitochondrial proteome.
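The way a naive Bayesian model integrates independent lines of evidence can be sketched briefly. The example below is a generic naive Bayes combination of binary features in log-odds form, not the authors' model; the function name `naive_bayes_posterior`, the feature names, the prior and every probability value are hypothetical and chosen only for illustration.

```python
import math

def naive_bayes_posterior(prior, likelihoods, observed):
    """Posterior probability that a protein is mitochondrial.

    prior:       assumed P(mitochondrial) before seeing any evidence
    likelihoods: {feature: (P(feature present | mito),
                            P(feature present | non-mito))}
    observed:    {feature: True/False} for the protein being scored
    """
    # Start from the prior odds, then add one log-likelihood ratio per
    # feature; treating features as independent is the "naive" assumption.
    log_odds = math.log(prior / (1.0 - prior))
    for name, present in observed.items():
        p_mito, p_other = likelihoods[name]
        if present:
            log_odds += math.log(p_mito / p_other)
        else:
            log_odds += math.log((1.0 - p_mito) / (1.0 - p_other))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence sources and made-up probabilities.
likelihoods = {
    "targeting_signal_predicted": (0.70, 0.05),
    "mito_ortholog_found": (0.60, 0.10),
    "coexpressed_with_mito_genes": (0.50, 0.20),
}
protein = {
    "targeting_signal_predicted": True,
    "mito_ortholog_found": True,
    "coexpressed_with_mito_genes": False,
}
print(naive_bayes_posterior(0.05, likelihoods, protein))
```

Working in log-odds keeps the combination additive, so a new evidence source only contributes one more term to the sum.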

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central