22 research outputs found

Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

Author: A. Drawid
A. Tanay
Anirvan M. Sengupta
D.J. Schwab
David J. Schwab
E. Schneidman
G. Stormo
H. Jeffreys
J.B. Kinney
L.E. Baum
M. Djordjevic
M. Weigt
N. Halabi
O.G. Berg
P. Mahalanobis
Pankaj Mehta
R. Olsen
S. Sinha
T. Mora
Publication venue: Springer Science and Business Media LLC
Publication date: 27/10/2010

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. Comment: 25 pages, 2 figures, 1 table; V2: typos fixed and new references added.

arXiv.org e-Print Archive

Crossref

Susceptibility calculations for alternating antiferromagnetic chains

Author: Bonner J. C.
Bonner J. C.
Bulaevskii L. N.
Bulaevskii L. N.
Crawford V. H.
Drawid M.
Etemad S.
H. W. J. Blöte
I. S. Jacobs
J. C. Bonner
J. W. Bray
Khanna S. K.
Luther A.
Pytte E.
Todani T.
van Ooijen J. A. C.
Publication venue: DigitalCommons@URI
Publication date: 01/01/1979

Earlier work of Duffy and Barr consisting of exact calculations on alternating antiferromagnetic Heisenberg spin‐1/2 chains is extended to longer chains of up to 12 spins, and subsequent extrapolations of thermodynamic properties, particularly the susceptibility, are extended to the weak alternation region close to the uniform limit. This is the region of interest in connection with the recent experimental discovery of spin‐Peierls systems. The extrapolated susceptibility curves are compared with corresponding curves calculated from the model of Bulaevskii, which has been used extensively in approximate theoretical treatments of a variety of phenomena. Qualitative agreement is observed in the uniform limit and persists for all degrees of alternation, but quantitative differences of about 10% are present over the whole range, including the isolated dimer limit. Potential application of the new susceptibility calculations to experiment is discussed.

Crossref

DigitalCommons@URI

An FPT Approach for Predicting Protein Localization from Yeast Genomic Data

Author: A Bairoch
A Drawid
A Drawid
A Krogh
A Kumar
A Pierleoni
A Reinhardt
A Shiratori
Chunhe Li
D Boyd
D Frishman
D Frishman
DM Engelman
Erkang Wang
FCP Holstege
G Giaever
G von Heijne
H Nielsen
H Nielsen
HW Mewes
I Nachman
I Paulsen
J Han
J Han
J Wang
Jin Wang
JL DeRisi
JW Han
K Nakai
K Nakai
K Nakai
K Nakai
L Stryer
LJ Lu
LV Zhang
M Andrade
M Brown
M Gerstein
M Riffle
N Friedman
N Friedman
N Friedman
N Lin
P Klein
P Spellman
P Verhasselt
PE Hodges
R Agarwal
R Agrawal
R Jansen
R Tatusov
Shin-Han Shiu
T Ito
T Stephenson
V Alexandrov
WK Huh
WL Huang
Xidi Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Concurrently, data mining methods have been developed and refined in order to handle this experimental windfall, thus allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regression under various statistical measures. Our results show that mFPT gave better performance than other approaches in predicting protein localization. Meanwhile, setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently than the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions for the location of 17 proteins, which currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.

Public Library of Science (PLOS)

Crossref

PubMed Central

Changchun Institute of Applied Chemistry, Chinese Academy Of Sciences

A method to improve protein subcellular localization prediction by integrating various biological data sources

Author: A Bairoch
A Drawid
A Reinhardt
C Kuo-Chen
CS Yu
Doheon Lee
E Camon
H Nielsen
H Wen-Lin
Huang Ying
I Lee
J Cedano
J Guo
K Lee
K Nakai
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KJ Park
M Reczko
O Emanuelsson
O Emanuelsson
P Horton
P Horton
P Horton
S Hagit
S Hua
S Michelle
Thai Quang Tung
WK Huh
YD Cai
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods for efficiently predicting protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. Thus it is necessary to explore new features and appropriate classification methods to improve the prediction performance. RESULTS: In this paper we propose a new prediction method which combines two key ideas: 1) Information on neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features. 2) Fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located in multiple sites. An experiment was conducted on a dataset consisting of 22 locations from budding yeast proteins, and significant improvement was observed. CONCLUSION: Our results suggest that the neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

Author: A Drawid
A Lagreid
A Tanay
AC Gavin
AJ Enright
B Schwikowski
CJ Roberts
EM Marcotte
EM Marcotte
GD Bader
HJ Bussemaker
HW Mewes
I Cherel
J Ihmels
Jianghui Xiong
Kunyi Luo
LF Wu
M Ashburner
M Deng
M Deng
M Pellegrini
MB Eisen
MC von
MP Brown
OG Troyanskaya
P Jorgensen
P Uetz
PT Spellman
R Kohavi
R Overbeek
SF Altschul
Shanguang Chen
Simon Rayner
T Ito
TR Hazbun
TR Hughes
U Karaoz
WK Huh
WR Pearson
X Zhou
Y Chen
Y Ho
Yinghui Li
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrative Identification of Arabidopsis Mitochondrial Proteome and Its Function Exploitation through Protein Interaction Network

Author: A Drawid
A Drawid
A Höglund
A Kumar
A Reinhardt
A Vianello
AA Vashisht
AH Millar
AH Millar
BJ Haas
BW Rhee SY
C Bai
C Guda
C Jonak
CG Bartoli
CG Kurland
CM Lee
D Skowyra
DA Bota
David Moore
E Delannoy
E Jambrina
E Mazzucotelli
EE Patton
EE Patton
EH Kruft V
EM Marcotte
EO Karlberg
F Rébeillé
GW Tian
H Bannai
H Fölsch
H Prokisch
HN Chua
I Small
ID Small
IM Moller
J Balk
J Bardel
J Cui
J Huang
J Kilian
JA Kreps
Jian Cui
Jinghua Liu
JK Zhu
JL Heazlewood
JL Heazlewood
JL Heazlewood
K Ishizaki
K Meierhoff
KP O'Brien
L Li
LJ Lu
M Teige
M Unseld
MG Claros
MG Claros
O Emanuelsson
O Van Aken
OA Koroleva
P Horton
P Pavlidis
R Nair
R Nair
RA Irizarry
S Hua
S Killcoyne
S Li
S Ma
S Maere
S Mahajan
S Mili
SG Andersson
T Sing
Tieliu Shi
V Gueguen
VK Mootha
Vladimir Uversky
W Werhahn
WK Huh
X Gong
Y Gavel
YD Cai
YD Cai
Yuhua Li
Z Liu
Z Yuan
Publication venue: Public Library of Science
Publication date: 31/01/2011

Mitochondria are major players in the production of energy, and host several key reactions involved in basic metabolism and biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown even for the model plant Arabidopsis. We report a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and a 2.53% false positive rate (FPR). Together with the experimentally confirmed proteins, 2,585 mitochondrial proteins (named CoreMitoP) were identified. We explored those proteins with unknown functions based on a protein-protein interaction network (PIN) and annotated novel functions for 26.65% of CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses such as salt, drought, cold, and wounding. Candidate mitochondrial proteins involved in those physiological activities provide useful targets for further investigation. Assigned functions also provide comprehensive information for the Arabidopsis mitochondrial proteome.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

One-Dimensional Model Systems: Theoretical Survey

Author: Baker G. A.
Baker G. A.
Bari R. A.
Baxter R. J.
Blume M.
Blöte H. W. J.
Bonner J. C.
Bonner J. C.
Bonner J. C.
Brower R. C.
Capel H. W.
Chui S. T.
de Jongh L. J.
deNeef T.
Drawid M.
Gupta N.
Hohenberg P. C.
Hone D. W.
Imry Y.
Jill C. Bonner
Johnson J. D.
Kosterlitz J. M.
Lai C. K.
Landau D. P.
Luther A.
Nagle J. F.
Nagle J. F.
Reynolds P. J.
Richards P. M.
Rogiers J.
Shiba H.
Shiba H.
Stanley H. E.
Sur A.
Sur A.
Sutherland B.
Suzuki M.
Takahashi M.
Takahashi M.
Takahashi M.
Takahashl M.
Todani T.
Tonegawa T.
Publication venue: DigitalCommons@URI
Publication date: 01/03/1978

In the early 1960s one-dimensional model systems were regarded as amusing toys with the advantage of being far more easily solvable than their "real" three-dimensional counterparts. Now essentially 1-D (quasi-1-D) magnets can be "tailor-made" in the laboratory. Even more popular is the field of organic conductors like TTF⋅TCNQ, which are naturally quasi-1-D. Currently solitons and related solutions of non-linear, dispersive 1-D differential equations are ubiquitous in physics, including the area of 1-D magnetism. These developments are discussed in the Introduction. The rest of this paper is concerned with model Hamiltonians, model comparisons, critical singularities in 1-D (quasi-1-D) systems, accuracy of numerical techniques in comparison with exact solutions, brief accounts of dilute and disordered 1-D systems, and 1-D spin dynamics. Finally, a comment is made on a variety of interesting isomorphisms between 1-D magnets and phenomena in several other areas of physics, for example 2-D ferroelectrics, field-theoretic models, and realistic fluids. Comparison of theory and experiment has been the subject of several excellent reviews and is therefore not discussed here.

Crossref

DigitalCommons@URI

Integrative Analysis of the Mitochondrial Proteome in Yeast

Author: Achleitner
Altschul
Andreoli
Christian Kozany
Christophe Andreoli
Curt Scharfe
David G Camp
DeRisi
DeSouza
DiMauro
Dimmer
Drawid
Eng
Erin O'Shea
Ferguson
Foury
Gavin
Ghaemmaghami
Glick
Hans Zischka
Heather M Mottaz
Ho
Holger Prokisch
Huh
Ito
Kim K Hixson
Kumar
Lars M Steinmetz
Lascaris
Lior David
Lipton
Marc
Marina A Gritsenko
Marius Ueffing
Matthew E Monroe
Mewes
Nakai
Ohlmeier
Patterson
Peter J Oefner
Pflieger
Richard D Smith
Ronald J Moore
Ronald W Davis
Scharfe
Shen
Sickmann
Smith
Steinmetz
Thomas Meitinger
Uetz
von Mering
Wallace
Washburn
Washburn
Wenzhong Xiao
Westermann
Wu
Zelek S Herman
Zischka
Publication venue: Public Library of Science
Publication date: 01/01/2004

In this study yeast mitochondria were used as a model system to apply, evaluate, and integrate different genomic approaches to define the proteins of an organelle. Liquid chromatography mass spectrometry applied to purified mitochondria identified 546 proteins. By expression analysis and comparison to other proteome studies, we demonstrate that the proteomic approach identifies primarily highly abundant proteins. By expanding our evaluation to other types of genomic approaches, including systematic deletion phenotype screening, expression profiling, subcellular localization studies, protein interaction analyses, and computational predictions, we show that an integration of approaches moves beyond the limitations of any single approach. We report the success of each approach by benchmarking it against a reference set of known mitochondrial proteins, and predict approximately 700 proteins associated with the mitochondrial organelle from the integration of 22 datasets. We show that a combination of complementary approaches like deletion phenotype screening and mass spectrometry can identify over 75% of the known mitochondrial proteome. These findings have implications for choosing optimal genome-wide approaches for the study of other cellular systems, including organelles and pathways in various species. Furthermore, our systematic identification of genes involved in mitochondrial function and biogenesis in yeast expands the candidate genes available for mapping Mendelian and complex mitochondrial disorders in humans.

Public Library of Science (PLOS)