
870 research outputs found

Application of amino acid occurrence for discriminating different folding types of globular proteins

Author: AG Murzin
H Zhou
HB Shen
HQD Ding
J Cheng
J Shi
KC Chou
M Michael Gromiha
MM Gromiha
P Klein
QS Du
R Development Core Team
T Hirokawa
TS Kumarevel
WS Bu
Y Ofran
Y-h Taguchi
YD Cai
ZZ Wang
Publication venue: BioMed Central
Publication date: 01/10/2007

Abstract. Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is a long-standing goal in computational/molecular biology. The discrimination of different structural classes and folding types is an intermediate step in protein structure prediction. Results: In this work, we propose a method based on linear discriminant analysis (LDA) for discriminating 30 different folding types of globular proteins using amino acid occurrence. Our method was tested with a non-redundant set of 1612 proteins and discriminated them with an accuracy of 38%, which is comparable to or better than other methods in the literature. A web server has been developed for discriminating the folding type of a query protein from its amino acid sequence; it is available at http://granular.com/PROLDA/. Conclusion: Amino acid occurrence has been successfully used to discriminate different folding types of globular proteins. The discrimination accuracy obtained with amino acid occurrence is better than that obtained with amino acid composition and/or amino acid properties. In addition, the method produces results very quickly.
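The abstract above contrasts amino acid occurrence with amino acid composition. A minimal sketch of the two feature vectors follows, assuming that "occurrence" means the raw count of each of the 20 standard residues and "composition" means those counts divided by the number of standard residues. The function names and residue ordering are illustrative choices, not taken from the paper or its web server.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(sequence):
    """Return raw counts of the 20 standard residues (assumed meaning of 'occurrence')."""
    counts = Counter(sequence.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def composition(sequence):
    """Return counts normalised by the number of standard residues in the sequence."""
    occ = occurrence(sequence)
    total = sum(occ)
    return [c / total if total else 0.0 for c in occ]
```

Under this assumption the occurrence vector retains sequence-length information that composition normalises away. Either vector could then be passed to a discriminant classifier such as LDA.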

Crossref

Directory of Open Access Journals

PubMed Central

Functional discrimination of membrane proteins using machine learning techniques

Author: AG Garrow
B Rost
D Fu
DP Chimento
EL Borths
G von Heijne
GE Tusnady
IH Witten
J Abramson
M Michael Gromiha
MH Saier Jr
MM Gromiha
NK Natt
PG Bagos
PL Martelli
Q Ren
R Dutzler
S Murakami
SF Altschul
T Hirokawa
T Nogi
Y Huang
YD Cai
YH Taguchi
Yukimitsu Yabuki
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Discriminating membrane proteins based on their functions is an important task in genome annotation. In this work, we have analyzed the characteristic features of amino acid residues in membrane proteins that perform major functions, such as channels/pores, electrochemical potential-driven transporters and primary active transporters. Results: We observed that the residues Asp, Asn and Tyr are dominant in channels/pores, whereas the composition of the hydrophobic residues Phe, Gly, Ile, Leu and Val is high in electrochemical potential-driven transporters. The composition of all the amino acids in primary active transporters lies between those of the other two classes of proteins. We used different machine learning algorithms, such as Bayes rule, logistic function, neural network, support vector machine and decision tree, among others, to discriminate these classes of proteins. Most of the algorithms discriminated them with similar accuracy. The neural network method discriminated channels/pores, electrochemical potential-driven transporters and active transporters with a 5-fold cross-validation accuracy of 64% in a data set of 1718 membrane proteins. The application of amino acid occurrence improved the overall accuracy to 68%. In addition, we discriminated transporters from other α-helical and β-barrel membrane proteins with an accuracy of 85% using the k-nearest neighbor method. The classification of transporters and all other proteins (globular and membrane) showed an accuracy of 82%. Conclusion: The performance of discrimination with amino acid occurrence is better than that with amino acid composition. We suggest that this method could be effectively used to discriminate transporters from all other globular and membrane proteins, and to classify them into channels/pores, electrochemical and active transporters.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Knowledge-based energy functions for computational studies of proteins

Author: A. Ben-Naim
A. Godzik
A. Rossi
A.J. Bordner
A.V. Finkelstein
B. Fain
B. Krishnamoorthy
B. Kuhlman
B. Schölkopf
B.H. Park
B.I. Dahiyat
B.J. McConkey
B.O. Mitchell
C. Anfinsen
C. Carter Jr.
C. Czaplewski
C. Hoppe
C. Hu
C. Micheletti
C. Papadimitriou
C. Zhang
C.A. Rohl
C.B. Anfinsen
C.M.R Lemer
C.S. Mészáros
D. Gilis
D. Tobi
D. Xu
E. Venclovas
E.I. Shakhnovich
F.A. Momany
H. Dobbs
H. Edelsbrunner
H. Gan
H. Li
H. Lu
H. Zhou
H.S. Chan
I. Muegge
J. Khatun
J. Liang
J.A. Kocher
J.A. Rank
J.M. Deutsch
J.R. Bienkowska
K. Nishikawa
K. Sale
K.H. Lee
K.K. Koretke
K.T. Simons
L. Adamian
L.A. Mirny
L.L. Looger
L.M. Amzel
M. Karplus
M. Levitt
M. Vendruscolo
M.H. Hao
M.J. Sippl
M.P. Eastwood
M.R. Betancourt
M.S. Friedrichs
N. Karmarkar
N.V. Buchete
P. Koehl
P.D. Thomas
P.G. Wolynes
P.J. Munson
R. Goldstein
R. Guerois
R. Jackups Jr.
R. Janicke
R. Méndez
R. Samudrala
R.B. Hill
R.I. Dima
R.J. Vanderbei
R.K. Singh
R.L. Jernigan
R.S. DeWitte
S. Liu
S. Miyazawa
S. Shimizu
S. Tanaka
S.J. Wodak
T. Kortemme
T. Lazaridis
T.L. Chiu
U. Bastolla
V. Vapnik
V.N. Maiorov
W.P. Russ
X. Li
Y. Duan
Y. Park
Y. Xia
Publication venue: Springer Science and Business Media LLC
Publication date: 19/01/2006

This chapter discusses the theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some detail the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, and geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Finally, several issues of knowledge-based potential functions are discussed. Comment: 57 pages, 6 figures. To be published in a book by Springe
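As rough orientation for the statistical potentials named in the abstract above, a generic Boltzmann-inversion form is sketched below. It is not taken from the chapter itself. The symbols are assumptions chosen for illustration: $f^{\mathrm{obs}}_{ij}$ is the observed frequency of contacts between residue types $i$ and $j$ in a structure database, and $f^{\mathrm{ref}}_{ij}$ is the frequency expected in a chosen reference state.

```latex
e_{ij} = -k_B T \,\ln\!\left( \frac{f^{\mathrm{obs}}_{ij}}{f^{\mathrm{ref}}_{ij}} \right)
```

Pairs seen more often than the reference state predicts receive a negative (favorable) energy. Specific potentials differ mainly in how the reference state is defined. The Miyazawa-Jernigan potential, for example, builds it from a quasi-chemical approximation rather than from a bare frequency ratio.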

arXiv.org e-Print Archive

Crossref

Deciphering the Preference and Predicting the Viability of Circular Permutations in Proteins

Author: A Bakan
A Chakrabartty
A Guerler
A Jeltsch
A Kuzmanic
A Pintar
AC Wallace
AE Todd
AR Panchenko
AR van Erkel
AS Aranko
B Anand
B Halle
B Lee
BA Cunningham
BE Jones
C Pommie
C Vogel
CC Chang
CH Lu
CH Shih
CJ Crasto
CP Lin
CP Ponting
D Bordo
DA Case
Darren R. Flower
DL Nelson
DM Carrington
EA Ribeiro Jr
ESC Shih
FH Arnold
G Amitai
G Bulaj
G Pollastri
GS Baird
H Iwai
H Zhang
HK Liang
HM Berman
I Bahar
I Remy
J Chen
J Hennecke
J Weiner III
J Zhu
JD Pedelacq
Jenn-Kang Hwang
JM Bujnicki
JM Word
JM Yang
JR Quinlan
K Nishikawa
KH Paszkiewicz
L Chen
L Li
LC Tsai
LG Gebhard
Li-Fen Wang
M Elarabaty
M Iwakura
M Kojima
M Ostermeier
M Paluszewski
M Zavodszky
ML Connolly
MN Nguyen
PC Lyu
Ping-Chiang Lyu
PJ Werbos
R Garrett
R Vandrunen
RJ Moreau
S Akanuma
S Hovmoller
S Kundu
S Topell
S Uliel
SF Betz
SG Peisajovich
SJ Hubbard
ST Hsu
T Haliloglu
T Hesterberg
T Nakamura
T Noguchi
Tian Dai
TU Schwartz
V Anantharaman
V Muralidharan
W Kabsch
W Li
W Zheng
WC Lo
Wei-Cheng Lo
WR Pearson
Y Lindqvist
Y Yu
Y Zhang
Yen-Yi Liu
Z Qian
Publication venue: Public Library of Science
Publication date: 16/02/2012

Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been artificially created to study protein function, stability and folding. CP is increasingly applied to engineer enzyme structure and function, and to create bifunctional fusion proteins unachievable by tandem fusion. CP is a complicated and expensive technique. An intrinsic difficulty in its application is that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistical and simulation methods, such as bootstrap aggregating, permutation tests and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp and Asn. Positions preferred by CP lie within coils, loops, turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, environmentally unpacked, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective viable CP site prediction system, which combined four machine learning methods: artificial neural networks, the support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. As assessed by using the dihydrofolate reductase dataset as the independent evaluation dataset, this prediction system achieved an AUC of 0.9. Large-scale predictions have been performed for nine thousand representative protein structures; several new potential applications of CP were thus identified. Many unreported preferences of CP are revealed in this study. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology.
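At the sequence level, the relocation of termini described in the abstract above can be illustrated with a short sketch. It assumes a circular permutant is formed by joining the original C- and N-termini (optionally through a linker) and opening the chain at a chosen cut site. The function name and the `linker` parameter are hypothetical, and the sketch says nothing about whether a permutant is viable.

```python
def circular_permutant(sequence, cut_site, linker=""):
    """Return the sequence reopened at cut_site, with the old termini joined by linker.

    cut_site is the 0-based index of the residue that becomes the new N-terminus.
    """
    if not 0 < cut_site < len(sequence):
        raise ValueError("cut_site must lie strictly inside the sequence")
    # Residues from the cut site onward come first; the original N-terminal
    # segment follows, connected through the (possibly empty) linker.
    return sequence[cut_site:] + linker + sequence[:cut_site]
```

For example, `circular_permutant("ABCDE", 2)` gives `"CDEAB"`.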

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Residue contact-count potentials are as effective as residue-residue contact-type potentials for ranking protein decoys

Author: Bolser Dan M
Duarte Jose
Filippis Ioannis
Lappe Michael
Stehr Henning
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: For over 30 years potentials of mean force have been used to evaluate the relative energy of protein structures. The most commonly used potentials define the energy of residue-residue interactions and are derived from the empirical analysis of known protein structures. However, single-body residue 'environment' potentials, although widely used in protein structure analysis, have not been rigorously compared to these classical two-body residue-residue interaction potentials. Here we do not try to combine the two different types of residue interaction potential, but rather assess their independent contribution to scoring protein structures. Results: A data set of nearly three thousand monomers was used to compare pairwise residue-residue 'contact-type' propensities to single-body residue 'contact-count' propensities. Using a large and standard set of protein decoys we performed an in-depth comparison of these two types of residue interaction propensities. The scores derived from the contact-type and contact-count propensities were assessed using two different performance metrics and were compared using 90 different definitions of residue-residue contact. Our findings show that both types of score perform equally well on the task of discriminating between near-native protein decoys. However, in a statistical sense, the contact-count based scores were found to carry more information than the contact-type based scores. Conclusion: Our analysis has shown that the performance of either type of score is very similar on a range of different decoys. This similarity suggests a common underlying biophysical principle for both types of residue interaction propensity. However, several features of the contact-count based propensity suggest that it should be used in preference to the contact-type based propensity. Specifically, it has been shown that contact-counts can be predicted from sequence information alone. In addition, the use of a single-body term allows for efficient alignment strategies using dynamic programming, which is useful for fold recognition, for example. These facts, combined with the relative simplicity of the contact-count propensity, suggest that contact-counts should be studied in more detail in the future.
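The difference between the two-body 'contact-type' score and the single-body 'contact-count' score discussed in the abstract above can be sketched as follows. This is only an illustration of the two scoring shapes, not the authors' scoring functions. The propensity tables, their keys and the log-sum form are assumptions made for the example.

```python
import math


def contact_type_score(contacts, residues, pair_propensity):
    """Two-body score: sum of log propensities over contacting residue pairs.

    pair_propensity is a hypothetical table keyed by a sorted pair of residue types.
    """
    score = 0.0
    for i, j in contacts:
        pair = tuple(sorted((residues[i], residues[j])))
        score += math.log(pair_propensity[pair])
    return score


def contact_count_score(contacts, residues, count_propensity):
    """Single-body score: sum of log propensities of (residue type, number of contacts).

    count_propensity is a hypothetical table keyed by (residue type, contact count).
    """
    counts = [0] * len(residues)
    for i, j in contacts:
        counts[i] += 1
        counts[j] += 1
    return sum(math.log(count_propensity[(res, n)])
               for res, n in zip(residues, counts))
```

The single-body form depends only on each residue's own type and contact number, which is what makes it compatible with dynamic-programming alignment as the abstract notes.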

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

From Isotropic to Anisotropic Side Chain Representations: Comparison of Three Models for Residue Contact Estimation

Author: A Bauer
A Gilat
A Godzik
A Krishnan
A Liwo
AR Atilgan
B Krishnamoorthy
C Bode
C Deutsch
C Zhang
CW Carter Jr
D Tobi
DA Debe
DC Richardson
DT Jones
DW Gatchell
E Benedetti
E Ravasz
E Tudos
EA Guggenheim
F Melo
F Rao
G Casari
GE Forsythe
GS Rushbrooke
H Lu
H Zhou
HA Bethe
HH Gan
HM Berman
J Janin
J Lee
J Li
J Skolnick
JF Nye
Jing He
JU Bowie
JW Ponder
Jörg Langowski
K Al Nasr
K von Schnakenburg
KT Simons
L Yang
LH Greene
M Cohen
M Hendlich
M Levitt
M Lu
M Topf
MJ Sippl
MM Gromiha
MN James
N Kannan
NV Buchete
P Barah
P Manavalan
PJ Munson
R Chandrasekaran
R Samudrala
RA Karnesky
RK Singh
S Mayewski
S Miyazawa
S Selvaraj
S Sun
S Tanaka
SC Lovell
SE DeBolt
SH Bryant
TN Bhat
TS Chang
V Sasisekharan
VN Maiorov
W Lin
W Sun
W Wang
Weitao Sun
WT Sun
YP Feng
Publication venue: Public Library of Science
Publication date: 01/01/2011

The criterion to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the -to- atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain makes it challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model, using 424 high resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and the ISS model is the least accurate. The AES model eliminates about 95% of the incorrectly counted contact pairs in the ISS model. Algorithm analysis shows that the AES model is the most computationally intensive, while the ADC model has moderate computational cost. We derived a dataset of the contact pairs mis-estimated by the AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing an improved AES model by incorporating pair-specific information for the cutoff distance.
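The simplest criterion mentioned in the abstract above, a center-to-center distance within a fixed cutoff, can be sketched as below. This is an isotropic illustration only and does not implement the ADC, ISS or AES models from the paper. The function name and the default cutoff of 6.5 Å are assumptions made for the example.

```python
import math
from itertools import combinations


def contacts_within_cutoff(centers, cutoff=6.5):
    """Return index pairs (i, j), i < j, whose centers lie within cutoff of each other.

    centers is a sequence of (x, y, z) coordinates, for example side chain centroids;
    the default cutoff (in the same units as the coordinates) is an illustrative value.
    """
    pairs = []
    for i, j in combinations(range(len(centers)), 2):
        if math.dist(centers[i], centers[j]) <= cutoff:
            pairs.append((i, j))
    return pairs
```

Because a single scalar distance ignores side chain shape and orientation, elongated residues can be over- or under-counted, which is the limitation the anisotropic model in the abstract is meant to address.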

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Old Dominion University

Prediction of functionally important residues in globular proteins from unusual central distances of amino acids

Author: Kochańczyk Marek
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrate how the exploration of global atomic distributions can be used to indicate functionally important residues. Results: Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of the protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites, achieving success rates comparable to several geometric methods. We also showed that unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for ranking protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. Conclusions: Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids resulting from the different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred solely from the amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues often turn out to be directly involved in binding ligands or interfacing with other proteins.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The electrostatic profile of consecutive Cβ atoms applied to protein structure quality assessment.

Author: Asgeirsson Bjarni
Chakraborty Sandeep
Dandekar Abhaya M
Rao Basuthkar J
Venkatramani Ravindra
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

The structure of a protein provides insight into its physiological interactions with other components of the cellular soup. Methods that predict putative structures from sequences typically yield multiple, closely-ranked possibilities. A critical component in the process is the model quality assessing program (MQAP), which selects the best candidate from this pool of structures. Here, we present a novel MQAP based on the physical properties of sidechain atoms. We propose a method for assessing the quality of protein structures based on the electrostatic potential difference (EPD) of Cβ atoms in consecutive residues. We demonstrate that the EPDs of Cβ atoms on consecutive residues provide unique signatures of the amino acid types. The EPDs of Cβ atoms are learnt from a set of 1000 non-homologous protein structures with a resolution cutoff of 1.6 Å obtained from the PISCES database. Based on the Boltzmann hypothesis that lower energy conformations are proportionately sampled more, and on Anfinsen's thermodynamic hypothesis that the native structure of a protein is the minimum free energy state, we hypothesize that the deviation of observed EPD values from the mean values obtained in the learning phase is minimized in the native structure. We achieved an average specificity of 0.91, 0.94 and 0.93 on the hg_structal, 4state_reduced and ig_structal decoy sets, respectively, taken from the Decoys 'R' Us database. The source code and manual are available at https://github.com/sanchak/mqap and permanently available at DOI 10.5281/zenodo.7134.

PubMed Central

eScholarship - University of California

Stabilization of intermediate density states in globular proteins by homogeneous intramolecular attractive interactions

Author: Bahar I.
Jernigan R.L.
Publication venue: The Biophysical Society. Published by Elsevier Inc.
Publication date: 28/02/1994

On-lattice simulations of two-dimensional self-avoiding chains subject to homogeneous intramolecular attractive interactions were performed as a model for studying various density regimes in globular proteins. For short chains of fewer than 15 units, all conformations were generated and classified by density. The range of intramolecular interactions was found to increase uniformly with density, and the average number of topological contacts is directly proportional to density. The uniform interaction energy increases the probability of high density states but does not necessarily lead to dominance of the highest density state. Typically, several large peaks appear in the probability distribution of packing densities, their location and amplitude being determined by the balance between entropic effects enhancing more expanded conformations and attractive interactions favoring compact forms. Also, the homogeneous interaction energy shifts the distribution of most probable interacting points in favor of longer range interactions over short range ones, but in addition it introduces some more detailed preferences even among short range interactions. There are some implications for the characteristics of the intermediate density states and also for the likelihood that the native state does not correspond completely to the lowest energy conformation.
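The exhaustive enumeration described in the abstract above can be sketched for a square lattice as follows. The choice of a square lattice, the definition of a topological contact as a pair of non-bonded units on adjacent sites, and the Boltzmann weight exp(epsilon × contacts) with epsilon in units of kT are all assumptions made for illustration. The paper's own lattice, density classification and energy parameters are not reproduced here.

```python
import math
from collections import Counter

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def self_avoiding_walks(n_units):
    """Yield every self-avoiding chain of n_units sites on the square lattice from the origin."""
    def extend(path, occupied):
        if len(path) == n_units:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    yield from extend([(0, 0)], {(0, 0)})


def topological_contacts(path):
    """Count pairs of units on adjacent lattice sites that are not chain neighbours."""
    index = {site: k for k, site in enumerate(path)}
    count = 0
    for k, (x, y) in enumerate(path):
        # Looking only right and up visits each adjacent pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            m = index.get((x + dx, y + dy))
            if m is not None and abs(m - k) > 1:
                count += 1
    return count


def contact_distribution(n_units, epsilon=0.0):
    """Return the probability of each contact number under a uniform attraction epsilon (in kT)."""
    weights = Counter()
    for walk in self_avoiding_walks(n_units):
        c = topological_contacts(walk)
        weights[c] += math.exp(epsilon * c)
    total = sum(weights.values())
    return {c: w / total for c, w in sorted(weights.items())}
```

Raising `epsilon` shifts probability toward conformations with more contacts, while the much larger number of expanded conformations pulls the other way, which is the entropy-energy balance the abstract refers to. Enumeration cost grows roughly exponentially with chain length, so this approach suits only short chains.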

Elsevier - Publisher Connector