Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors

Author: Adamian, Anfinsen, Avbelj, Bahar, Bastolla, Betancourt, Brooks, Buchete, Cannata, Chiu, Cline, Crasto, Dill, Dima, Dobson, Fain, Fitzkee, Fletcher, Friedrichs, Gan, Goldstein, Guntert, Hao, Head-Gordon, Hou, Hu, Hunter, Joachims, Kolinski, Kolodny, Kuang, Lazaridis, Levinthal, Levitt, Lezon, Li, Liang, Loose, Lu, Maiorov, McConkey, McGuffin, Mirny, Miyazawa, Murphy, Park, Pearlman, Pei, Przytycka, Riddle, Sagot, Samudrala, Schölkopf, Shortle, Simons, Thomas, Tobi, Tsai, Vendruscolo, Vriend, Wang, Xia, Zhang, Zhou
Publication venue: 'Wiley'
Publication date: 01/01/2006

An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models, such as those requiring only $C_\alpha$ or backbone atoms, are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete-state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The form of the potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in tests of discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential function, requiring only backbone atoms or $C_\alpha$ atoms, has comparable or better performance than several residue-based potential functions that require additional coordinates of side chain centers or coordinates of all side chain atoms. By reducing the residue alphabet to size 5 for the local structure-sequence relationship, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.

Comment: 20 pages, 5 figures, 4 tables. In press, Protein
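The abstract above describes the potential as a weighted linear sum of descriptors that should separate native structures from decoys. A minimal sketch of that scoring idea follows. The descriptor values, the weights, and the function names `energy` and `native_rank` are hypothetical illustrations, not taken from the paper, and the optimization that produces the weights is not shown.

```python
def energy(weights, descriptors):
    """Weighted linear sum of descriptor values for one structure."""
    return sum(w * d for w, d in zip(weights, descriptors))


def native_rank(weights, native, decoys):
    """Rank of the native structure among the decoys (1 = lowest energy)."""
    e_native = energy(weights, native)
    lower = sum(1 for d in decoys if energy(weights, d) < e_native)
    return 1 + lower


# Hypothetical toy data: three descriptors per structure
# (for example two contact counts and one local sequence-structure term).
weights = [-1.0, 0.5, -0.2]
native = [4.0, 1.0, 3.0]
decoys = [
    [2.0, 2.0, 1.0],
    [3.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
]

print(native_rank(weights, native, decoys))
```

A discrimination test of this kind counts a success when the native structure ranks first in its decoy set, which is the case for the toy numbers above.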

arXiv.org e-Print Archive

Crossref

Bayesian models and algorithms for protein beta-sheet prediction

Author: Altunbaşak Yücel, Aydın Zafer, Erdoğan Hakan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2011

Prediction of the three-dimensional structure greatly benefits from information related to secondary structure, solvent accessibility, and non-local contacts that stabilize a protein's structure. Prediction of such components is vital to our understanding of the structure and function of a protein. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations using efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or fewer beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro. We performed a 10-fold cross-validation experiment on the BetaSheet916 set and obtained significant improvements in the prediction accuracy
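The abstract above mentions a dynamic-programming search for the optimum gapped pairwise alignment between beta-strands. The sketch below shows only the generic technique (a global alignment with a linear gap penalty); it is not the authors' algorithm. The scoring function `toy_score`, the gap penalty, and the example strands are hypothetical.

```python
def align_strands(a, b, pair_score, gap=-1.0):
    """Best global alignment score of two residue strings, gaps allowed."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # residue of a left unpaired
                dp[i][j - 1] + gap,  # residue of b left unpaired
            )
    return dp[n][m]


def toy_score(x, y):
    """Hypothetical pairing score: reward identical residues, penalize others."""
    return 2.0 if x == y else -1.0


print(align_strands("VIVK", "VIK", toy_score))
```

Under these assumptions, a gap models an unpaired residue, which is one way to picture a beta-bulge. An antiparallel pairing could be scored by passing the second strand reversed (`b[::-1]`).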

Sabanci University Research Database

Bayesian models and algorithms for protein beta-sheet prediction

Author: Altunbaşak Yücel, Aydın Zafer, Erdoğan Hakan
Publication date: 01/01/2009

Sabanci University Research Database

Empirical study of deep neural network architectures for protein secondary structure prediction

Author: Du Ming
Publication venue: 'University of Missouri Libraries'

Protein secondary structure prediction is a sub-problem of protein structure prediction. Instead of fully recovering the whole three-dimensional structure from the amino acid sequence, protein secondary structure prediction aims only at predicting local structures such as alpha helices, beta strands, and turns for each small segment of a protein. Predicted protein secondary structure can be used to improve fold recognition, ab initio protein prediction, protein motif prediction, and sequence alignment. Protein secondary structure prediction has been extensively studied with machine learning approaches, and in recent years multiple deep neural network methods have pushed the state-of-the-art 8-category accuracy to around 69 percent. Deep neural networks are good at capturing global information across the whole protein, which is widely believed to be crucial for the prediction. Thanks to the development of high-level neural network libraries, implementing and training neural networks is becoming more convenient and efficient. This project focuses on an empirical performance comparison of various deep neural network architectures and on the effects of hyper-parameters for protein secondary structure prediction. Multiple deep neural network architectures representing the state of the art for secondary structure prediction are implemented using TensorFlow, the leading deep learning platform. In addition, a software environment for performing efficient empirical studies is implemented, which includes network input and parameter control as well as training, validation, and test performance monitoring. A large number of experiments were conducted using popular datasets and benchmarks, yielding some useful results. For example, the experimental results show that recurrent layers are useful in improving prediction accuracy, achieving up to 5 percent improvement in 8-category accuracy. This work also shows the trade-off between running speed and building speed of the model, and the trade-off between running speed and accuracy. As a result, a relatively small recurrent network has been built that achieves 69.5 percent 8-category accuracy on the CB513 dataset

University of Missouri: MOspace

Potential function of simplified protein models for discriminating native proteins from decoys: Combining contact interaction and local sequence-dependent geometry

Author: Chen Rong, Liang Jie, Zhang Jinfeng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

An effective potential function is critical for protein structure prediction and folding simulation. For simplified models of proteins where coordinates of only $C_\alpha$ atoms need to be specified, an accurate potential function is important. Such a simplified model is essential for efficient search of conformational space. In this work, we present a formulation of a potential function for simplified representations of protein structures. It is based on the combination of descriptors derived from residue-residue contact and sequence-dependent local geometry. The optimal weight coefficients for contact and local geometry are obtained through optimization by maximizing margins between native and decoy structures. The latter are generated by chain growth and by gapless threading. The performance of the potential function in blind tests of discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. This potential function has comparable or better performance than several residue-based potential functions that additionally require coordinates of side chain centers or coordinates of all side chain atoms.

Comment: 4 pages, 2 figures, Accepted by 26th IEEE-EMBS Conference, San Francisc

arXiv.org e-Print Archive

Crossref

Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity

Author: Im Wonpil, Lee Hui Sun
Publication venue: 'American Chemical Society (ACS)'
Publication date: 17/05/2017

Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve performance by properly incorporating local structure alignment (LSA), because BS are local structures and are often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and to a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement in prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA

KU ScholarWorks

FigShare

Modeling and predicting all-α transmembrane proteins including helix–helix pairing

Author: Steyaert Jean-Marc, Waldispühl Jérôme
Publication venue: Elsevier B.V.

Modeling and predicting the structure of proteins is one of the most important challenges of computational biology. Exact physical models are too complex to provide feasible prediction tools and other ab initio methods only use local and probabilistic information to fold a given sequence. We show in this paper that all-α transmembrane protein secondary and super-secondary structures can be modeled with a multi-tape S-attributed grammar. An efficient structure prediction algorithm using both local and global constraints is designed and evaluated. Comparison with existing methods shows that the prediction rates as well as the definition level are sensibly increased. Furthermore this approach can be generalized to more complex proteins

Elsevier - Publisher Connector

Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only

Author: A Pintar, A Schlessinger, A Shrake, AG Murzin, AR Kinjo, Ashley M. Buckle, B Lee, B Rost, C Chothia, CK Smith, D Baker, D Varrazzo, D Xie, DT Jones, E Schmitt, EM Marcotte, F Ferre, G Pollastri, Geoffrey I. Webb, GP Raghava, H Chen, H Zhang, H Zhou, Hao Tan, HM Berman, J Cheng, J Qiu, J Song, J Wan, James C. Whisstock, JC Whisstock, Jiangning Song, JJ Ward, JM Chandonia, JU Bowie, K Bajaj, K Chen, K Vlahovicek, Khalid Mahmood, L Kurgan, LA Kurgan, M Connolly, M Kumar, M Lee, M Stout, ME Lacombe-Harvey, MK Kalita, MN Nguyen, O Schueler-Furman, P Radivojac, RG Coleman, Ruby H. P. Law, S Ahmad, S Chakravarty, S Liu, S Miller, Sean David Mooney, SF Altschul, T Hamelryck, T Ishida, T Joachims, T Noguchi, Tatsuya Akutsu, TL Blundell, V Vapnik, W Kabsch, W Liu, W Zhang, WL DeLano, X Wang, Y Bromberg, Y Kalidas, Y Ofran, Z Yuan, ZX Wang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis
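The abstract above reports prediction quality as a correlation coefficient (CC) and a root mean square error (RMSE) between observed and predicted residue depth. A minimal sketch of how these two measures can be computed is given below, assuming CC denotes the Pearson correlation coefficient. The sample values are hypothetical and are not data from the paper.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical residue-depth values for five residues.
observed_rd = [3.2, 5.1, 4.0, 7.8, 6.5]
predicted_rd = [3.9, 4.6, 4.4, 6.9, 6.0]

print(pearson_cc(observed_rd, predicted_rd), rmse(observed_rd, predicted_rd))
```

The two measures complement each other: CC is insensitive to a constant offset or scale in the predictions, while RMSE penalizes both.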

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

University of Melbourne Institutional Repository

svmPRAT: SVM-based Protein Residue Annotation Toolkit

Author: A Kernytsky, AG de Brevern, AG Murzin, AK Dunker, AR Kinjo, B Rost, C Etchebest, C Kauffman, Christopher Kauffman, DT Jones, G Karypis, G Pollastri, GE Crooks, George Karypis, H Rangwala, Huzefa Rangwala, J Cheng, M Gribskov, O Noivirit-Brik, R Ahmed, R Karchin, R Sanchez, RC Whaley, S Ahmad, S Hirose, SF Altschul, T Joachims, T Schwede, V Vapnik, VN Vapnik, W Kabsch, Y Ofran, Z Dosztányi
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Over the last decade several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines as they provide accurate and generalizable prediction models. Results: We present a general-purpose protein residue annotation toolkit (svmPRAT) to allow biologists to formulate residue-wise prediction problems. svmPRAT formulates the annotation problem as a classification or regression problem using support vector machines. One of the key features of svmPRAT is its ease of use in incorporating any user-provided information in the form of feature matrices. For every residue, svmPRAT captures local information around the residue to create fixed-length feature vectors. svmPRAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that accurately captures signals and patterns for training effective predictive models. Conclusions: In this work we evaluate svmPRAT on several classification and regression problems including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. svmPRAT has also been used for the development of a state-of-the-art transmembrane helix prediction method called TOPTMH and a secondary structure prediction method called YASSPP. The toolkit provides practitioners with an efficient and easy-to-use tool for a wide variety of annotation problems. Availability: http://www.cs.gmu.edu/~mlbio/svmprat
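The abstract above describes turning a per-residue feature matrix into fixed-length vectors by capturing a local window around each residue. The sketch below illustrates one common form of window encoding. It is not svmPRAT's implementation: the function name `window_features`, the zero padding at the chain ends, and the symmetric window are assumptions made for illustration.

```python
def window_features(feature_matrix, half_width):
    """Build one fixed-length vector per residue from a symmetric window.

    feature_matrix: list of per-residue feature rows, all of equal length.
    half_width: number of neighbours taken on each side of the residue.
    Positions outside the chain are filled with zeros (an assumption).
    """
    n = len(feature_matrix)
    dim = len(feature_matrix[0]) if n else 0
    zero_row = [0.0] * dim
    vectors = []
    for i in range(n):
        vec = []
        for k in range(i - half_width, i + half_width + 1):
            vec.extend(feature_matrix[k] if 0 <= k < n else zero_row)
        vectors.append(vec)
    return vectors


# Hypothetical chain of three residues with one feature each.
features = [[1.0], [2.0], [3.0]]
print(window_features(features, 1))
```

Each output vector has length `(2 * half_width + 1) * dim`, so every residue yields an input of the same size regardless of its position in the chain.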

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

3D Protein structure prediction with genetic tabu search algorithm

Author: A Irbäck, CB Anfinsen, D Li, DW Corne, DW Mount, F Glover, FH Stillinger, HS Lopes, Huiping Luo, J Holland, J Zhu, Jack Y Yang, Jinshan Tang, JT Ngo, KA Dill, M Bachmann, M Chen, Mary Qu Yang, MT Hoque, O Takahashi, Q M Yang, R König, R Unger, SY Kim, Ting Wang, W Cheng, WE Hart, X Zhang, Xiaolong Zhang, Youping Deng, Z Michalewicz
Publication venue: BioMed Central
Publication date: 28/05/2010

Background: Protein structure prediction (PSP) has important applications in different fields, such as drug design and disease prediction. In protein structure prediction, there are two important issues. The first is the design of the structure model and the second is the design of the optimization technology. Because of the complexity of realistic protein structures, the structure model adopted in this paper is a simplified model, the off-lattice AB model. Once the structure model is chosen, optimization technology is needed to search for the best conformation of a protein sequence under that model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines a genetic algorithm (GA) and the tabu search (TS) algorithm, is developed to complete this task. Results: In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, a variable population size strategy can maintain the diversity of the population, and a ranking selection strategy can improve the chance that an individual with a low energy value enters the next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions: The hybrid algorithm combines the advantages of the genetic algorithm and the tabu search algorithm. It makes use of the multiple search points of the genetic algorithm and can overcome the poor hill-climbing capability of the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, the GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively
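The abstract above optimizes conformations under the off-lattice AB model but does not state the energy function. The sketch below evaluates one commonly cited Stillinger-style form of that energy, assumed here for illustration: a bending term of (1/4)(1 - cos θ) per interior monomer plus a Lennard-Jones-like term 4(r^-12 - C r^-6) over non-adjacent pairs, with C = 1 for A-A, 0.5 for B-B, and -0.5 for A-B. The paper may use a different variant, and the function name `ab_energy` is hypothetical.

```python
import math


def ab_energy(coords, sequence):
    """Energy of a chain of 'A'/'B' monomers at 3D coordinates.

    Assumed form: bending term on successive bonds plus a species-dependent
    Lennard-Jones-like term on all pairs that are not bonded neighbours.
    """
    n = len(coords)
    # Unit vectors along each bond.
    bonds = []
    for i in range(n - 1):
        d = [coords[i + 1][k] - coords[i][k] for k in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        bonds.append([x / norm for x in d])

    # Bending energy: zero when successive bonds are collinear.
    bend = 0.0
    for i in range(n - 2):
        cos_theta = sum(p * q for p, q in zip(bonds[i], bonds[i + 1]))
        bend += 0.25 * (1.0 - cos_theta)

    def coeff(a, b):
        if a == "A" and b == "A":
            return 1.0
        if a == "B" and b == "B":
            return 0.5
        return -0.5

    # Pairwise term over monomers separated by at least two bonds.
    pair = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            pair += 4.0 * (r ** -12 - coeff(sequence[i], sequence[j]) * r ** -6)
    return bend + pair


# Straight three-monomer chain with unit bond lengths.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ab_energy(straight, "AAA"), ab_energy(straight, "AAB"))
```

A search method such as the hybrid described in the abstract would call an energy function like this on each candidate conformation and keep the lower-energy ones. The search itself is not sketched here.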

Crossref

IUPUIScholarWorks

PubMed Central

eScholarship - University of California