Search CORE

46 research outputs found

Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

Author: A Ceroni
A Lesk
A Salamov
A Vullo
Alberto JM Martin
Alessandro Vullo
B Rost
B Rost
B Rost
C Orengo
Catherine Mooney
D Frishman
D Jones
D Jones
D Przybylski
E Krieger
G Gianese
G Pollastri
G Pollastri
G Pollastri
G Pollastri
Gianluca Pollastri
H Berman
H Naderi-Manesh
J Cheng
J Cheng
J Cuff
J Moult
J Moult
J Sim
JA Cuff
L Fourrier
M Mucchielli-Giorgi
M Nguyen
M Wagner
P Baldi
P Baldi
P Baldi
P Bradley
R Adamczak
R Karchin
S Ahmad
S Altschul
S Montgomerie
S Qin
SK Riis
T Petersen
U Hobohm
V Eyrich
W Kabsch
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Background : Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive system are publicly available at the address http://distill.ucd.ieScience Foundation IrelandIrish Research Council for Science, Engineering and TechnologyHealth Research BoardUCD President's Award 2004au, da, ke, ab, sp - kpw30/11/1

Crossref

Research Repository UCD

Springer - Publisher Connector

PubMed Central

Accurate single-sequence prediction of solvent accessible surface area using local and global features

Author: Faraggi Eshel
Kloczkowski Andrzej
Zhou Yaoqi
Publication venue: 'Wiley'
Publication date: 01/01/2014
Field of study

We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both highly more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and a Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org

IUPUIScholarWorks

PubMed Central

DeepREx-WS: A web server for characterising protein–solvent interaction starting from sequence

Author: Casadio R.
Manfredi M.
Martelli P. L.
Savojardo C.
Publication venue
Publication date: 01/01/2021
Field of study

Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or partially solved. Presently, we can integrate the notion of solvent exposed/buried residues with that of their flexibility and intrinsic disorder to highlight regions where mutations may increase or decrease protein stability in order to modify proteins for biotechnological reasons, while preserving their functional integrity. Here we describe a web server, which provides the unique possibility of integrating knowledge of solvent and non-solvent exposure with that of residue conservation, flexibility and disorder of a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the webserver is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset derived from the Protein Data Bank comprising 2332 monomeric protein chains and benchmarked on a blind test set including 200 protein sequences unrelated with the training set. Results show that DeepREx performs at the state-of-the-art in the field. In turn, the Web Server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) a predictor of secondary structure; v) the presence of disordered regions as derived from MobiDB-Lite3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of the DeepREx-WS for assisting the identification of residues to be variated in protein surface engineering processes

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A generic method for assignment of reliability scores applied to solvent accessibility predictions

Author: Andersen Pernille
Lundegaard Claus
Nielsen Morten
Petersen Bent
Petersen Thomas Nordahl
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Estimation of the reliability of specific real value predictions is nontrivial and the efficacy of this is often questionable. It is important to know if you can trust a given prediction and therefore the best methods associate a prediction with a reliability score or index. For discrete qualitative predictions, the reliability is conventionally estimated as the difference between output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability for each prediction, in the form of a Z-score. Results An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures where reliabilities are obtained by post-processing the output. Conclusion The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of the currently best public available method, Real-SPINE. Both methods associate a reliability score with the individual predictions. However, our implementation of reliability scores in the form of a Z-score is shown to be the more informative measure for discriminating good predictions from bad ones in the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the upper 20% of predictions sorted according to reliability. For this subset, values of 0.79 and 0.74 are obtained using our and the compared method, respectively. This tendency is true for any selected subset.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

Online Research Database In Technology

Accurate prediction of protein relative solvent accessibility using a balanced model

Author: B Lee
B Petersen
C Chothia
C Fan
C Mirabello
CN Magnan
DT Jones
E Faraggi
G Pollastri
G Pollastri
G Wang
I Walsh
J Eickholt
J Eickholt
J Hoskins
J Lafferty
J Lafferty
J Sim
J Sun
J Zhang
JY Wang
K Joo
KI Cho
L Wang
P Cong
Peisheng Cong
PW Rose
R Adamczak
S Liu
S Wang
S Wang
S Wang
SF Altschul
Tonghua Li
W Kabsch
W Li
Wei Wu
WR Atchley
Z Tang
Zhiheng Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Template-Based C8-Scorpion: A Protein 8 State Secondary Structure Prediction Method Using Structural Information and Context-Based Features

Author: Li Yaohang
Yaseen Ashraf
Publication venue: ODU Digital Commons
Publication date: 01/01/2014
Field of study

Background: Secondary structures prediction of proteins is important to many protein structure modeling applications. Correct prediction of secondary structures can significantly reduce the degrees of freedom in protein tertiary structure modeling and therefore reduces the difficulty of obtaining high resolution 3D models. Methods: In this work, we investigate a template-based approach to enhance 8-state secondary structure prediction accuracy. We construct structural templates from known protein structures with certain sequence similarity. The structural templates are then incorporated as features with sequence and evolutionary information to train two-stage neural networks. In case of structural templates absence, heuristic structural information is incorporated instead. Results: After applying the template-based 8-state secondary structure prediction method, the 7-fold cross-validated Q8 accuracy is 78.85%. Even templates from structures with only 20% ~ 30% sequence similarity can help improve the 8-state prediction accuracy. More importantly, when good templates are available, the prediction accuracy of less frequent secondary structures, such as 3-10 helices, turns, and bends, are highly improved, which are useful for practical applications. Conclusions: Our computational results show that the templates containing structural information are effective features to enhance 8-state secondary structure predictions. Our prediction algorithm is implemented on a web server named C8-SCORPION available at: http://hpcr.cs.odu.edu/c8scorpion

Springer - Publisher Connector

PubMed Central

Old Dominion University

CSpritz: accurate prediction of protein disorder segments with annotation for homology, secondary structure and linear motifs

Author: Alberto J. M. Martin
Albrecht
Alessandro Vullo
Ali
Altschul
Baldi
Berman
Cheng
Diella
Dosztanyi
Dunker
Dunker
Dyson
Fuxreiter
Gianluca Pollastri
Gibson
Gould
Hemsley
Ian Walsh
Jones
Linding
Lise
Lobanov
Lobanov
Marsella
McGuffin
Melamud
Mika
Mizianty
Noivirt-Brik
Obradovic
Pollastri
Pollastri
Russell
Schaefer
Schlessinger
Sickmeier
Siltberg-Liberles
Silvio C. E. Tosatto
Sirota
Sollich
Tompa
Tompa
Tompa
Tomàs Di Domenico
Trovato
Uversky
Vanhee
Vullo
Ward
Weiss
Wright
Xue
Publication venue: Oxford University Press
Publication date: 01/01/2011
Field of study

CSpritz is a web server for the prediction of intrinsic protein disorder. It is a combination of previous Spritz with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can elaborate single or multiple predictions for either short or long disorder. The server provides a global output page, for download and simultaneous statistics of all predictions. Links are provided to each individual protein where the amino acid sequence and disorder prediction are displayed along with statistics for the individual protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with a Sw measure of 49.27 and AUC of 0.828. The server, together with help and methods pages including examples, are freely available at URL: http://protein.bio.unipd.it/cspritz/

Crossref

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Impact of residue accessible surface area on the prediction of protein secondary structures

Author: Marashi Sayed-Amir
Momen-Roknabadi Amir
Pezeshk Hamid
Sadeghi Mehdi
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background The problem of accurate prediction of protein secondary structure continues to be one of the challenging problems in Bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buries vs. exposed), or used the real-value predicted RSAs in their prediction method. Results We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. With the consideration of DSSP-assigned RSA values we realized that improvement in the accuracy of prediction strictly depends on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible parameter. We therefore used residue-dependent thresholds and most of residues showed improvement in prediction. Next, we tried to consider predicted RSA values, since in the real-world problem, protein sequence is the only available information. We first predicted the RSA classes by RVP-net program and then used these data in our method. Using this approach, improvement in prediction was also obtained. Conclusion The success of applying the RSA information on different secondary structure prediction methods suggest that prediction accuracy can be improved independent of prediction approaches. Thus, solvent accessibility can be considered as a rich source of information to help the improvement of these methods.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction

Author: Sundararajan Saraswathi
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011
Field of study

Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computed structures and identify the functions of these sequences. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, for a better basic understanding of aberrant states of stress and disease, including drug discovery and discovery of biomarkers. Several aspects of secondary structure predictions and other protein structure-related predictions are investigated using different types of information such as data obtained from knowledge-based potentials derived from amino acids in protein sequences, physicochemical properties of amino acids and propensities of amino acids to appear at the ends of secondary structures. Investigating the performance of these secondary structure predictions by type of amino acid highlights some interesting aspects relating to the influences of the individual amino acid types on formation of secondary structures and points toward ways to make further gains. Other research areas include Relative Solvent Accessibility (RSA) predictions and predictions of phosphorylation sites, which is one of the Post-Translational Modification (PTM) sites in proteins. Protein secondary structures and other features of proteins are predicted efficiently, reliably, less expensively and more accurately. A novel method called Fast Learning Optimized PREDiction (FLOPRED) Methodology is proposed for predicting protein secondary structures and other features, using knowledge-based potentials, a Neural Network based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.33% and a testing accuracy of 92.24% with a standard deviation of 0.48% obtained for a small group of 84 proteins. We have a Matthew\u27s correlation-coefficient ranging between 80.58% and 84.30% for these secondary structures. Accuracies for individual amino acids range between 83% and 92% with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.38%. These results are significantly higher than those found in the literature. Prediction of protein secondary structure based on amino acid sequence is a common technique used to predict its 3-D structure. Additional information such as the biophysical properties of the amino acids can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences and this data is used for secondary structure prediction using FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results. Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved as much as 6% by including these effects for those residues present at the ends of alpha-helix, beta-strand and coil. A study on RSA prediction using ELM shows large gains in processing speed compared to using support vector machines for classification. This indicates that ELM yields a distinct advantage in terms of processing speed and performance for RSA. Additional gains in accuracies are possible when the more advanced FLOPRED algorithm and PSO optimization are implemented. Phosphorylation is a post-translational modification on proteins often controls and regulates their activities. It is an important mechanism for regulation. Phosphorylated sites are known to be present often in intrinsically disordered regions of proteins lacking unique tertiary structures, and thus less information is available about the structures of phosphorylated sites. It is important to be able to computationally predict phosphorylation sites in protein sequences obtained from mass-scale sequencing of genomes. Phosphorylation sites may aid in the determination of the functions of a protein and to better understanding the mechanisms of protein functions in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. Our new PSO optimization included in FLOPRED enable the prediction of phosphorylation sites with higher accuracy and with better generalization. Our preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53% , a testing accuracy 91.42% and Matthew\u27s correlation coefficient of 83.9%. In summary, secondary structure prediction, Relative Solvent Accessibility and phosphorylation site prediction have been carried out on multiple sets of data, encoded with a variety of information drawn from proteins and the physicochemical properties of their constituent amino acids. Improved and efficient algorithms called S-ELM and FLOPRED, which are based on Neural Networks and Particle Swarm Optimization are used for classifying and predicting protein sequences. Analysis of the results of these studies provide new and interesting insights into the influence of amino acids on secondary structure prediction. S-ELM and FLOPRED have also proven to be robust and efficient for predicting relative solvent accessibility of proteins and phosphorylation sites. These studies show that our method is robust and resilient and can be applied for a variety of purposes. It can be expected to yield higher classification accuracy and better generalization performance compared to previous methods

Digital Repository @ Iowa State University (ISU)

PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation

Author: Bagos
Bendtsen
Canutescu
Carter
Cuff
D. Arndt
D. S. Wishart
Eyrich
Garrow
Hall
J. A. Cruz
Jones
Kernytsky
Krogh
Liu
M. Berjanskii
Mewes
Montgomerie
Nayeem
Pollastri
Riley
Rost
S. Montgomerie
S. Shrivastava
Schwede
Stothard
Stothard
Ullman
Van Domselaar
Wallner
Walther
Wishart
Publication venue: Oxford University Press
Publication date
Field of study

PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ∼3 min per query sequence. The PROTEUS2 server along with source code for many of its modules is accessible a http://wishart.biology.ualberta.ca/proteus2

Crossref

PubMed Central