PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations

Author: Cui Xiang
Li Liqi
Luo Zhong
Yang Hua
Yu Sanjiu
Zhang Yuan
Zheng Xiaoqi
Zhou Yue
Publication venue: Public Library of Science (PLoS)
Publication date: 27/03/2014

Protein structure prediction is critical to the functional annotation of the massive number of accumulated biological sequences, which creates an urgent need for high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. For most homology-based approaches, the accuracy of protein structural class prediction is sufficiently high for high-similarity datasets but still far from satisfactory for low-similarity datasets, i.e., those below 40% pairwise sequence similarity. We therefore present a novel method for accurate and reliable protein structural class prediction on both high- and low-similarity datasets. The method is based on a Support Vector Machine (SVM) combined with integrated features from the position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated features by recursively removing the feature with the lowest ranking score. The top-ranked features selected by SVM-RFE are then input into the SVM engines to predict the structural class of a query protein. To validate the method, jackknife tests were applied to seven widely used benchmark datasets, yielding overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods for protein structural classification, especially for low-similarity datasets.
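The SVM-RFE step described in this abstract can be illustrated with a minimal pure-Python sketch. It assumes a linear SVM, uses the squared weight w_j² as the ranking score (the criterion of the original SVM-RFE formulation by Guyon et al.), and removes one feature per round. The helper names `train_linear_svm` and `svm_rfe` and the hyperparameters `lam`, `epochs` and `lr` are hypothetical; the plain subgradient-descent trainer is only a stand-in for a proper SVM solver, and the PSSP-RFE authors' actual configuration may differ.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit (w, b) of a soft-margin linear SVM (labels +1/-1) by full-batch
    subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    Hypothetical stand-in for a real SVM solver."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: subgradient is -y * x / n
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, **svm_kwargs):
    """Return feature indices ordered from most to least important.

    Each round retrains the linear SVM on the surviving features and
    eliminates the one with the smallest ranking score w_j ** 2."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y, **svm_kwargs)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # the last surviving feature is ranked first


if __name__ == "__main__":
    # Toy data: only feature 0 separates the classes.
    X = [[1.0, 0.0, 0.1], [2.0, 0.0, -0.1], [-1.0, 0.0, 0.1], [-2.0, 0.0, -0.1]]
    y = [1, 1, -1, -1]
    print(svm_rfe(X, y))
```

On the toy data only feature 0 carries class information, so it should come first in the ranking. Removing one feature per round is the simplest choice; with long integrated PSSM/PROFEAT/GO vectors, eliminating a fraction of the features per round is a common way to reduce training cost.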

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

Author: Marcin J Mizianty
Lukasz Kurgan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Knowledge of structural class is used by numerous methods for identifying structural and functional characteristics of proteins and could be used to detect remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.
Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows any subset of the classes to be selected. MODAS is also the first to use a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers tailored to each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets, and comparison with over two dozen modern competing predictors, show that MODAS provides the best overall accuracy, ranging between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions relative to the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which most existing methods do not consider, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions: The improved predictions stem from the novel features that express the collocation of secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that the conservation and arrangement of secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined by the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
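The 19% and 8% figures in the MODAS abstract are relative error rate reductions, not accuracy gains in percentage points. Assuming the conventional definition (the share of the competing method's errors that are eliminated), the quantity can be computed as below. The function name is hypothetical, and the 80% and 84% accuracies in the usage line are illustrative values, not results from the paper.

```python
def relative_error_reduction(acc_new, acc_baseline):
    """Fraction of the baseline's errors removed by the new method.

    Accuracies are fractions in [0, 1); error rate = 1 - accuracy.
    The baseline must make at least some errors (acc_baseline < 1)."""
    err_new = 1.0 - acc_new
    err_baseline = 1.0 - acc_baseline
    return (err_baseline - err_new) / err_baseline


if __name__ == "__main__":
    # Illustrative only: a 4-point accuracy gain over an 80%-accurate
    # baseline removes one fifth of its errors.
    print(round(relative_error_reduction(0.84, 0.80), 3))
```

Because the denominator is the baseline error rate, the same accuracy gain yields a larger relative reduction on easier datasets, where the baseline already makes few errors.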

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

Author: Daberdaku Sebastian
Ferrari Carlo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for predicting protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary considerably depending on the interaction type and function. Selecting an optimal set of features to characterise the protein interface, and developing an effective method to represent and capture complex protein recognition patterns, are of paramount importance for this task.
Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as classifiers to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared with other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI).
Conclusions: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their use can easily be extended to other sets of amino acid properties. The results suggest that choosing a proper set of features to characterise the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface is to be characterised. We postulate that different protein classes should be treated separately and that an optimal set of features must be identified for each protein class.

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

PUEPro: A Computational Pipeline for Prediction of Urine Excretory Proteins

Author: Chen Xin
Du Wei
Liang Yanchun
Pang Wei
Wang Yan
Xu Ying
Zhang Chi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This work is supported by the National Natural Science Foundation of China (Grant Nos. 81320108025, 61402194, 61572227), the Development Project of Jilin Province of China (20140101180JC) and the China Postdoctoral Science Foundation (2014T70291).
Postprint

Aberdeen University Research

Heriot Watt Pure

Predicting protein function by machine learning on amino acid sequences – a critical evaluation

Author: Al-Shahib A
Breitling R
Gilbert D
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Predicting the function of newly discovered proteins simply by inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would transfer to proteins of unknown function, which may show distinct biases compared with experimentally more tractable proteins.
Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ, to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that, for highly specialized proteomes, classifiers from a different but more conventional species may in fact outperform the endogenous species-specific classifier.
Conclusion: We conclude that there is a very good prospect of successfully predicting the function of yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

University of Groningen

University of Birmingham Research Portal

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Crossref

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

Kernel-based machine learning protocol for predicting DNA-binding proteins

Author: Bhardwaj Nitin
Langlois Robert E.
Lu Hui
Zhao Guijun
Publication venue: Oxford University Press
Publication date: 01/01/2005

DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Previous attempts to identify DNA-BPs from their sequence and structural information have achieved only moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs in which the classifier is a Support Vector Machine (SVM). The information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total, 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In the self-consistency test, an accuracy of 100% is achieved. For cross-validation (CV) optimization over the entire dataset, we report an accuracy of 90%. Using leave-1-pair holdout evaluation, an accuracy of 86.3% is achieved. When we restrict the dataset to proteins sharing less than 20% sequence identity, the holdout accuracy is 85.8%. Furthermore, seven DNA-BPs with unbound structures are all correctly predicted. This performance is better than previously published results. The higher accuracy achieved here stems from two factors: the ability of the SVM to handle features with a wide range of discriminatory power, and a different definition of the positive patch. Since our protocol does not rely on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to known ones.
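The gap between the 100% self-consistency accuracy and the lower cross-validation and holdout figures reported above is expected, because self-consistency tests a model on the very samples it was trained on. The jackknife test used for PSSP-RFE is the leave-one-out scheme, in which every sample is predicted by a model built without it. The sketch below contrasts the two, assuming a 1-nearest-neighbour classifier as a simple stand-in for the SVM; all function names are hypothetical, and the paper's leave-1-pair holdout protocol is not reproduced here.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def self_consistency_accuracy(X, y):
    """Resubstitution: test on the same samples the model was built from."""
    hits = sum(nearest_neighbor_predict(X, y, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)


def jackknife_accuracy(X, y):
    """Leave-one-out: predict each sample with a model that excludes it."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.5], [4.5]]
    y = [0, 0, 1, 0]
    print(self_consistency_accuracy(X, y))  # 1.0
    print(jackknife_accuracy(X, y))         # 0.5
```

With 1-NN, every training sample is its own nearest neighbour, so self-consistency accuracy is 100% unless identical samples carry different labels; only the held-out estimate says anything about performance on unseen proteins.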

CiteSeerX

Crossref

PubMed Central