DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach
<div><p>Accurate prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a prerequisite for understanding the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. First, near-disorder regions are defined as the fragments at either terminus of an ordered region that adjoin a disordered region. The multi-class conservative score is then generated by sequence alignment against a database of known structures and is represented as order, near-disorder and disorder conservative scores, so the MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is used as a feature set to identify disordered regions in sequences. DisoMCS uses a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, each residue is assigned to order or disorder according to an optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS is very competitive in prediction accuracy compared with well-established, publicly available disordered region predictors. They also indicated that our approach is more accurate when the query has higher homology with the knowledge database.</p><p>Availability</p><p>DisoMCS is available at <a href="http://cal.tongji.edu.cn/disorder/" target="_blank">http://cal.tongji.edu.cn/disorder/</a>.</p></div>
Adjusting the decision threshold of predicted disorder.
<p>Sw reaches its maximum of 76.55% when the decision threshold is 0.03.</p>
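The threshold adjustment described in this caption can be sketched as a simple scan over candidate thresholds. This is an illustrative sketch only, not the authors' code: the listing does not define Sw, so the block assumes the CASP-style weighted score, Sw = sensitivity + specificity - 1, and the names `weighted_score` and `best_threshold` are hypothetical.

```python
def weighted_score(y_true, y_pred):
    """Assumed definition: Sw = sensitivity + specificity - 1.

    Labels are 1 for disorder and 0 for order. The listing itself does not
    define Sw, so this formula is an assumption.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def best_threshold(y_true, disorder_probs, thresholds):
    """Return (threshold, Sw) for the candidate threshold with the highest Sw."""
    best_t, best_sw = None, float("-inf")
    for t in thresholds:
        # A residue is called disordered when its probability reaches t.
        y_pred = [1 if p >= t else 0 for p in disorder_probs]
        sw = weighted_score(y_true, y_pred)
        if sw > best_sw:
            best_t, best_sw = t, sw
    return best_t, best_sw
```

In practice the scan would run over per-residue disorder probabilities from held-out folds, with a fine grid of thresholds between 0 and 1.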
Additional file 1: of Accurate prediction of protein relative solvent accessibility using a balanced model
S1. The PDB IDs of DB8296. S2. The PDB IDs of DB101. (DOCX 101 kb)
The top line represents a protein sequence (from PDB, ID:1CMV:A).
<p>The second line shows the actual annotation of ordered regions (green), disordered regions (red) and near-disorder regions (yellow, K = 5). The third line shows the prediction of our approach.</p>
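One way to read the near-disorder annotation in this figure is that the K ordered residues flanking each disordered region are relabelled. The sketch below follows that reading; it is an assumption about the definition, not the authors' implementation, and the function name `label_near_disorder` and the `O`/`D`/`N` letters are hypothetical.

```python
def label_near_disorder(labels, k=5):
    """Relabel ordered residues adjacent to disorder as near-disorder.

    labels: string of 'O' (order) and 'D' (disorder), one letter per residue.
    Returns a string of the same length in which up to k ordered residues on
    each side of every disordered region are marked 'N' (near-disorder).
    """
    out = list(labels)
    n = len(labels)
    for i, lab in enumerate(labels):
        if lab != "D":
            continue
        # Walk left from a disorder start into the preceding ordered region.
        if i > 0 and labels[i - 1] == "O":
            j, count = i - 1, 0
            while j >= 0 and labels[j] == "O" and count < k:
                out[j] = "N"
                j -= 1
                count += 1
        # Walk right from a disorder end into the following ordered region.
        if i + 1 < n and labels[i + 1] == "O":
            j, count = i + 1, 0
            while j < n and labels[j] == "O" and count < k:
                out[j] = "N"
                j += 1
                count += 1
    return "".join(out)
```

Ordered regions shorter than 2K between two disordered regions become entirely near-disorder under this reading.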
The flowchart of the DisoMCS.
<p>DisoMCS uses two kinds of features, MCSs and predicted secondary structures, giving a total of twelve features, with a conditional random field as the classification algorithm.</p>
Adjusting the decision threshold of predicted near-disorder regions.
<p>Sw reaches its maximum of 70.33% when the decision threshold is 0.4.</p>
Diagram of profile.
<p>We used three kinds of profiles in this study: PSSM, SPSSM and shape string profile. They have 20, 3 and 8 elements per amino acid, respectively, obtained by sequence alignment and sequence-structure alignment. Each square represents one element, which is a normalized frequency. Red squares represent large values near ‘1’ and blue squares represent small values near ‘0’; the deeper the color of a square, the closer its value is to the extreme.</p>
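To make "normalized frequency" concrete, the sketch below builds a per-position profile from a set of aligned state strings. It is a simplified illustration and not the procedure used in the study: a real PSSM comes from an alignment tool's scoring, and the three-letter alphabet `HEC`, the gap handling and the name `frequency_profile` are all assumptions made here.

```python
from collections import Counter


def frequency_profile(aligned, alphabet="HEC"):
    """Per-position normalized frequencies over `alphabet`.

    aligned: list of equal-length strings, one per aligned hit.
    Characters outside the alphabet (for example '-' gaps) are ignored.
    Returns one row per position with one value in [0, 1] per alphabet letter.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned strings must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] in alphabet)
        total = sum(counts.values())
        # Rows sum to 1 unless the column contains only ignored characters.
        profile.append([counts[a] / total if total else 0.0 for a in alphabet])
    return profile
```

Each returned row corresponds to one column of squares in the diagram, with values near 1 drawn red and values near 0 drawn blue.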
5-fold cross validation results of Train_0925.