53 research outputs found

    A sequence logo generated by WebLogo showing the occurrence frequency of amino acids surrounding the protein cleavage sites.

    No full text
    <p>where N and C represent the N- and C-terminus of the 22-residue peptide, respectively, with cleavage occurring between site 11 and site 12.</p>
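The 22-residue window described above can be sketched as follows. This is only an illustration: the function name `cleavage_window`, the 0-based `p1_index` convention (the residue at site 11, immediately before the cleaved bond), and the padding character `X` for windows that run past the ends of the protein are all assumptions, not details given in the source.

```python
def cleavage_window(sequence, p1_index, flank=11, pad="X"):
    """Return a window of 2 * flank residues around a cleaved bond.

    `p1_index` is the 0-based position of the residue just before the
    bond, so it lands at site `flank` (site 11 by default) of the window.
    Padding with `pad` near the termini is an assumed convention.
    """
    start = p1_index - flank + 1   # first residue of the window
    end = p1_index + flank + 1     # one past the last residue
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(sequence))
    core = sequence[max(0, start):min(len(sequence), end)]
    return left_pad + core + right_pad


if __name__ == "__main__":
    toy_protein = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # placeholder letters, not a real protein
    window = cleavage_window(toy_protein, p1_index=10)
    print(window, len(window))
```

With the default `flank=11`, `window[10]` and `window[11]` correspond to sites 11 and 12 on either side of the cleaved bond.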

    Prediction results obtained with different window sizes.

    No full text
    <p>Sn: sensitivity.</p><p>Sp: specificity.</p><p>Ac: accuracy.</p><p>MCC: Matthews's correlation coefficient.</p>
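For reference, the four measures can be computed from a binary confusion matrix using their standard textbook definitions. The sketch below is not taken from the listed paper; the function name `binary_metrics` and the example counts are illustrative assumptions.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard Sn, Sp, Ac and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sn = tp / (tp + fn)                    # sensitivity (true positive rate)
    sp = tn / (tn + fp)                    # specificity (true negative rate)
    ac = (tp + tn) / (tp + fp + tn + fn)   # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(binary_metrics(tp=40, fp=5, tn=45, fn=10))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it the more informative figure when cleavage and non-cleavage sites are unevenly represented.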

    Bar plots to show the feature distribution for the 65 optimal features and the corresponding site distribution.

    No full text
    <p>See the section “Analysis of the optimal feature set” for further explanation.</p>

    Bar plots to show the distribution in the optimal feature set for the PSSM score and the corresponding specific site score.

    No full text
    <p>See the section “PSSM conservation score feature analysis” for further explanation.</p>

    Plot of MCC values against the number of features used, based on the data in Supporting Information S4.

    No full text
    <p>MCC peaked when 65 features were used. These 65 features were taken as the optimal feature set for our classifier.</p>
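The selection of a feature count at the MCC peak can be sketched as a loop over growing prefixes of a ranked feature list. This is a minimal illustration of incremental feature selection in general, not the authors' code: the function name, the `evaluate` callback (which would train and cross-validate a classifier and return its MCC), and the toy scoring function are all assumptions.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score every prefix of a ranked feature list and keep the best one.

    `ranked_features` is assumed to be ordered best-first (for example by
    mRMR). `evaluate` is a hypothetical callback that takes a feature
    subset and returns a score such as MCC.
    """
    best_k, best_score = 0, float("-inf")
    curve = []
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        curve.append(score)
        if score > best_score:          # keep the first (smallest) peak
            best_k, best_score = k, score
    return ranked_features[:best_k], curve


if __name__ == "__main__":
    ranked = ["f1", "f2", "f3", "f4", "f5"]
    # Toy stand-in for a cross-validated MCC that peaks at three features.
    toy_score = lambda subset: -(len(subset) - 3) ** 2
    subset, curve = incremental_feature_selection(ranked, toy_score)
    print(subset, curve)
```

Plotting `curve` against the subset size gives the kind of MCC-versus-feature-count plot described in the caption.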

    Comparison with three proteasome cleavage prediction methods.

    No full text
    <p>Sn: sensitivity.</p><p>Sp: specificity.</p><p>Ac: accuracy.</p><p>MCC: Matthews's correlation coefficient.</p>

    Prediction of Protein Cleavage Site with Feature Selection by Random Forest

    No full text
    <div><p>Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can be either non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited, since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were used to represent the peptides concerned. We compared our predictor with existing tools for predicting possible cleavage sites in candidate substrates. The comparison shows that our method makes much more reliable predictions in terms of overall prediction accuracy. In addition, the predictor can be applied to a wide range of proteinases.</p></div>
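As a rough illustration of the mRMR ranking step named in the abstract, the sketch below greedily orders discrete features by relevance to the class label minus mean redundancy with the features already chosen. The abstract does not say which mRMR variant or implementation was used, so the difference form, the function names, and the toy data are all assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_rank(features, target):
    """Greedy mRMR ranking (difference form, assumed for illustration).

    `features` maps a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: "b" duplicates "a", while "c" carries independent information.
    toy = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
    print(mrmr_rank(toy, target=[0, 1, 2, 3]))
```

In the toy data the duplicate feature is ranked last even though it is as relevant as the others, which is the redundancy penalty at work. The resulting order is what an IFS step would then consume.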

    Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

    No full text
    <div><p>Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites, and achieved an overall accuracy of 0.672997 and an MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. Site-specific feature analysis also showed that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.</p></div>

    A comparison between the similarity-based method (Eq.10) and the interaction-based method (Eq.6) in identifying the 2,138 drugs in the dataset (cf. Supporting Information S2).

    No full text
    <p>A comparison between the similarity-based method (Eq.10) and the interaction-based method (Eq.6) in identifying the 2,138 drugs in the dataset (cf. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0035254#pone.0035254.s002" target="_blank">Supporting Information S2</a>).</p>