
    ROC curves of our two-step algorithm and three other existing feature selection methods.


    Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties

    Single amino acid variations (SAVs) can alter biological function, causing disease or accounting for natural differences between individuals. Identifying the relationship between a SAV and a disease provides a starting point for understanding the underlying mechanism of the association, and can aid the prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that predicts how likely a SAV is to be disease-associated by combining a gradient tree boosting (GTB) algorithm with optimally selected neighborhood features. A two-step feature selection approach explores the most relevant and informative neighborhood properties across a wide range of sequence and structural features, including several novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performance, with an AUC of 0.908 and a specificity of 0.838, significantly better than those of existing methods. We further validate the method on an independent test set, where it remains competitive. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, returns reliable predictions when distinguishing disease-associated from neutral variants, showing improved specificity as well as increased overall performance compared with existing methods.
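The core evaluation pipeline described in the abstract — train a gradient tree boosting classifier on variant feature vectors, then score it by AUC — can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic data in place of the paper's 1521 neighborhood features; the dataset and hyperparameters are assumptions, not the authors' exact setup.

```python
# Hedged sketch: gradient tree boosting for disease-association prediction,
# evaluated by AUC. Synthetic data stands in for real SAV feature vectors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: labels 1 = disease-associated, 0 = neutral.
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Illustrative GTB hyperparameters (not taken from the paper).
gtb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gtb.fit(X_train, y_train)

# AUC on held-out data, analogous to the benchmark evaluation.
scores = gtb.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"AUC = {auc:.3f}")
```

In practice the features would be the sequence and structural neighborhood properties the paper describes, and the reported 0.908 AUC comes from 5-fold cross-validation on the benchmark dataset rather than a single split.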

    Comparison of the AUC values of the three methods using 5-fold cross-validation on the benchmark dataset.


    Prediction performance of PredSAV classifiers in comparison with six other prediction tools on the independent test dataset.


    Performance of selected attributes with the two-step feature selection method.

    The first column lists the different cutoffs of stability selection scores.

    Prediction performance of PredSAV classifiers in comparison with six other prediction tools on the benchmark dataset.


    The ROC curves of seven classifiers on the benchmark dataset.


    Rankings of feature importance for the optimal selected features.

    SN, EN and VN denote sequence neighborhood, Euclidean neighborhood and Voronoi neighborhood, respectively. The numbers in brackets denote positions in the sliding window for sequence neighborhood features.

    Prediction examples of the functional effects of SAVs in two proteins by PredSAV and other methods.

    Red denotes disease-associated variants and blue denotes neutral variants. (A) and (B) show proteins PAH (PDB ID: 1J8U, chain A) and LSS (PDB ID: 1W6K, chain A), respectively. 3-D structures are rendered using PyMol [75].

    The framework of PredSAV.

    (A) Feature representation. A total of 1521 sequence, Euclidean and Voronoi neighborhood features are initially generated. (B) Two-step feature selection. Stability selection is used as the first step, retaining the top 152 features with scores larger than 0.2. The second step is wrapper-based feature selection, in which feature subsets are evaluated by 5-fold cross-validation with the GTB algorithm. (C) Prediction model. Gradient boosted trees are finally built for prediction.
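The two-step selection in the framework caption can be sketched in code. The version below is an illustrative approximation, not the authors' exact procedure: step 1 approximates stability selection by fitting L1-penalized logistic regression on random half-subsamples and keeping features whose selection frequency exceeds 0.2 (the cutoff the caption names); step 2 is a simple wrapper that chooses how many top-ranked surviving features to keep by 5-fold cross-validated AUC with gradient tree boosting. All names, regularization settings, and the synthetic data are assumptions.

```python
# Hedged sketch of a PredSAV-style two-step feature selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Step 1: stability selection, approximated as the per-feature selection
# frequency of an L1-penalized model over random half-subsamples.
n_rounds = 50
freq = np.zeros(X.shape[1])
for _ in range(n_rounds):
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    lasso.fit(X[idx], y[idx])
    freq += (np.abs(lasso.coef_[0]) > 1e-8)
stability_scores = freq / n_rounds
stable = np.where(stability_scores > 0.2)[0]  # cutoff from the caption

# Step 2: wrapper selection — rank stable features by stability score and
# pick the subset size that maximizes 5-fold cross-validated AUC with GTB.
order = stable[np.argsort(-stability_scores[stable])]
best_k, best_auc = 1, 0.0
for k in range(1, len(order) + 1):
    gtb = GradientBoostingClassifier(n_estimators=50, random_state=0)
    auc = cross_val_score(gtb, X[:, order[:k]], y, cv=5,
                          scoring="roc_auc").mean()
    if auc > best_auc:
        best_k, best_auc = k, auc
print(f"kept {best_k} of {len(stable)} stable features, CV AUC = {best_auc:.3f}")
```

The paper's actual first step retains the top 152 of 1521 features; the subsample-frequency loop here is one common way to compute stability selection scores, and the exhaustive subset-size scan stands in for whatever wrapper search the authors used.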