571,257 research outputs found

    Multi-scale hybrid transformer networks: application to prostate disease classification

    Automated disease classification could significantly improve the accuracy of prostate cancer diagnosis on MRI, which is a difficult task even for trained experts. Convolutional neural networks (CNNs) have shown promising results for disease classification on multi-parametric MRI. However, CNNs struggle to extract robust global features of the anatomy, which may provide important contextual information for further improving classification accuracy. Here, we propose a novel multi-scale hybrid CNN/transformer architecture that better contextualises local features at different scales. In our application, we found this to significantly improve performance compared to using CNNs. Classification accuracy is further improved with a stacked ensemble, yielding promising results for binary classification of prostate lesions into clinically significant or non-significant.

    Machine learning integration for predicting the effect of single amino acid substitutions on protein stability

    BACKGROUND: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the amount of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. RESULTS: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615, used in previous studies, and (2) S2783, the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent of cases using late integration. CONCLUSION: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. Overall accuracy of regression is not better than that of classification, but it has fewer false positives, especially when combined with the reject option. The server for stability prediction for the three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.
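
    The late integration and reject option described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: averaging the per-model probabilities is only one possible decision-combination rule, and the function names, the class labels and the `reject_cost` parameter are assumptions made for illustration. The rejection test follows a Chow-style rule, abstaining when the expected error of the best guess exceeds the cost of rejecting.

```python
from statistics import mean


def late_integration(prob_per_model):
    """Combine decisions by averaging each model's positive-class probability.

    Averaging is an assumed combination rule, not taken from the abstract.
    """
    return mean(prob_per_model)


def classify_with_reject(p_positive, reject_cost=0.2):
    """Return a class label, or "reject" for doubtful cases.

    reject_cost is a hypothetical parameter: the loss assigned to abstaining,
    relative to a loss of 1 for a misclassification.
    """
    confidence = max(p_positive, 1.0 - p_positive)
    expected_error = 1.0 - confidence
    if expected_error > reject_cost:
        return "reject"
    # Hypothetical labels for the sign of the stability change.
    return "stabilizing" if p_positive >= 0.5 else "destabilizing"


if __name__ == "__main__":
    combined = late_integration([0.9, 0.8, 1.0])
    print(classify_with_reject(combined))
    print(classify_with_reject(0.55))
```

    Lowering `reject_cost` makes the classifier abstain more often, which trades coverage for fewer errors among the cases it does label.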

    Replacing human interpretation of agricultural land in Afghanistan with a deep convolutional neural network

    Afghanistan’s annual opium survey relies upon time-consuming human interpretation of satellite images to map the area of potential poppy cultivation for statistical sample design. Deep Convolutional Neural Networks (CNNs) have shown ground-breaking performance for image classification tasks by encoding local contextual information, in some cases outperforming trained analysts. In this study, we investigate the development of a CNN to automate the classification of agriculture from medium-resolution satellite imagery as an alternative to manual interpretation. The residual network (ResNet50) CNN architecture was trained and validated for delineating the agricultural area using labelled multi-seasonal Disaster Monitoring Constellation (DMC) satellite imagery (32 m) of Helmand and Kandahar provinces. The effects of input image chip size, training sampling strategy, elevation data, and multi-seasonal imagery were investigated. The best-performing single-year classification used an input chip size of 33 × 33 pixels, a targeted sampling strategy and transfer learning, resulting in high overall accuracy (94%). The inclusion of elevation data marginally lowered performance (93%). Multi-seasonal classification achieved an overall accuracy of 89% using the previous two years’ data. Only 25% of the target year’s training samples were necessary to update the model to achieve >94% overall accuracy. A data-driven approach to automate agricultural mask production using CNNs is proposed to reduce the burden of human interpretation. The ability to continually update CNN models with new data has the potential to significantly improve automatic classification of vegetation across years.
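
    The input chips mentioned in this abstract can be pictured as fixed-size windows cut around a labelled pixel. The sketch below assumes a single-band image stored as a list of rows and a hypothetical helper `extract_chip`. Skipping chips that cross the image edge is an assumption; the abstract does not say how edges were handled.

```python
def extract_chip(image, row, col, size=33):
    """Return the size x size window centred on (row, col).

    Returns None when the window would cross the image edge
    (an assumed policy; padding would be another option).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the chip has a centre pixel")
    half = size // 2
    n_rows, n_cols = len(image), len(image[0])
    if row - half < 0 or col - half < 0:
        return None
    if row + half >= n_rows or col + half >= n_cols:
        return None
    return [r[col - half: col + half + 1]
            for r in image[row - half: row + half + 1]]


if __name__ == "__main__":
    # Synthetic 40 x 40 image whose pixel value encodes its position.
    image = [[r * 100 + c for c in range(40)] for r in range(40)]
    chip = extract_chip(image, 20, 20)
    print(len(chip), len(chip[0]), chip[16][16])
```

    An odd chip size such as 33 leaves exactly one centre pixel, so each chip can carry the label of the pixel it is centred on.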

    Predicting residue-wise contact orders in proteins by support vector regression

    BACKGROUND: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. RESULTS: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance to a CC of 0.57 and an RMSE of 0.79. In addition, adding the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to other existing methods. CONCLUSION: The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
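
    The two evaluation measures quoted in this abstract, the Pearson correlation coefficient (CC) and the root mean square error (RMSE), follow standard definitions and can be computed as in the sketch below. The function names and the toy values are illustrative assumptions. The abstract does not say whether RWCO values were normalized before scoring, so no attempt is made to reproduce its numbers.

```python
from math import sqrt


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


if __name__ == "__main__":
    # Toy values only, not data from the study.
    predicted = [1.0, 2.0, 3.0, 4.0]
    observed = [1.5, 1.5, 3.5, 4.5]
    print(pearson_cc(predicted, observed), rmse(predicted, observed))
```

    CC measures how well the predicted profile tracks the shape of the observed one, while RMSE measures the absolute size of the errors, so the two are usually reported together.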