Multi-scale hybrid transformer networks: application to prostate disease classification

Author: Aboagye E
Glocker B
Pinto K
Rockall A
Santhirasekaram A
Winkler M
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2021

Automated disease classification could significantly improve the accuracy of prostate cancer diagnosis on MRI, which is a difficult task even for trained experts. Convolutional neural networks (CNNs) have shown some promising results for disease classification on multi-parametric MRI. However, CNNs struggle to extract robust global features about the anatomy, which may provide important contextual information for further improving classification accuracy. Here, we propose a novel multi-scale hybrid CNN/transformer architecture that can better contextualise local features at different scales. In our application, we found this to significantly improve performance compared to using CNNs alone. Classification accuracy is further improved with a stacked ensemble, yielding promising results for binary classification of prostate lesions into clinically significant or non-significant.

Spiral - Imperial College Digital Repository

Machine learning integration for predicting the effect of single amino acid substitutions on protein stability

Author: Alpaydın Ethem
Gönen Mehmet
Haliloğlu Türkan
Özen Ayşegül
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. RESULTS: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615, which was used in previous studies, and (2) S2783, the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration.
CONCLUSION: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. The overall accuracy of regression is not better than that of classification, but it yields fewer false positives, especially when combined with the reject option. The server for stability prediction for the three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.
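The reject option mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a classifier that outputs, for each substitution, a probability that the stability change is positive; the function name `classify_with_reject` and the 0.8 confidence threshold are hypothetical choices for illustration, and this simple confidence cut-off stands in for the risk-based rule the abstract describes rather than reproducing it.

```python
from typing import List, Optional


def classify_with_reject(
    probabilities: List[float], threshold: float = 0.8
) -> List[Optional[int]]:
    """Label each case +1 or -1, or None (reject) when confidence is low.

    probabilities: assumed P(stability change is positive) per case.
    threshold: hypothetical minimum confidence needed to commit to a label.
    """
    decisions: List[Optional[int]] = []
    for p in probabilities:
        # Confidence in the more likely of the two classes.
        confidence = max(p, 1.0 - p)
        if confidence < threshold:
            decisions.append(None)  # doubtful case: abstain
        elif p >= 0.5:
            decisions.append(1)     # predicted positive sign
        else:
            decisions.append(-1)    # predicted negative sign
    return decisions


if __name__ == "__main__":
    print(classify_with_reject([0.95, 0.55, 0.10]))
```

Raising the threshold rejects more cases and would typically lower the error rate among the cases that are still classified, which is the trade-off the abstract reports between rejection rate and false positives.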

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Replacing human interpretation of agricultural land in Afghanistan with a deep convolutional neural network

Author: Hamer A. M.
Simms D. M.
Waine T. W.
Publication venue: Informa UK Limited
Publication date: 18/01/2021

Afghanistan’s annual opium survey relies upon time-consuming human interpretation of satellite images to map the area of potential poppy cultivation for statistical sample design. Deep Convolutional Neural Networks (CNNs) have shown ground-breaking performance for image classification tasks by encoding local contextual information, in some cases outperforming trained analysts. In this study, we investigate the development of a CNN to automate the classification of agriculture from medium-resolution satellite imagery as an alternative to manual interpretation. The residual network (ResNet50) CNN architecture was trained and validated for delineating the agricultural area using labelled multi-seasonal Disaster Monitoring Constellation (DMC) satellite imagery (32 m) of Helmand and Kandahar provinces. The effects of input image chip size, training sampling strategy, elevation data, and multi-seasonal imagery were investigated. The best-performing single-year classification used an input chip size of 33 × 33 pixels, a targeted sampling strategy and transfer learning, resulting in high overall accuracy (94%). The inclusion of elevation data marginally lowered performance (93%). Multi-seasonal classification achieved an overall accuracy of 89% using the previous two years’ data. Only 25% of the target year’s training samples were necessary to update the model to achieve >94% overall accuracy. A data-driven approach to automate agricultural mask production using CNNs is proposed to reduce the burden of human interpretation. The ability to continually update CNN models with new data has the potential to significantly improve automatic classification of vegetation across years.
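To make the notion of an input image chip concrete, the sketch below cuts a 33 × 33 pixel window out of a single-band image stored as a list of rows. The abstract does not say how chips were extracted, so centring the chip on a labelled pixel and skipping pixels too close to the image edge are assumptions, and `extract_chip` is a hypothetical helper name.

```python
from typing import List, Optional

Image = List[List[float]]


def extract_chip(image: Image, row: int, col: int, size: int = 33) -> Optional[Image]:
    """Return a size x size chip centred on (row, col), or None near the edge.

    Assumes an odd chip size so that the labelled pixel sits exactly
    in the centre of the chip.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    # Skip pixels whose chip would extend beyond the image bounds.
    if row - half < 0 or col - half < 0:
        return None
    if row + half >= height or col + half >= width:
        return None
    return [r[col - half:col + half + 1] for r in image[row - half:row + half + 1]]


if __name__ == "__main__":
    # Synthetic 40 x 40 image whose pixel value encodes its position.
    image = [[float(r * 40 + c) for c in range(40)] for r in range(40)]
    chip = extract_chip(image, 20, 20)
    print(len(chip), len(chip[0]))
```

A larger chip gives the network more spatial context around each labelled pixel at the cost of more computation per sample, which is why chip size is one of the parameters the study varied.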

Cranfield CERES

Predicting residue-wise contact orders in proteins by support vector regression

Author: A Bairoch
AG Murzin
AR Kinjo
B Rost
CH Tsai
D Kihara
D Sarda
DT Jones
G Pollastri
GP Raghava
HM Berman
J Song
J Wang
Jiangning Song
JM Chandonia
Kevin Burrage
KW Plaxco
M Punta
MPS Brown
NP Prabhu
S Ahmad
S Hua
V Vapnik
W Kabsch
W Liu
X Wang
Z Yuan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. RESULTS: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and a root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance to a CC of 0.57 and an RMSE of 0.79. In addition, adding the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to the performance of other existing methods. CONCLUSION: The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
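The two evaluation measures quoted in the abstract above, the Pearson correlation coefficient (CC) and the root mean square error (RMSE) between predicted and observed values, can be computed as in the following minimal sketch. The function names and the toy numbers are illustrative assumptions and are not taken from the paper or its data.

```python
import math
from typing import Sequence


def pearson_cc(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return covariance / math.sqrt(var_p * var_o)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Toy values for illustration only.
    predicted = [1.0, 2.0, 3.0]
    observed = [1.0, 2.0, 5.0]
    print(pearson_cc(predicted, observed), rmse(predicted, observed))
```

A higher CC means the predicted values track the ordering and trend of the observed values more closely, while a lower RMSE means smaller absolute deviations, so the two measures are usually reported together.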

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

University of Queensland eSpace