Predictive model for breast cancer sub-stage classification

Abstract

The prediction accuracy of sub-stage classification varies with the staging method used for breast cancer. In this Master's thesis we investigate whether the performance of machine learning models differs across stages in two staging systems, using three different sets of RNA-Seq data to predict the sub-stage of breast cancer. We applied the Support Vector Machine (SVM) method to classify the sub-stages of the Surveillance, Epidemiology, and End Results (SEER) and the Tumor, Node, and Metastasis (TNM) staging systems, and we tested whether model performance differs between the two systems. We also used three different data sets with prior feature selection and investigated the performance of the model in both staging systems with different combinations of sub-stages. To obtain more reliable performance estimates, we used cross-validation with accuracy, sensitivity, and specificity as performance metrics. To further assess the classifiers' predictive ability, we used three additional performance metrics: precision, recall, and F1 score. We applied principal component analysis (PCA) to reduce the dimensionality and investigated its effect on performance. Finally, we compared the results between the two staging systems and investigated whether protein-coding or non-coding RNA gives better performance. Our results show that microRNAs give a better classification in both staging systems.
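As an illustration of the performance metrics named above, the following minimal sketch computes accuracy, sensitivity, specificity, precision, recall, and F1 score from true and predicted labels. It assumes a binary comparison between two sub-stages, with one of them treated as the positive class. The function name `binary_metrics` and the label vectors are hypothetical and chosen only for this example; they are not taken from the thesis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common classification metrics for one positive class.

    Hypothetical helper for illustration; labels equal to `positive`
    are the positive class, all other labels are the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # identical to recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up labels: 1 = one sub-stage (positive), 0 = the other sub-stage.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a cross-validation setting, these values would typically be computed on each held-out fold and then averaged. Note that sensitivity and recall are the same quantity, so the six metric names correspond to five distinct values.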
