6 research outputs found
Machine learning prediction of breast cancer survival using age, sex, length of stay, mode of diagnosis and location of cancer
Breast cancer is one of the leading causes of death in females and survival
depends on early diagnosis and treatment. This paper applied machine
learning techniques to predict breast cancer survival (dead or alive) using
age, sex, length of stay, mode of diagnosis and location of cancer as
predictors (independent variables). The data was obtained from the outpatient
department of the University of Ilorin Teaching Hospital, Ilorin, Nigeria. The
sample size of 300 consists of 175 females and 25 males who were admitted to
the hospital and treated for breast cancer. The patients were later discharged
or died. Adaptive boosting (AdaBoost) performed best among the data mining
models used for classification in all three cases, where the target class is
the average over classes, alive, or dead. AdaBoost achieved a
classification accuracy and area under the curve (AUC) of 98.3% and 99.9%,
respectively. Furthermore, a probe of the AdaBoost prediction showed that the
probability of death due to breast cancer is 0.47. Length of stay contributed
most to this high probability, location of breast cancer and
mode of diagnosis contributed minimally, while age and sex contributed
insignificantly. The high probability of breast cancer mortality predicted in this
paper is a cause for concern, as early detection of breast cancer, routine breast
examination and breast cancer awareness are crucial in increasing the
probability of survival. The results can be used to design a decision support
system that can increase the chances of breast cancer survival.
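The abstract above reports classification accuracy and AUC. As a minimal sketch of how these two metrics are computed from predicted scores, the following uses only the Python standard library. The labels, scores and the 0.5 threshold are hypothetical illustration values, not data from the study, and the code does not reproduce the authors' AdaBoost model.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = dead, 0 = alive (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]           # assumed model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, AUC={auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the cases, which is why the two figures can differ.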
Machine learning to improve interpretability of clinical, radiological and panel-based genomic data of glioma grade 4 patients undergoing surgical resection
Background: Glioma grade 4 (GG4) tumors, including astrocytoma IDH-mutant grade 4 and astrocytoma IDH-wildtype, are the most common and aggressive primary tumors of the central nervous system. Surgery followed by the Stupp protocol remains the first-line treatment for GG4 tumors. Although the Stupp combination can prolong survival, the prognosis of treated adult patients with GG4 remains unfavorable. The introduction of innovative multi-parametric prognostic models may allow refinement of prognosis for these patients. Here, Machine Learning (ML) was applied to investigate the contribution of different available data (e.g. clinical data, radiological data, or panel-based sequencing data such as presence of somatic mutations and amplification) to predicting overall survival (OS) in a mono-institutional GG4 cohort.
Methods: By next-generation sequencing, using a panel of 523 genes, we analyzed copy number variations and the types and distribution of nonsynonymous mutations in 102 cases, including 39 carmustine wafer (CW) treated cases. We also calculated tumor mutational burden (TMB). ML was applied using eXtreme Gradient Boosting for survival (XGBoost-Surv) to integrate clinical and radiological information with genomic data.
Results: ML modeling (concordance (c)-index = 0.682 for the best model) confirmed the role of radiological parameters, including extent of resection, preoperative volume and residual volume, in predicting OS. An association between CW application and longer OS was also shown. Regarding gene mutations, a role in predicting OS was defined for mutations of BRAF and of other genes involved in the PI3K-AKT-mTOR signaling pathway. Moreover, an association between high TMB and shorter OS was suggested. Consistently, when a cutoff of 1.7 mutations/megabase was applied, cases with higher TMB showed significantly shorter OS than cases with lower TMB.
Conclusions: The contribution of tumor volumetric data, somatic gene mutations and TMB in predicting OS of GG4 patients was defined by ML modeling.
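The abstract above evaluates its survival model with the concordance index. As a minimal sketch of Harrell's c-index for right-censored data, the following uses only the Python standard library. The follow-up times, event flags and risk scores are hypothetical illustration values, and the function name `concordance_index` is an assumption, not code from the study.

```python
def concordance_index(times, events, risk):
    """Harrell's c-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also has the higher predicted risk; tied risks
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.6, 0.1]   # assumed model risk scores

c_index = concordance_index(times, events, risk)
print(f"c-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by predicted risk.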
Can machine learning methods contribute as a decision support system in sequential oligometastatic radioablation therapy?
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Cancer treatment is among the major medical challenges of this century. Sequential oligometastatic radio-ablation (SOMA) is a novel treatment method that aims at ablating recurring metastases in a single session with a targeted high dose of radiation. To know if SOMA is the best possible treatment method for a patient, the benefits of each available therapy need to be understood and evaluated.
The ability to model complex systems, such as cancer treatment, is the strength of machine learning techniques. These techniques have improved the understanding of numerous medical therapies already. In some cases, they can serve as medical support systems if they deliver reliable results that doctors can trust and understand.
The results obtained from applying numerous machine learning techniques to the data of SOMA-treated patients show that some techniques are favorable in certain cases. The Random Forest algorithm proved superior at different classification tasks. Regression problems, in contrast, posed a great challenge, as the amount of data is very limited. Finally, SHAP values, a novel machine learning interpretation technique, provided valuable insights into the rationale of each algorithm. They showed that the machine learning algorithms could learn patterns aligned with human intuition in the problems presented.
SHAP values show great potential in bridging the gap between complex machine learning algorithms and their interpretability. They display how an algorithm learns from the data and derives results. This opens up exciting possibilities for applying machine learning algorithms in the real world.
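SHAP values are based on Shapley values from cooperative game theory. As a minimal sketch of the underlying idea, the following computes exact Shapley values by enumerating all feature coalitions, using only the Python standard library. The `toy_model`, the input `x` and the all-zero `baseline` are hypothetical, and replacing absent features with a single baseline value is a simplifying assumption; the SHAP library uses background data and faster approximations instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition are set to their baseline value
    (a simplifying assumption for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical risk score with one interaction term."""
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 4.0]          # assumed patient features
baseline = [0.0, 0.0, 0.0]   # assumed reference point

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term is split equally between the two features involved. The enumeration grows exponentially with the number of features, so this approach is only practical for small examples.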
Machine learning explainability in breast cancer survival
Machine Learning (ML) can improve the diagnosis, treatment decisions, and understanding of cancer. However, the low explainability of how “black box” ML methods produce their output hinders their clinical adoption. In this paper, we used data from the Netherlands Cancer Registry to generate an ML-based model to predict 10-year overall survival of breast cancer patients. Then, we used Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) to interpret the model's predictions. We found that, overall, LIME and SHAP tend to be consistent when explaining the contribution of different features. Nevertheless, the feature ranges where they disagree can also be of interest, since they can help us identify “turning points” where features go from favoring survived to favoring deceased (or vice versa). Explainability techniques can pave the way for better acceptance of ML techniques. However, their evaluation and translation to real-life scenarios need to be researched further.