Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers
Background and Objective: Colorectal cancer is a high-mortality cancer.
Clinical data analysis plays a crucial role in predicting the survival of
colorectal cancer patients, enabling clinicians to make informed treatment
decisions. However, utilizing clinical data can be challenging, especially when
dealing with imbalanced outcomes. This paper focuses on developing algorithms
to predict 1-, 3-, and 5-year survival of colorectal cancer patients using
clinical datasets, with particular emphasis on the highly imbalanced 1-year
survival prediction task. To address this issue, we propose a method that
builds a pipeline of standard balancing techniques to increase the
true positive rate. Evaluation is conducted on a colorectal cancer dataset from
the SEER database. Methods: The pre-processing step consists of removing
records with missing values and merging categories. The minority class of
1-year and 3-year survival tasks consists of 10% and 20% of the data,
respectively. Edited Nearest Neighbor, Repeated Edited Nearest Neighbor (RENN),
the Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and
RENN were used and compared for balancing the data with tree-based
classifiers. Decision Trees, Random Forest, Extra Tree, eXtreme Gradient
Boosting, and Light Gradient Boosting (LGBM) are the classifiers used in this article.
Results: The performance evaluation utilizes a 5-fold cross-validation
approach. For the highly imbalanced 1-year task, our proposed method with
LGBM outperforms the other sampling methods, with a sensitivity of
72.30%. For the less imbalanced 3-year task, the combination of RENN
and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method
works best for highly imbalanced datasets. Conclusions: Our proposed method
significantly improves mortality prediction for the minority class of
colorectal cancer patients.
Comment: 19 Pages, 6 Figures, 4 Tables
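The SMOTE-followed-by-RENN balancing step named in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the neighbour count `k`, the random seed, and the choice to clean only the majority class are all hypothetical, and the abstract does not say which library or settings were used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(X, y, minority=1, k=3, seed=0):
    """Oversample the minority class until both classes have equal size.

    Each synthetic point lies on the segment between a minority sample
    and one of its (up to) k nearest minority neighbours.
    Assumes a binary problem with at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        neighbours = sorted((p for p in minority_pts if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)

def edited_nearest_neighbours(X, y, majority=0, k=3):
    """Drop majority samples that most of their k nearest neighbours contradict.

    Assumption: only the majority class is cleaned; minority samples are kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority:
            keep.append(i)
            continue
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: sq_dist(xi, X[j]))[:k]
        agree = sum(1 for j in nearest if y[j] == yi)
        if agree * 2 > k:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def repeated_enn(X, y, majority=0, k=3, max_iter=100):
    """Apply ENN repeatedly until a pass removes nothing (or max_iter is hit)."""
    for _ in range(max_iter):
        X_new, y_new = edited_nearest_neighbours(X, y, majority, k)
        if len(X_new) == len(X):
            break
        X, y = X_new, y_new
    return X, y

def smote_renn(X, y, minority=1, majority=0, k=3, seed=0):
    """Pipeline: oversample with SMOTE, then clean with repeated ENN."""
    X_bal, y_bal = smote(X, y, minority, k, seed)
    return repeated_enn(X_bal, y_bal, majority, k)
```

In a cross-validation setting, a resampling step like this would normally be applied to the training folds only, so that the held-out fold keeps the original class distribution.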
Treating colon cancer survivability prediction as a classification problem
This work presents a survivability prediction model for colon cancer developed
with machine learning techniques. Survivability was viewed as a classification
task where it was necessary to determine if a patient would survive each of
the five years following treatment. The model was based on the SEER dataset
which, after preprocessing, consisted of 38,592 records of colon cancer patients.
Six features were extracted from a feature selection process in order to construct
the model. This model was compared with another one with 18 features
indicated by a physician. The results show that the performance of the six-feature
model is close to that of the model using 18 features, which indicates
that the first may be a good compromise between usability and performance.
This work has been supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT – Fundação para a
Ciência e Tecnologia within the Project Scope UID/CEC/00319/2013. The work of Tiago Oliveira is supported
by a FCT grant with the reference SFRH/BD/85291/2012.
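Framing survivability as a classification task for each of the five years can be illustrated with a small label-building sketch. The abstract does not describe how the targets were derived, so the `survival_months` input, the function name, and the 12-month cut-offs are assumptions for illustration only. The sketch also ignores censoring (patients whose follow-up ended before the horizon), which a real pipeline would have to handle.

```python
def yearly_survival_labels(survival_months, horizon_years=5):
    """Return one binary label per year: 1 if the patient survived that full year.

    survival_months is a hypothetical field holding months survived
    after treatment; it is not taken from the source.
    """
    return [1 if survival_months >= 12 * year else 0
            for year in range(1, horizon_years + 1)]


# A patient who survived 30 months is labeled as surviving years 1 and 2 only.
print(yearly_survival_labels(30))  # [1, 1, 0, 0, 0]
```

Each position in the returned list can then serve as the target of a separate binary classifier, one per year.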
A mobile and evolving tool to predict colorectal cancer survivability
This work presents a tool for predicting the survivability of patients with colon or rectal cancer up to five years after diagnosis and treatment. Accurate survivability prediction is a difficult task for health care professionals and of high concern to patients, who want to make the most of the rest of their lives. The distinguishing features of the tool include a balance between the number of necessary inputs and prediction performance, a mobile-friendly design, and an online learning component that enables the automatic evolution of the prediction models as new cases are added. This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2013. The work of Tiago Oliveira is supported by a FCT grant with the reference SFRH/BD/85291/2012.
Developing an individualized survival prediction model for rectal cancer
This work presents a survivability prediction model for rectal cancer patients developed through machine learning techniques. The model was based on the most complete worldwide cancer dataset known, the SEER dataset. After preprocessing, the training data consisted of 12,818 records of rectal cancer patients. Six features were extracted from a feature selection process, finding the most relevant characteristics which affect the survivability of rectal cancer. The model constructed with six features was compared with another one with 18 features indicated by a physician. The results show that the performance of the six-feature model is close to that of the model using 18 features, which indicates that the first may be a good compromise between usability and performance. Supported by FCT - Fundação para a Ciência e Tecnologia (SFRH/BD/85291/2012).
Stage-Specific Predictive Models for Cancer Survivability
Survivability of cancer strongly depends on the stage of the cancer. In most previous works, machine learning survivability prediction models for a particular cancer were trained and evaluated together on all stages of the cancer. In this work, we trained and evaluated survivability prediction models for five major cancers, both together on all stages and separately for every stage. We named these models joint and stage-specific models, respectively. The results for the cancers we investigated reveal that the best model for predicting survivability at one specific stage is the model built specifically for that stage. Additionally, we saw that the most important features for predicting survivability differed from stage to stage. By evaluating the models separately on different stages, we found that their performance differed across stages. We also found that evaluating the models together on all stages, as was done in the past, is misleading because it overestimates performance.
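The point that pooled evaluation can hide weak per-stage performance can be shown with a short sketch. The helper name and the toy labels below are invented purely for illustration and are not results from the paper.

```python
from collections import defaultdict

def accuracy_by_stage(stages, y_true, y_pred):
    """Return (per-stage accuracy dict, pooled accuracy over all stages)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for stage, truth, pred in zip(stages, y_true, y_pred):
        totals[stage] += 1
        hits[stage] += int(truth == pred)
    per_stage = {s: hits[s] / totals[s] for s in totals}
    pooled = sum(hits.values()) / sum(totals.values())
    return per_stage, pooled


# Toy data (made up): a model that always predicts "survives" (1).
stages = ["I"] * 8 + ["IV"] * 2
y_true = [1] * 8 + [0, 0]
y_pred = [1] * 10

per_stage, pooled = accuracy_by_stage(stages, y_true, y_pred)
print(per_stage)  # {'I': 1.0, 'IV': 0.0}
print(pooled)     # 0.8
```

Here the pooled score of 0.8 looks acceptable even though the model is wrong on every stage IV case, which is the kind of overestimate the abstract warns about.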
Application of Machine Learning in Cancer Research
This dissertation revisits the problem of five-year survivability prediction for breast cancer using machine learning tools. This work differs from past experiments in the size of the training data, the unbalanced distribution of data between the minority and majority classes, and its modified data cleaning procedures. These experiments are also based on the principles of TIDY data and reproducible research. In order to fine-tune the predictions, a set of experiments was run using naive Bayes, decision trees, and logistic regression.
Of particular interest were strategies to improve the recall level for the minority class, as the cost of misclassification is prohibitive. One of the main contributions of this work is that logistic regression with the proper predictors and class weight gives the highest precision/recall level for the minority class.
In regression modeling with a large number of predictors, correlation among predictors is quite common, and the estimated model coefficients might not be very reliable. In these situations, the Variance Inflation Factor (VIF) and the Generalized Variance Inflation Factor (GVIF) are used to overcome the correlation problem. Our experiments are based on the Surveillance, Epidemiology, and End Results (SEER) database for the problem of survivability prediction. Some of the specific contributions of this thesis are:
· Detailed process for data cleaning and binary classification of 338,596 breast cancer patients.
· Computational approach for omitting predictors and categorical predictors based on VIF and GVIF.
· Various applications of the Synthetic Minority Over-sampling Technique (SMOTE) to increase precision and recall.
· An application of Edited Nearest Neighbor to obtain the highest F1-measure.
In addition, this work provides precise algorithms and code for determining class membership and executing competing methods. This code can facilitate the reproduction and extension of our work by other researchers.
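For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below covers only the two-predictor special case, where R_j^2 equals the squared Pearson correlation between the two columns. The function names and sample values are illustrative assumptions, not the dissertation's code, and the general case (and GVIF for categorical predictors) needs a full regression per predictor.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(a, b):
    """VIF shared by both predictors when the model has exactly two.

    With a single other predictor, R^2 of the auxiliary regression is r^2.
    Raises ZeroDivisionError if the predictors are perfectly correlated.
    """
    r_squared = pearson_r(a, b) ** 2
    return 1.0 / (1.0 - r_squared)


# Uncorrelated columns give a VIF of 1; nearly collinear ones give a large VIF.
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))  # 1.0
print(vif_two_predictors([1, 2, 3, 4], [2, 4, 6, 9]) > 10)  # True
```

A large VIF signals that a predictor is largely explained by the others, which is the situation in which the abstract describes omitting predictors.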
A Comparative Study for Methodologies and Algorithms Used In Colon Cancer Diagnoses and Detection
Colon cancer, also referred to as colorectal cancer, is a cancer that starts in the large intestine, the last section of the digestive tract. It typically affects elderly people but may occur at any age. It normally starts as small, noncancerous (benign) masses of cells called polyps that form within the colon. Over time, some of these polyps can turn into advanced malignant tumors that attack the body. So far, no concrete causes have been identified, and complete treatment of the cancer is very difficult for doctors to achieve. Colon cancer often has no symptoms at an early stage. It is curable when detected at that stage, but a diagnosis in the final stage (stage IV) gives the cancer the opportunity to spread to other parts of the body, where it is difficult to treat successfully, and the person's chances of survival become much lower. A false diagnosis of colorectal cancer means the wrong treatment for patients with long-term infections who are in fact suffering from colon cancer, which can cause the death of these patients. Cancer treatment also takes considerable time and money. This paper provides a comparative study of the methodologies and algorithms used in colon cancer diagnosis and detection, which can help in proposing a method for predicting colon cancer risk levels using a deep learning algorithm, the Convolutional Neural Network (CNN).
Predicting breast cancer risk, recurrence and survivability
This thesis focuses on predicting breast cancer at early stages by using machine learning algorithms based on biological datasets. The accuracy of those algorithms has been improved to enable physicians to enhance the success of treatment, thus saving lives and avoiding several further medical tests.