Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers
Background and Objective: Colorectal cancer is a cancer with high mortality.
Clinical data analysis plays a crucial role in predicting the survival of
colorectal cancer patients, enabling clinicians to make informed treatment
decisions. However, utilizing clinical data can be challenging, especially when
dealing with imbalanced outcomes. This paper focuses on developing algorithms
to predict 1-, 3-, and 5-year survival of colorectal cancer patients using
clinical datasets, with particular emphasis on the highly imbalanced 1-year
survival prediction task. To address this issue, we propose a method that
chains standard balancing techniques into a pipeline to increase the
true positive rate. Evaluation is conducted on a colorectal cancer dataset from
the SEER database. Methods: The pre-processing step consists of removing
records with missing values and merging categories. The minority class of
1-year and 3-year survival tasks comprises 10% and 20% of the data,
respectively. Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor
(RENN), Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of
SMOTE and RENN were used and compared for balancing the data with the
tree-based classifiers used in this article: Decision Tree, Random Forest,
Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting
Machine (LGBM).
Results: Performance is evaluated with 5-fold cross-validation. For the highly
imbalanced 1-year dataset, our proposed method with LGBM outperforms the other
sampling methods, with a sensitivity of 72.30%. For the moderately imbalanced
3-year survival task, the combination of RENN and LGBM achieves a sensitivity
of 80.81%, indicating that our proposed method works best for highly
imbalanced datasets. Conclusions: Our proposed method significantly improves
mortality prediction for the minority class of colorectal cancer patients.
Comment: 19 pages, 6 figures, 4 tables
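The balancing steps named above can be illustrated with a small NumPy sketch. It assumes binary labels (1 = minority, e.g. death within one year), Euclidean distance, and a hypothetical `smote_then_renn` order (oversample to parity, then clean); the abstract does not specify the exact pipeline order or neighbour counts, so these are illustrative choices, not the authors' implementation.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest other rows of X for every row (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def smote(X_min, n_new, k=5, seed=0):
    """SMOTE: interpolate between a minority sample and one of its k minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)   # random anchor samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # a random neighbour of each anchor
    gap = rng.random((n_new, 1))                      # position along the connecting segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop majority (label 0) samples whose
    k nearest neighbours are mostly minority (label 1)."""
    minority_share = y[knn_indices(X, k)].mean(axis=1)
    keep = ~((y == 0) & (minority_share > 0.5))
    return X[keep], y[keep]


def renn(X, y, k=3, max_iter=100):
    """Repeated ENN: apply ENN until a pass removes nothing (or max_iter is reached)."""
    for _ in range(max_iter):
        X_new, y_new = enn(X, y, k)
        if len(y_new) == len(y):
            break
        X, y = X_new, y_new
    return X, y


def smote_then_renn(X, y, k_smote=5, k_enn=3, seed=0):
    """Hypothetical pipeline: oversample the minority class to parity, then clean with RENN."""
    X_min = X[y == 1]
    n_new = max(0, int((y == 0).sum() - (y == 1).sum()))
    X_syn = smote(X_min, n_new, k_smote, seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return renn(X_bal, y_bal, k_enn)
```

In practice one would more likely use the `SMOTE` and `RepeatedEditedNearestNeighbours` classes from imbalanced-learn. Whichever implementation is used, resampling should be fitted only on the training folds of the 5-fold cross-validation, so that the reported sensitivity is measured on untouched, still-imbalanced validation folds.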
A machine learning platform to optimize the translation of personalized network models to the clinic
PURPOSE
Dynamic network models predict clinical prognosis and inform therapeutic intervention by elucidating disease-driven aberrations at the systems level. However, the personalization of model predictions requires the profiling of multiple model inputs, which hampers clinical translation.
PATIENTS AND METHODS
We applied APOPTO-CELL, a prognostic model of apoptosis signaling, to showcase the establishment of computational platforms that require a reduced set of inputs. We designed two distinct and complementary pipelines: a probabilistic approach to exploit a consistent subpanel of inputs across the whole cohort (Ensemble) and a machine learning approach to identify a reduced protein set tailored for individual patients (Tree). Development was performed on a virtual cohort of 3,200,000 patients, with inputs estimated from clinically relevant protein profiles. Validation was carried out in an in-house stage III colorectal cancer cohort, with inputs profiled in surgical resections by reverse phase protein array (n = 120) and/or immunohistochemistry (n = 117).
RESULTS
Ensemble and Tree reproduced APOPTO-CELL predictions in the virtual patient cohort with 92% and 99% accuracy while decreasing the number of inputs to a consistent subset of three proteins (40% reduction) or a personalized subset of 2.7 proteins on average (46% reduction), respectively. Ensemble and Tree retained prognostic utility in the in-house colorectal cancer cohort. The association between the Ensemble accuracy and prognostic value (Spearman ρ = 0.43; P = .02) provided a rationale to optimize the input composition for specific clinical settings. Comparison between profiling by reverse-phase protein array (gold standard) and immunohistochemistry (clinical routine) revealed that the latter is a suitable technology to quantify model inputs.
CONCLUSION
This study provides a generalizable framework to optimize the development of network-based prognostic assays and, ultimately, to facilitate their integration into the routine clinical workflow.
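The two reduction percentages reported above are mutually consistent with a full APOPTO-CELL input set of five proteins. That count is inferred here from the arithmetic, not stated in the abstract:

```latex
\text{reduction} = 1 - \frac{n_{\text{used}}}{n_{\text{full}}}, \qquad
1 - \frac{3}{5} = 0.40 \;(\text{Ensemble}), \qquad
1 - \frac{2.7}{5} = 0.46 \;(\text{Tree})
```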
Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations
Cancer arises from the accumulation of somatic mutations and genetic alterations affecting cell-division checkpoints and apoptosis, which often leads to abnormal tumor proliferation. Proper classification of cancer-linked driver mutations would considerably improve our understanding of the molecular dynamics of cancer. In this study, we compared several cancer-specific predictive models for predicting driver mutations in cancer-linked genes; the models were validated on canonical data sets of functionally validated mutations and applied to raw cancer genomics data. By analyzing pathogenicity prediction and conservation scores, we showed that evolutionary conservation scores play a pivotal role in the classification of cancer drivers and were the most informative features in driver mutation classification. Through extensive comparative analysis with structure-functional experiments and multicenter mutation-calling data from PanCancer Atlas studies, we demonstrated the robustness of our models and assessed the validity of the computational predictions. We evaluated model performance using standard diagnostic metrics such as sensitivity, specificity, area under the curve (AUC), and F-measure. To address the interpretability of cancer-specific classification models and obtain novel insights into the molecular signatures of driver mutations, we complemented the machine learning predictions with structure-functional analysis of cancer driver mutations in several key tumor suppressor genes and oncogenes. The experiments in this study showed that evolution-based features carry the strongest signal in the machine learning classification of driver mutations and provide information orthogonal to the ensemble-based scores that rank prominently in feature importance.
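The four diagnostic metrics named above can be computed as in the following plain-Python sketch. It assumes binary labels with 1 = driver mutation and a real-valued classifier score for the AUC; the function names are illustrative, and the AUC is computed via its rank (Mann-Whitney) interpretation rather than by integrating a ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = driver mutation."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity (recall), specificity, and F-measure from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f_measure": f_measure}


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is fine for small validation sets; for large mutation catalogs a sort-based rank computation gives the same value faster.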
Prediction of Quality of Life in Cancer Survivors and Survival in Advanced Cancer Patients Using Clustering of Self-Health-Management Strategies and Machine Learning Techniques
Thesis (Ph.D.), Graduate School, Seoul National University: College of Medicine, Department of Medicine, February 2023. Advisor: 윤영호.
Background: In cancer care, self-management strategies can help cancer patients improve their health-related quality of life (HRQoL) or survival, irrespective of cancer stage or treatment plan. However, research on how self-management strategies cluster by cancer stage in real-world clinical settings is insufficient, and prediction models of HRQoL or survival in cancer patients are also lacking. In addition, no study has yet comprehensively examined the relationships among self-management strategies, HRQoL, and survival. Hence, we investigated these relationships using clustering methods, machine learning techniques (MLT), and path analysis of structural equation modeling (SEM).
Methods: In cancer survivors, cluster analyses using principal component analysis with varimax rotation and k-means clustering were conducted to examine the interrelationships among the self-management strategies measured by the Smart Management Strategies for Health Assessment Tool (SAT). Multivariate-adjusted analyses were performed to identify the association of self-management strategies with HRQoL after 6 months. We constructed HRQoL prediction models, compared their performance across decision tree, random forest, gradient boosting, eXtreme Gradient Boosting (XGBoost), and LightGBM algorithms, and selected the XGBoost model for further analysis. Using SHAP, we identified the critical features for HRQoL and extracted individual prediction results from the XGBoost model. In advanced cancer patients, self-management clustering and multivariate-adjusted analyses of the association between the strategies and HRQoL were conducted in the same way as in cancer survivors. We performed dimension-wise multiple Cox proportional hazards regression analyses to determine critical predictors of 1-year survival, and we built an XGBoost survival prediction model using the critical predictors from the Cox regression model. To examine the causal relationships among SAT strategies, HRQoL, and survival, we used subgroup analysis and path analysis of structural equation modeling.
Results: Self-management strategies formed two clusters in both cancer survivors and advanced cancer patients, but the clusters differed by cancer stage. Advanced-stage cancer patients used core strategies along with preparation and implementation strategies to overcome their crisis. Across all cancer patients, including those with advanced cancer, self-management strategies were positively associated with improved HRQoL. In prediction model development, the XGBoost model for HRQoL showed high performance in cancer survivors, although the important variables differed for each HRQoL factor. Moreover, by combining the SHAP-based individual prediction method with a web-based survey of cancer survivors, we proposed a specific way to provide customized healthcare services. In advanced cancer patients, dimension-wise univariate Cox models showed that ECOG performance status, marital status, sex, global QoL, dyspnea, pain, appetite loss, constipation, depression at baseline, and a clinically meaningful change in emotional functioning were predictors of worse survival. The XGBoost survival model also showed high performance, and performance was optimal when the model combined the variables selected by the Cox model and by MLT methods: depression, pain, appetite loss, constipation, sex, ECOG performance status, and a clinically meaningful change in emotional functioning. Path analysis also revealed a causal relationship among SAT strategies, depression, and survival in advanced cancer patients.
Conclusions: This study is the first to examine self-management strategy clusters by cancer stage across different groups of cancer patients, namely cancer survivors and advanced cancer patients. To our knowledge, it is also the first study to develop and validate HRQoL prediction models for cancer survivors, interpret those models, and suggest how the results can be used in clinical settings. Additionally, using MLT methods and path analysis, we revealed an association of self-management strategies with HRQoL and survival in advanced cancer patients. These results can deepen the understanding of self-management strategies and help healthcare providers deliver services to cancer patients across the cancer-care continuum.
Abstract in Korean (English translation):
Background: Across the cancer-care continuum, self-management strategies can help cancer patients improve their health-related quality of life or survival regardless of cancer stage or treatment plan. However, studies of how self-management strategies cluster by cancer stage in real clinical settings are lacking, as are prediction models of HRQoL or survival in cancer patients. Moreover, no study has yet comprehensively examined the relationships among self-management strategies, HRQoL, and survival in cancer patients. This study therefore aimed to clarify these relationships using clustering methods, machine learning techniques, and path analysis of structural equation models.
Methods: In cancer survivors, self-management strategies were measured with the newly developed Smart Management Strategies for Health Assessment Tool (SAT), and cluster analysis using principal component analysis and k-means clustering was performed to examine the interrelationships among the SAT strategies. Multivariate analyses were performed to examine the association between SAT strategies and HRQoL after 6 months. To develop and validate HRQoL prediction models for cancer survivors, prediction models were constructed and their performance was compared using decision tree, random forest, gradient boosting, XGBoost, and LightGBM algorithms. After this comparison, the XGBoost model was selected for further analysis, and SHAP was used to perform feature-importance and individual-prediction analyses to identify the important variables of the XGBoost HRQoL prediction model. In advanced cancer patients, the clustering and multivariate analyses of the association between HRQoL and SAT strategies were performed in the same way as in cancer survivors. To develop survival prediction models, dimension-wise multiple Cox proportional hazards regression analyses were performed with conventional statistics, and a machine learning survival prediction model was developed with XGBoost. Prediction models were built separately from the variables selected by conventional statistical methods, by machine learning techniques, and by both combined, and their performance was compared. Path analysis using structural equation modeling was also used to clarify the causal relationships among SAT strategies, HRQoL, and survival.
Results: SAT strategy clusters in cancer survivors and advanced cancer patients differed by cancer stage. Compared with early-stage patients, patients at intermediate-to-late stages used the core strategies, which are important at every stage regardless of treatment phase or cancer stage, together with preparation and implementation strategies to overcome their crisis. These SAT strategies were positively associated with improved HRQoL in all cancer patients, including those with advanced cancer. The machine learning HRQoL prediction models showed high predictive performance in cancer survivors, although the important variables differed across HRQoL factors. By integrating the newly identified SHAP-based individual prediction method into a web-based survey study of cancer survivors, this study also proposed a concrete way to provide personalized healthcare services for cancer survivors. In advanced cancer patients, dimension-wise univariate Cox models showed that ECOG performance status, sex, marital status, lower global quality of life at baseline, dyspnea, pain, appetite loss, constipation, depression, and clinically meaningful changes in emotional functioning and social support over 12 weeks were ultimately associated with worse survival. The machine learning prediction models showed high survival prediction performance, and BorutaSHAP selected depression, pain, appetite loss, constipation, and sex as important factors associated with survival. The survival prediction model performed best when it combined the variables selected by conventional statistical methods and by machine learning techniques. Path analysis revealed causal relationships among SAT strategies, depression, and survival, with SAT strategies having an indirect effect on survival fully mediated by depression.
Conclusions: This study is the first to perform a cluster analysis of self-management strategy use by cancer stage that includes both cancer survivors and advanced cancer patients. It is also the first to develop and validate a simple model predicting the HRQoL that matters to cancer survivors, to interpret the model with explainable artificial intelligence algorithms, and to propose how its results can be used in clinical settings for cancer survivors. In addition, using machine learning techniques and path analysis, this study found direct and indirect positive associations of self-management strategies with HRQoL and survival in advanced cancer patients. These results show that the newly developed SAT self-management strategies can serve as a useful intervention tool for cancer patients in clinical practice. Overall, the significance of this study lies in broadening the understanding of cancer patients' use of self-management strategies and their effectiveness, and in presenting comprehensive results and clinical applications showing how healthcare providers can use self-management strategies to deliver helpful services to cancer patients across the cancer-care continuum.
Chapter 1. Introduction
1.1. Study Background
1.2. Literature Review
1.3. Research Objectives and Hypothesis
1.4. Definition of cancer survivors and advanced cancer patients in this study
Chapter 2. Methods
2.1. Study Design
2.2. Study Participants
2.3. Measurements
2.4. Statistical Methods
Chapter 3. Results
3.1. Study Participants' Characteristics
3.2. Self-management clustering results
3.3. The association of self-management clustering with HRQoL
3.4. HRQoL prediction model development and validation
3.5. Survival prediction model development and validation
3.6. Causal relationship among SAT, HRQoL, and Survival
Chapter 4. Discussion
Chapter 5. Conclusion
Bibliography
Abstract in Korean
Supplementary Information
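The clustering step in this thesis (principal component analysis followed by k-means on SAT strategy scores) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the thesis code: it omits the varimax rotation, uses a farthest-first initialization chosen here for determinism, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_first_init(X, k, seed=0):
    """Pick a random first centre, then repeatedly add the point farthest from existing centres."""
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    while len(centres) < k:
        dist = np.linalg.norm(X[:, None, :] - np.array(centres)[None, :, :], axis=2)
        centres.append(X[np.argmax(dist.min(axis=1))])
    return np.array(centres)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre (mean) updates."""
    centres = farthest_first_init(X, k, seed)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


# Hypothetical example: 40 respondents x 6 strategy scores drawn from two groups.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(1, 0.3, (20, 6)), rng.normal(4, 0.3, (20, 6))])
labels, _ = kmeans(pca_scores(scores, 2), k=2)
```

Reducing to a few components before k-means keeps Euclidean distances from being dominated by many correlated strategy items; the number of clusters (two here, matching the abstract) would normally be chosen with a criterion such as the elbow or silhouette method.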
Developing an individualized survival prediction model for rectal cancer
This work presents a survivability prediction model for rectal cancer patients developed with machine learning techniques. The model was based on the SEER dataset, the most complete cancer dataset known worldwide. After preprocessing, the training data consisted of 12,818 records of rectal cancer patients. A feature selection process identified six features as the characteristics most relevant to rectal cancer survivability. The six-feature model was compared with a model using 18 features indicated by a physician. The results show that the six-feature model performs close to the 18-feature model, which suggests that the smaller model may be a good compromise between usability and performance. Funding: FCT (Fundação para a Ciência e a Tecnologia), SFRH/BD/85291/2012.
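The abstract does not name the feature selection method, so the following is only one common possibility: greedy forward selection, sketched in plain Python. The `score` callable is a hypothetical stand-in for something like cross-validated AUC of a model trained on the candidate subset, and the toy feature names and weights are invented for illustration.

```python
def forward_select(features, score, k):
    """Greedy forward selection: add, one at a time, the feature whose inclusion scores best."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, hypothetical scoring: each feature contributes a fixed weight.
weights = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.05}
toy_score = lambda subset: sum(weights[f] for f in subset)
print(forward_select(weights, toy_score, 2))  # ['b', 'c']
```

With a real model-based score, each step refits the model for every remaining candidate, so the cost grows roughly with (number of features) times (target subset size); this is the price of choosing features by their joint effect rather than ranking them individually.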
Proc Mach Learn Res
Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early- vs. late-stage cancer by (1) deploying ensemble machine learning for prediction, (2) establishing a set of classification rules for the predicted probabilities, and (3) considering an augmented set of administrative claims data. We find that our ensemble machine learning algorithm, with a classification rule defined by the median, substantially outperforms an existing clinical decision tree for this problem, yielding full-sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure the quality and outcomes of cancer care and to improve risk adjustment methods.
Grants: HHSN261201000140C/CA/NCI NIH HHS/United States; HHSN261201000035C/CA/NCI NIH HHS/United States; T32 MH019733/MH/NIMH NIH HHS/United States; HHSN261201000035I/CA/NCI NIH HHS/United States; HHSN261201000034C/CA/NCI NIH HHS/United States; U58 DP003862/DP/NCCDPHP CDC HHS/United States.
2018-12-10; PMID: 30542673; PMCID: PMC6287925.
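A minimal sketch of an ensemble-plus-median classification rule, written in plain Python. It assumes that the ensemble is a simple unweighted average of per-model probabilities and that "defined by the median" means thresholding at the median predicted probability, with 1 = late stage. The abstract does not specify either detail, so the function names and both choices are hypothetical.

```python
from statistics import mean, median


def ensemble_probability(model_probs):
    """Unweighted average of each patient's predicted probability across models (assumption)."""
    return [mean(ps) for ps in zip(*model_probs)]


def classify_by_median(probs):
    """Label a patient late stage (1) when their probability exceeds the median probability."""
    cut = median(probs)
    return [1 if p > cut else 0 for p in probs]


def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = late stage)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Hypothetical probabilities from two models for four patients.
probs = ensemble_probability([[0.9, 0.2, 0.6, 0.1], [0.7, 0.4, 0.8, 0.3]])
labels = classify_by_median(probs)  # [1, 0, 1, 0]
```

Note the design consequence of a median cut: absent ties, it labels about half the sample as late stage, so it works best when the two stages are roughly balanced in the cohort; other rules (e.g. thresholds matched to known stage prevalence) would behave differently.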
A novel integrative risk index of papillary thyroid cancer progression combining genomic alterations and clinical factors.
Although the majority of papillary thyroid cancer (PTC) is indolent, a subset of PTC behaves aggressively despite the best available treatment. A major clinical challenge is to reliably distinguish, early on, patients who need aggressive treatment from those who do not. Using a large cohort of PTC samples obtained from The Cancer Genome Atlas (TCGA), we analyzed the association between disease progression and multiple forms of genomic data, such as transcriptome, somatic mutations, and somatic copy number alterations, and found that genes related to the FOXM1 signaling pathway were significantly associated with PTC progression. Integrative genomic modeling was performed, controlling for demographic and clinical characteristics, which included patient age, gender, TNM stages, histological subtypes, and history of other malignancy, using a leave-one-out elastic net model and 10-fold cross-validation. For each subject, the model fitted on the remaining subjects was used to determine the risk index, defined as a linear combination of the clinical and genomic variables from the elastic net model, and the stability of the risk index distribution was assessed through 2,000 bootstrap resamples. We developed a novel approach to combine genomic alterations and patient-related clinical factors that delineates the subset of patients who have more aggressive disease from those whose tumors are indolent and likely will require less aggressive treatment and surveillance (p = 4.62 × 10⁻¹⁰, log-rank test). Our results suggest that risk index modeling that combines genomic alterations with current staging systems provides an opportunity for more effective anticipation of disease prognosis and therefore enhanced precision management of PTC.
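The leave-one-out risk index described above can be written as follows. This is a sketch: the abstract does not state the loss, so $\ell_{-i}$ (for example, the negative Cox partial log-likelihood for time to progression, computed on all subjects except $i$) and the glmnet-style parameterization of the penalty with mixing weight $\alpha$ and strength $\lambda$ (tuned by 10-fold cross-validation) are assumptions. Here $\mathbf{x}_i$ stacks subject $i$'s clinical and genomic variables:

```latex
\hat{\beta}^{(-i)} = \arg\min_{\beta}\;\Big[\,\ell_{-i}(\beta)
  + \lambda\Big(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big)\Big],
\qquad
r_i = \mathbf{x}_i^{\top}\hat{\beta}^{(-i)}
```

Because subject $i$ never contributes to $\hat{\beta}^{(-i)}$, each $r_i$ is an out-of-sample score, which is what makes the subsequent log-rank comparison of high- and low-risk groups less optimistic than scoring subjects with a model fitted on everyone.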
Efficient Feature Selection and ML Algorithm for Accurate Diagnostics
Machine learning algorithms have been deployed in numerous optimization, prediction, and classification problems, which has made them popular in fields such as computer networks and medical diagnosis. Although these algorithms achieve convincing results in these fields, they face numerous challenges when deployed on imbalanced datasets: they are often biased toward the majority class and therefore generalize poorly. In addition, they cannot deal effectively with high-dimensional datasets. Moreover, conventional feature selection techniques that rank attributes by their significance are ineffective for the majority of diagnostic applications. In this paper, feature selection is performed using the more effective Neighbourhood Components Analysis (NCA). For classification, an ensemble classifier comprising K-Nearest Neighbours (KNN), Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM) is built, trained, and tested. Finally, cross-validation is carried out to evaluate the developed ensemble model. The results show that the proposed classifier has the best performance in terms of precision, recall, F-measure, and classification accuracy.
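The abstract does not say how the four classifiers' outputs are combined. Hard majority voting is one common choice and is sketched below in plain Python as an assumption; `predictions[m][i]` is classifier m's label for sample i, listed here in the order KNN, NB, DT, SVM.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting across classifiers; predictions[m][i] is classifier m's label for sample i.

    Counter.most_common keeps first-encountered order for equal counts, so a tie
    goes to the label of the earliest classifier (here KNN) among the tied labels.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical labels from KNN, NB, DT, SVM for three samples.
combined = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1]])
```

With an even number of voters (four here), ties are common, so the tie-breaking rule matters; alternatives are soft voting on predicted probabilities or adding a fifth classifier.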
A Comparative Study for Methodologies and Algorithms Used In Colon Cancer Diagnoses and Detection
Colon cancer is also referred to as colorectal cancer; it is a cancer that begins in the large intestine (colon), the final section of the digestive tract. It typically affects older adults, but it can occur at any age. It usually starts as small, noncancerous (benign) clumps of cells called polyps that form inside the colon. Over time, some of these polyps can become malignant tumors, that is, colon cancers. So far, no definitive causes have been identified, and the disease is very difficult for doctors to detect and treat completely. Colon cancer often has no symptoms at an early stage, when it is curable if detected; diagnosis at the final stage (stage IV), after the cancer has had the opportunity to spread to other parts of the body, makes it difficult to treat successfully, and the person's chances of survival become much lower. A false diagnosis of colorectal cancer means the wrong treatment for patients, who then continue to suffer from colon cancer, which can cause their death. Cancer treatment also requires considerable time and money. This paper provides a comparative study of methodologies and algorithms used in colon cancer diagnosis and detection, which can help in proposing a model that predicts colon cancer risk levels using a deep-learning convolutional neural network (CNN) algorithm.