1,287 research outputs found

    Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers

    Background and Objective: Colorectal cancer is a cancer with high mortality. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, using clinical data can be challenging, especially when the outcomes are imbalanced. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients from clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. To address this imbalance, we propose a method that chains standard balancing techniques into a pipeline to increase the true positive rate. Evaluation is conducted on a colorectal cancer dataset from the SEER database. Methods: Pre-processing consists of removing records with missing values and merging categories. The minority class makes up 10% of the data for the 1-year survival task and 20% for the 3-year task. Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and RENN were used and compared for balancing the data with tree-based classifiers: Decision Tree, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM). Results: Performance is evaluated with 5-fold cross-validation. On the highly imbalanced 1-year task, our proposed method with LGBM outperforms the other sampling methods, with a sensitivity of 72.30%. On the less imbalanced 3-year task, the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method works best for highly imbalanced datasets. Conclusions: Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
    Comment: 19 pages, 6 figures, 4 tables
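The balancing pipeline described above (oversample the minority class, then clean the majority class with repeated ENN) can be sketched without external libraries. This is a minimal, hypothetical illustration, not the authors' code: the SMOTE-then-RENN order, the neighbour counts (k=5 for SMOTE, k=3 for ENN), and the rule that ENN removes only majority samples whose neighbours mostly disagree with them are assumptions, and the tree-based classifier (such as LGBM) that would be trained on the balanced data is omitted. In 5-fold cross-validation, resampling belongs inside each training fold only, so the held-out fold keeps its original class ratio.

```python
import math
import random

# Labels: 1 = minority class (e.g. death within 1 year), 0 = majority class.
# Hypothetical, dependency-free sketch; not the paper's implementation.

def smote(minority, n_new, k=5, seed=0):
    """SMOTE: create n_new synthetic points on segments between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def enn(X, y, k=3, drop_label=0):
    """Edited Nearest Neighbours: drop samples of class drop_label whose
    k nearest neighbours do not mostly share their label."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == drop_label:
            others = [j for j in range(len(X)) if j != i]
            nearest = sorted(others, key=lambda j: math.dist(xi, X[j]))[:k]
            agree = sum(1 for j in nearest if y[j] == yi)
            if 2 * agree <= k:  # neighbourhood disagrees: treat as noise
                continue
        keep_X.append(xi)
        keep_y.append(yi)
    return keep_X, keep_y


def renn(X, y, k=3, drop_label=0, max_iter=100):
    """Repeated ENN: apply ENN until no further samples are removed."""
    for _ in range(max_iter):
        new_X, new_y = enn(X, y, k, drop_label)
        if len(new_y) == len(y):
            break
        X, y = new_X, new_y
    return X, y


def smote_renn(X, y, k_smote=5, k_enn=3, seed=0):
    """Oversample class 1 up to the size of class 0, then clean class 0 with RENN."""
    minority = [x for x, lab in zip(X, y) if lab == 1]
    n_new = max(0, sum(1 for lab in y if lab == 0) - len(minority))
    new = smote(minority, n_new, k=k_smote, seed=seed)
    X_bal = list(X) + new
    y_bal = list(y) + [1] * len(new)
    return renn(X_bal, y_bal, k=k_enn, drop_label=0)


def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)
```

Because SMOTE only interpolates between existing minority samples, every synthetic point lies inside the region spanned by the real minority cases; RENN then removes majority samples near the class boundary, which is what raises sensitivity at some cost in specificity.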

    A machine learning platform to optimize the translation of personalized network models to the clinic

    PURPOSE Dynamic network models predict clinical prognosis and inform therapeutic intervention by elucidating disease-driven aberrations at the systems level. However, the personalization of model predictions requires the profiling of multiple model inputs, which hampers clinical translation. PATIENTS AND METHODS We applied APOPTO-CELL, a prognostic model of apoptosis signaling, to showcase the establishment of computational platforms that require a reduced set of inputs. We designed two distinct and complementary pipelines: a probabilistic approach to exploit a consistent subpanel of inputs across the whole cohort (Ensemble) and a machine learning approach to identify a reduced protein set tailored for individual patients (Tree). Development was performed on a virtual cohort of 3,200,000 patients, with inputs estimated from clinically relevant protein profiles. Validation was carried out in an in-house stage III colorectal cancer cohort, with inputs profiled in surgical resections by reverse phase protein array (n = 120) and/or immunohistochemistry (n = 117). RESULTS Ensemble and Tree reproduced APOPTO-CELL predictions in the virtual patient cohort with 92% and 99% accuracy while decreasing the number of inputs to a consistent subset of three proteins (40% reduction) or a personalized subset of 2.7 proteins on average (46% reduction), respectively. Ensemble and Tree retained prognostic utility in the in-house colorectal cancer cohort. The association between the Ensemble accuracy and prognostic value (Spearman ρ = 0.43; P = .02) provided a rationale to optimize the input composition for specific clinical settings. Comparison between profiling by reverse phase protein array (gold standard) and immunohistochemistry (clinical routine) revealed that the latter is a suitable technology to quantify model inputs. 
CONCLUSION This study provides a generalizable framework to optimize the development of network-based prognostic assays and, ultimately, to facilitate their integration into the routine clinical workflow.
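The two reported input reductions are mutually consistent with a full APOPTO-CELL panel of five protein inputs. That panel size is back-calculated here from the percentages (an assumption for this check), not stated in the abstract; for Tree, the 2.7 proteins are a per-patient average, so its reduction is an average too.

```latex
\text{reduction} = 1 - \frac{n_{\text{reduced}}}{n_{\text{full}}}, \qquad
\text{Ensemble: } 1 - \frac{3}{5} = 0.40, \qquad
\text{Tree: } 1 - \frac{2.7}{5} = 0.46
```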

    Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations

    Cancer arises from the accumulation of somatic mutations and genetic alterations affecting cell-division checkpoints and apoptosis, which often leads to abnormal tumor proliferation. Proper classification of cancer-linked driver mutations will considerably help our understanding of the molecular dynamics of cancer. In this study, we compared several cancer-specific models for predicting driver mutations in cancer-linked genes; the models were validated on canonical data sets of functionally validated mutations and applied to raw cancer genomics data. By analyzing pathogenicity prediction and conservation scores, we showed that evolutionary conservation scores play a pivotal role in the classification of cancer drivers and were the most informative features in driver mutation classification. Through extensive comparative analysis with structure-functional experiments and multicenter mutation-calling data from PanCancer Atlas studies, we demonstrated the robustness of our models and assessed the validity of computational predictions. We evaluated the performance of our models using standard diagnostic metrics: sensitivity, specificity, area under the curve (AUC), and F-measure. To address the interpretability of cancer-specific classification models and obtain novel insights into the molecular signatures of driver mutations, we complemented machine learning predictions with structure-functional analysis of cancer driver mutations in several key tumor suppressor genes and oncogenes. These experiments showed that evolution-based features carry the strongest signal in the machine learning classification of driver mutations and provide information orthogonal to the ensemble-based scores that rank highly in feature importance.
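The four evaluation metrics named above can be computed from binary labels and classifier scores as in the following sketch. It assumes label 1 marks a driver mutation and 0 a passenger mutation (a hypothetical convention), computes AUC as the probability that a randomly chosen driver outscores a randomly chosen passenger (ties counted as one half), and does not guard against empty classes.

```python
def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 (driver) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and F-measure (harmonic mean of precision and recall)."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    return {"sensitivity": sens, "specificity": spec, "f_measure": f1}


def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based version gives the same value in O(n log n).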

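    Prediction of Quality of Life in Cancer Survivors and Survival in Advanced Cancer Patients Using Clustering of Self-Management Strategies and Machine Learning Techniques (자가건강전략의 클러스터링과 머신러닝 기법을 사용한 암생존자의 삶의 질 및 진행성 암환자의 생존 예측)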

    Thesis (Ph.D.), Seoul National University Graduate School, College of Medicine, Department of Biomedical Sciences, February 2023. Advisor: 윤영호 (Young Ho Yun).
Background: In cancer care, self-management strategies can help cancer patients improve their health-related quality of life (HRQoL) or survival, irrespective of cancer stage or treatment plan. However, research on how self-management strategies cluster by cancer stage in natural clinical settings is insufficient, and prediction models of HRQoL or survival in cancer patients are also lacking. In addition, no study has yet comprehensively identified the relationships among self-management strategies, HRQoL, and survival. We therefore investigated these relationships using clustering methods, machine learning techniques (MLT), and path analysis with structural equation modeling (SEM). Methods: In cancer survivors, cluster analyses using principal component analysis with varimax rotation followed by k-means clustering were conducted to examine the interrelationships among the self-management strategies of the Smart Management Strategies for Health Assessment Tool (SAT). Multivariate-adjusted analyses were performed to identify the association of self-management strategies with HRQoL after 6 months. We constructed HRQoL prediction models and compared their performance across tree-based and ensemble algorithms: decision tree, random forest, gradient boosting, eXtreme Gradient Boosting (XGBoost), and LightGBM. We then selected the XGBoost model for further analysis, used SHAP to identify the critical features for HRQoL, and extracted individual prediction results from the XGBoost model. In advanced cancer patients, self-management clustering and multivariate-adjusted analyses of the association between the strategies and HRQoL were conducted in the same way as in cancer survivors. We performed dimension-wise multiple Cox proportional hazards regression analyses to determine critical predictors of 1-year survival.
We then built a survival prediction model with XGBoost using the critical predictors from the Cox regression model. To examine the causal relationships among SAT strategies, HRQoL, and survival, we used subgroup analysis and path analysis with structural equation modeling. Results: In both cancer survivors and advanced cancer patients, self-management strategies formed two concurrently used clusters; however, the strategy clusters differed by cancer stage. Advanced-stage cancer patients used core strategies together with preparation and implementation strategies to overcome their crisis. Across all cancer patients, including those with advanced cancer, self-management strategies were positively associated with improved HRQoL. In prediction model development, the XGBoost model for HRQoL showed high performance in cancer survivors, and the important variables differed for each HRQoL factor. Moreover, we proposed a specific method for providing customized healthcare services by combining SHAP-based individual prediction with a web-based survey study of cancer survivors. In advanced cancer patients, the univariate dimension-wise Cox model showed that ECOG performance status, marital status, sex, global QoL, dyspnea, pain, appetite loss, constipation, depression at baseline, and clinically meaningful change in emotional functioning predicted worse survival. The XGBoost survival model built with MLT also showed high performance. Performance was best when the model combined variables selected by the Cox model and by MLT methods: depression, pain, appetite loss, constipation, sex, ECOG performance status, and clinically meaningful change in emotional functioning. Path analysis also revealed a causal relationship among SAT strategies, depression, and survival in advanced cancer patients.
Conclusions: This study is the first to examine self-management strategy clusters by cancer stage in different groups of cancer patients, namely cancer survivors and advanced cancer patients. To our knowledge, it is also the first study to develop and validate HRQoL prediction models for cancer survivors, interpret those models, and suggest how the results can be used in clinical settings. Additionally, using MLT methods and path analysis, we revealed an association of self-management strategies with HRQoL and survival in advanced cancer patients. These results can increase understanding of self-management strategies and help healthcare providers deliver services to cancer patients across the cancer-care continuum.
Abstract in Korean (English translation): Background: Across the cancer-care continuum, self-management strategies can help cancer patients improve their health-related quality of life or survival regardless of cancer stage or treatment plan. However, research on how self-management strategies cluster by cancer stage in real clinical settings is lacking, as are models predicting health-related quality of life or survival in cancer patients. Moreover, no study to date has comprehensively examined the relationships among cancer patients' self-management strategies, health-related quality of life, and survival. This study therefore aimed to elucidate these relationships using clustering methods, machine learning techniques, and path analysis with structural equation modeling. Methods: In cancer survivors, self-management strategies were measured with the newly developed Smart Management Strategies for Health Assessment Tool (SAT), and cluster analyses using principal component analysis and k-means clustering were performed to examine the interrelationships among SAT strategies. Multivariate analyses were also performed to identify the association between SAT strategies and HRQoL after 6 months.
To develop and validate HRQoL prediction models for cancer survivors, we constructed prediction models and compared their performance using tree-based and ensemble algorithms: decision tree, random forest, gradient boosting, XGBoost, and LightGBM. After this comparison, the XGBoost model was selected for further analysis, and SHAP was used for feature-importance and individual-prediction analyses to identify the important variables in the XGBoost HRQoL prediction model. In advanced cancer patients, the clustering and multivariate analyses examining the association between SAT strategies and HRQoL were the same as those used in cancer survivors. To develop survival prediction models, dimension-wise multiple Cox proportional hazards regression analyses were performed with conventional statistics, and survival prediction models were developed with the XGBoost machine learning method. We separately built prediction models from variables selected by conventional statistical methods, variables selected by machine learning techniques, and the combination of both, and compared their performance. We also sought to elucidate the causal relationships among SAT strategies, HRQoL, and survival through path analysis with structural equation modeling. Results: Clustering of SAT strategies in cancer survivors and advanced cancer patients differed by cancer stage. Compared with early-stage patients, patients with intermediate- to late-stage cancer used core strategies, which are important at every stage regardless of treatment period or cancer stage, together with preparation and implementation strategies to overcome their crisis. These SAT strategies were also positively associated with improved HRQoL in all cancer patients, including those with advanced cancer. The machine learning HRQoL prediction models showed high predictive performance in cancer survivors; however, the important variables differed for each HRQoL factor.
This study also proposed a concrete approach to providing personalized healthcare services for cancer survivors by combining a web-based survey study of cancer survivors with the newly identified SHAP-based individual prediction method. In advanced cancer patients, the dimension-wise univariate Cox model identified ECOG performance status, sex, marital status, reduced global quality of life at diagnosis, dyspnea, pain, appetite loss, constipation, depression, and clinically meaningful changes in emotional functioning and social support over 12 weeks as factors ultimately associated with worse survival. The machine learning prediction models also showed high survival-prediction performance, and BorutaSHAP selected depression, pain, appetite loss, constipation, and sex as important factors associated with survival. The survival prediction model performed best when it combined variables selected by conventional statistical methods with those selected by machine learning. Path analysis revealed causal relationships among SAT strategies, depression, and survival, with SAT strategies having an indirect effect on survival that was fully mediated by depression. Conclusions: This study is the first to perform a cluster analysis of self-management strategy use by cancer stage that includes both cancer survivors and advanced cancer patients. It is also the first to develop and validate simple models predicting health-related quality of life, which is important to cancer survivors, to interpret these models with explainable artificial intelligence algorithms, and to propose how the results can be used in clinical settings for cancer survivors. In addition, using machine learning and path analysis, we found direct and indirect positive associations of self-management strategies with health-related quality of life and survival in advanced cancer patients. These results show that the newly developed SAT self-management strategies can serve as a useful intervention tool for cancer patients in clinical practice.
Overall, this study is significant in that it broadened understanding of cancer patients' use of self-management strategies and their effectiveness, and presented comprehensive results and clinical applications showing how healthcare providers can use self-management strategies to deliver beneficial services to cancer patients across the cancer-care continuum.
Contents:
Chapter 1. Introduction (1.1. Study Background; 1.2. Literature Review; 1.3. Research Objectives and Hypothesis; 1.4. Definition of cancer survivors and advanced cancer patients in this study)
Chapter 2. Methods (2.1. Study Design; 2.2. Study Participants; 2.3. Measurements; 2.4. Statistical Methods)
Chapter 3. Results (3.1. Study Participants' Characteristics; 3.2. Self-management clustering results; 3.3. The association of self-management clustering with HRQoL; 3.4. HRQoL prediction model development and validation; 3.5. Survival prediction model development and validation; 3.6. Causal relationship among SAT, HRQoL, and Survival)
Chapter 4. Discussion
Chapter 5. Conclusion
Bibliography; Abstract in Korean; Supplementary Information
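The clustering step described in this thesis (principal component analysis with varimax rotation, then k-means) ends with a k-means partition. The sketch below shows only that final step, Lloyd's algorithm, applied to hypothetical component scores that are assumed to be already computed; the PCA and varimax rotation are omitted, and the random initialization and iteration cap are illustrative choices.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on tuples of component scores.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids
```

With two clusters, as reported for both patient groups, `kmeans(scores, 2)` would split respondents into two strategy-use profiles; in practice one would rerun with several seeds and keep the lowest within-cluster sum of squares.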

    Developing an individualized survival prediction model for rectal cancer

    This work presents a survivability prediction model for rectal cancer patients developed with machine learning techniques. The model was built on the SEER (Surveillance, Epidemiology, and End Results) dataset, described by the authors as the most complete cancer dataset known worldwide. After preprocessing, the training data consisted of 12,818 records of rectal cancer patients. A feature selection process identified six features as the characteristics most relevant to rectal cancer survivability. The six-feature model was compared with a model using 18 features indicated by a physician. The performance of the six-feature model is close to that of the 18-feature model, which indicates that the smaller model may be a good compromise between usability and performance.
    Funding: FCT - Fundação para a Ciência e a Tecnologia (SFRH/BD/85291/2012)

    Proc Mach Learn Res

    Research in oncology quality of care and health outcomes has been limited by the difficulty of identifying cancer stage in health care claims data. Using linked cancer registry and Medicare claims data, we develop a tool for classifying lung cancer patients receiving chemotherapy into early- vs. late-stage cancer by (i) deploying ensemble machine learning for prediction, (ii) establishing a set of classification rules for the predicted probabilities, and (iii) considering an augmented set of administrative claims data. We find that our ensemble machine learning algorithm with a classification rule defined by the median substantially outperforms an existing clinical decision tree for this problem, yielding full-sample performance of 93% sensitivity, 92% specificity, and 93% accuracy. This work has the potential for broad applicability as provider organizations, payers, and policy makers seek to measure the quality and outcomes of cancer care and improve risk adjustment methods.
    Funding: HHSN261201000140C/CA/NCI NIH HHS/United States; HHSN261201000035C/CA/NCI NIH HHS/United States; T32 MH019733/MH/NIMH NIH HHS/United States; HHSN261201000035I/CA/NCI NIH HHS/United States; HHSN261201000034C/CA/NCI NIH HHS/United States; U58 DP003862/DP/NCCDPHP CDC HHS/United States
    Date: 2018-12-10; PMID: 30542673; PMCID: PMC6287925
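A classification rule "defined by the median" can be read in several ways; the sketch below assumes the simplest one: average the late-stage probabilities of several base learners, then label a patient late stage when that average exceeds the median predicted probability in the sample. The equal-weight averaging, the strict `>` comparison, and the function names are hypothetical stand-ins, not the paper's procedure.

```python
import statistics

# Hypothetical sketch: equal-weight averaging and the "> median" rule are assumptions.


def ensemble_average(per_model_probs):
    """Average the predicted late-stage probabilities of several base learners."""
    return [sum(ps) / len(ps) for ps in zip(*per_model_probs)]


def median_rule(probs):
    """Label a patient late stage (1) when the probability exceeds the sample median."""
    cut = statistics.median(probs)
    return [1 if p > cut else 0 for p in probs], cut


def stage_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy, with late stage (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return {
        "sensitivity": tp / pos,
        "specificity": tn / neg,
        "accuracy": (tp + tn) / len(y_true),
    }
```

Because a median cut-off labels roughly half of the sample as late stage, it behaves best when the two stages are similarly common; with a very different case mix, a threshold tuned on training data would be the natural alternative.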

    A novel integrative risk index of papillary thyroid cancer progression combining genomic alterations and clinical factors.

    Although the majority of papillary thyroid cancers (PTC) are indolent, a subset behaves aggressively despite the best available treatment. A major clinical challenge is to reliably distinguish, early on, patients who need aggressive treatment from those who do not. Using a large cohort of PTC samples from The Cancer Genome Atlas (TCGA), we analyzed the association between disease progression and multiple forms of genomic data, such as the transcriptome, somatic mutations, and somatic copy number alterations, and found that genes related to the FOXM1 signaling pathway were significantly associated with PTC progression. Integrative genomic modeling was performed with a leave-one-out elastic net model and 10-fold cross-validation, controlling for demographic and clinical characteristics including patient age, gender, TNM stage, histological subtype, and history of other malignancy. For each subject, the model fitted on the remaining subjects was used to determine the risk index, defined as a linear combination of the clinical and genomic variables from the elastic net model, and the stability of the risk index distribution was assessed with 2,000 bootstrap resamples. We developed a novel approach combining genomic alterations and patient-related clinical factors that separates patients with more aggressive disease from those whose tumors are indolent and who will likely require less aggressive treatment and surveillance (p = 4.62 × 10⁻¹⁰, log-rank test). Our results suggest that risk index modeling that combines genomic alterations with current staging systems offers an opportunity to anticipate disease prognosis more effectively and therefore to enable more precise management of PTC.
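The leave-one-out risk index can be written out as below, using the common (glmnet-style) elastic net parameterization. Here $\mathcal{D}_{-i}$ is the cohort without subject $i$, $\mathbf{x}_i$ stacks subject $i$'s clinical and genomic variables, $\alpha \in [0, 1]$ mixes the lasso and ridge penalties, and $\lambda$ is assumed to be tuned by the 10-fold cross-validation within $\mathcal{D}_{-i}$. The loss $\mathcal{L}$ is not specified in the abstract; a negative log partial likelihood for time to progression is one plausible, assumed choice.

```latex
\hat{\beta}^{(-i)} = \arg\min_{\beta}\Big\{ \mathcal{L}\big(\beta;\, \mathcal{D}_{-i}\big)
  + \lambda\Big[\alpha\,\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big]\Big\},
\qquad
r_i = \mathbf{x}_i^{\top}\,\hat{\beta}^{(-i)}
```

Because $r_i$ is computed from coefficients that never saw subject $i$, the resulting high- and low-risk groups compared by the log-rank test are out-of-sample predictions rather than fitted values.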

    Efficient Feature Selection and ML Algorithm for Accurate Diagnostics

    Machine learning algorithms have been deployed in numerous optimization, prediction, and classification problems, which has made them attractive for applications in fields such as computer networks and medical diagnosis. Although these algorithms achieve convincing results in these fields, they face numerous challenges when deployed on imbalanced datasets: they are often biased toward the majority class and therefore fail to generalize. In addition, they cannot deal effectively with high-dimensional datasets. Moreover, conventional feature selection techniques that select attributes by their significance render these algorithms ineffective for the majority of diagnostic applications. In this paper, feature selection is performed with the more effective Neighbourhood Components Analysis (NCA). For classification, an ensemble classifier comprising K-Nearest Neighbours (KNN), Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM) is built, trained, and tested. Finally, cross-validation is carried out to evaluate the ensemble model. The results show that the proposed classifier achieves the best performance in terms of precision, recall, F-measure, and classification accuracy.
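The ensemble and evaluation steps can be illustrated with a hard-voting combiner and a k-fold splitter. This sketch assumes the four base classifiers (KNN, NB, DT, SVM) have already produced label predictions; NCA feature selection and the classifiers themselves are omitted, and majority voting is an assumed combination rule, since the abstract does not say how the ensemble merges its members.

```python
from collections import Counter


def majority_vote(per_model_preds):
    """Hard voting over per-model label lists (one list per classifier).

    With four voters a 2-2 tie is possible; Counter.most_common then returns
    the tied label encountered first, i.e. the one voted by the earliest model.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds; sizes differ by at most one.

    No shuffling or stratification is done here; real cross-validation on
    imbalanced data would usually shuffle and stratify first.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For example, with predictions ordered as KNN, NB, DT, SVM, `majority_vote([knn, nb, dt, svm])` returns one label per sample, and each fold from `k_fold_indices` would be used to refit the four models on `train` and score the vote on `test`.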

    A Comparative Study for Methodologies and Algorithms Used In Colon Cancer Diagnoses and Detection

    Colon cancer is also referred to as colorectal cancer; it begins in the large intestine (colon), the last section of the digestive tract. Colon cancer typically affects older people, but it can occur at any age. It usually starts as small, noncancerous (benign) clumps of cells called polyps that form inside the colon. Over time, some of these polyps can turn into colon cancers, which may become advanced malignant tumors that invade the body. No definitive cause has been identified so far, and the disease is very difficult for physicians to detect and treat completely. Colon cancer often has no symptoms at an early stage, when detection allows a cure; when colorectal cancer is diagnosed at the final stage (stage IV), it has had the opportunity to spread to other parts of the body, where it is difficult to treat successfully, and the patient's chances of survival become much lower. A false diagnosis of colorectal cancer leads to the wrong treatment: patients may be treated for long-term infections while they continue to suffer from colon cancer, which can cause their death. Cancer treatment also requires considerable time and money. This paper provides a comparative study of the methodologies and algorithms used in colon cancer diagnosis and detection, which can help in proposing a prediction of colon cancer risk levels using a convolutional neural network (CNN), a deep learning algorithm.