Study of microRNAs-21/221 as potential breast cancer biomarkers in Egyptian women
MicroRNAs (miRNAs) play an important role in cancer prognosis. They are small molecules, approximately 17–25 nucleotides in length, and their high stability in human serum supports their use as novel diagnostic biomarkers of cancer and other pathological conditions. In this study, we analyzed the expression patterns of miR-21 and miR-221 in serum from a total of 100 Egyptian women: patients with breast cancer, patients with fibroadenoma, and healthy controls. Using microarray-based expression profiling followed by real-time polymerase chain reaction (PCR) validation, we compared the levels of the two circulating miRNAs in the serum of patients with breast cancer (n = 50), patients with fibroadenoma (n = 25), and healthy controls (n = 25). The small nucleolar RNA SNORD68 was chosen as the housekeeping endogenous control. Serum levels of miR-21 and miR-221 were significantly higher in breast cancer patients than in healthy controls and fibroadenoma patients. Receiver operating characteristic (ROC) curve analysis revealed that miR-21 has greater potential for discriminating between breast cancer patients and the control group, while miR-221 has greater potential for discriminating between breast cancer and fibroadenoma patients. Classification models using k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Random Forests (RF) were developed using the expression levels of both miR-21 and miR-221. The NB models achieved the best classification performance, reaching 91% correct classification. Furthermore, relative miR-221 expression was associated with histological tumor grade. Therefore, we conclude that both miR-21 and miR-221 can be used to differentiate between breast cancer patients and healthy controls, but that serum miR-21 has superior diagnostic accuracy to miR-221 for breast cancer prediction, whereas miR-221 has more diagnostic power for discriminating between breast cancer and fibroadenoma patients.
The overexpression of miR-221 was associated with breast cancer grade. We also demonstrated that the combined expression of miR-21 and miR-221 can be successfully applied as breast cancer biomarkers.
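The abstract does not say which Naïve Bayes variant was used. As a minimal sketch, assuming a Gaussian Naïve Bayes model (each class models each miRNA's expression as an independent normal distribution), a two-feature classifier over miR-21 and miR-221 could look like the following. The function names and the toy values are hypothetical illustrations, not the study's code or data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances (Gaussian Naive Bayes)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            # Log density of a normal distribution N(m, var) evaluated at x.
            score -= 0.5 * math.log(2 * math.pi * var) + (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy values, NOT data from the study: [miR-21, miR-221] relative expression.
X = [[4.1, 3.0], [3.8, 2.7], [4.5, 3.3], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
y = ["cancer", "cancer", "cancer", "control", "control", "control"]
model = fit_gaussian_nb(X, y)
print(predict(model, [4.0, 2.9]))  # prints: cancer
```

Because priors and distributions are estimated per label, the same code would handle the three-group setting (breast cancer, fibroadenoma, control) without changes.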
Towards Prediction of Pancreatic Cancer Using SVM Study Model
Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers
Background and Objective: Colorectal cancer is a high-mortality cancer. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, clinical data can be challenging to use, especially when outcomes are imbalanced. This paper develops algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients from clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. To address this imbalance, we propose a method that chains standard balancing techniques into a pipeline to increase the true positive rate. Evaluation is conducted on a colorectal cancer dataset from the SEER database. Methods: Pre-processing consists of removing records with missing values and merging categories. The minority class makes up 10% of the data in the 1-year survival task and 20% in the 3-year task. Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and RENN were used and compared for balancing the data with tree-based classifiers: Decision Trees, Random Forest, Extra Trees, eXtreme Gradient Boosting, and Light Gradient Boosting (LGBM). Results: Performance is evaluated with 5-fold cross-validation. For the highly imbalanced 1-year task, our proposed method with LGBM outperforms the other sampling methods, with a sensitivity of 72.30%. For the less imbalanced 3-year task, the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method works best for highly imbalanced datasets. Conclusions: Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
Comment: 19 pages, 6 figures, 4 tables
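The abstract names SMOTE, ENN, and RENN but gives no implementation details. The following is a minimal pure-Python sketch of a SMOTE-then-RENN pipeline for a binary task, assuming Euclidean distance on numeric features and a majority-vote editing rule; the function names, the k values, and the toy data are hypothetical. In practice, the imbalanced-learn library provides tested implementations (for example `SMOTE`, `RepeatedEditedNearestNeighbours`, and `SMOTEENN`).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE: synthesize points on segments between a minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours: keep a sample only if most of its k nearest
    neighbours share its label."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(xi, X[j]))[:k]
        agree = sum(1 for j in nearest if y[j] == yi)
        if agree * 2 > len(nearest):
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def renn(X, y, k=3):
    """Repeated ENN: apply ENN until no further samples are removed."""
    while True:
        new_X, new_y = enn(X, y, k)
        if len(new_X) == len(X):
            return new_X, new_y
        X, y = new_X, new_y


def smote_renn(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count (binary labels
    assumed), then clean the combined data with RENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    synthetic = smote(minority, max(n_new, 0), seed=seed)
    return renn(X + synthetic, y + [minority_label] * len(synthetic), k)


# Hypothetical toy data: five majority (0) and three minority (1) samples.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = smote_renn(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))  # prints: 5 5
```

Resampling of this kind should generally be applied only to the training folds inside cross-validation; resampling before splitting can leak synthetic copies of test-fold patients into training and inflate sensitivity.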
Cancer diagnosis marker extraction for soft tissue sarcomas based on gene expression profiling data by using projective adaptive resonance theory (PART) filtering method
BACKGROUND: Recent advances in genome technologies have provided an excellent opportunity to determine the complete biological characteristics of neoplastic tissues, resulting in improved diagnosis and selection of treatment. To accomplish this objective, it is important to establish sophisticated algorithms that can handle large quantities of data, such as gene expression profiles obtained by DNA microarray analysis. RESULTS: We previously developed the projective adaptive resonance theory (PART) filtering method as a gene filtering method. It is a clustering-based method that can select genes specific to each subtype. In this study, we applied the PART filtering method to microarray data obtained from soft tissue sarcoma (STS) patients to extract subtype-specific genes. The performance of the filtering method was evaluated by comparison with other widely used methods, such as signal-to-noise, significance analysis of microarrays, and nearest shrunken centroids. In addition, various combinations of filtering and modeling methods were used to extract essential subtype-specific genes. The combination of the PART filtering method and boosting (the PART-BFCS method) showed the highest accuracy. Seven of the 15 genes frequently selected by this method (MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3) are known prognostic marker genes for other tumors. These genes are candidate marker genes for the diagnosis of STS. Correlation analysis was performed to extract marker genes that were not selected by PART-BFCS. Sixteen of the genes extracted this way are also known prognostic marker genes for other tumors, and they could be candidate marker genes for the diagnosis of STS. CONCLUSION: We proposed a two-step procedure consisting of PART-BFCS followed by correlation analysis.
The results suggest that novel diagnostic and therapeutic targets for STS can be extracted by a procedure that includes the PART filtering method
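Signal-to-noise is one of the comparison methods named in the abstract, and the second step of the proposed procedure is a correlation analysis. The sketch below illustrates both generic ideas, assuming a Golub-style signal-to-noise score (one subtype versus the rest) and Pearson correlation against an already-selected marker gene. It does not implement PART filtering or PART-BFCS, and the gene names, subtypes, and threshold are hypothetical.

```python
import math
import statistics


def signal_to_noise(a, b):
    """Signal-to-noise score (mean_a - mean_b) / (sd_a + sd_b), using population SDs."""
    spread = statistics.pstdev(a) + statistics.pstdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / max(spread, 1e-12)


def rank_genes(expression, labels, subtype):
    """Rank genes by one-vs-rest signal-to-noise for `subtype`, highest first."""
    scores = {}
    for gene, values in expression.items():
        inside = [v for v, lab in zip(values, labels) if lab == subtype]
        outside = [v for v, lab in zip(values, labels) if lab != subtype]
        scores[gene] = signal_to_noise(inside, outside)
    return sorted(scores, key=scores.get, reverse=True)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_genes(expression, seed_gene, threshold=0.9):
    """Genes whose correlation with `seed_gene` reaches `threshold`."""
    seed = expression[seed_gene]
    return [gene for gene, values in expression.items()
            if gene != seed_gene and pearson(seed, values) >= threshold]


# Hypothetical toy profiles over four samples (two of subtype "S1", two of "S2").
expression = {
    "GENE_A": [5.0, 5.2, 1.0, 1.1],
    "GENE_B": [2.0, 2.1, 2.0, 1.9],
    "GENE_C": [4.9, 5.3, 1.2, 0.9],
}
labels = ["S1", "S1", "S2", "S2"]
print(rank_genes(expression, labels, "S1"))    # prints: ['GENE_A', 'GENE_C', 'GENE_B']
print(correlated_genes(expression, "GENE_A"))  # prints: ['GENE_C']
```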
Machine learning in oral squamous cell carcinoma: current status, clinical concerns and prospects for future: A systematic review
Background: Oral cancer can show heterogeneous patterns of behavior. For proper and effective management of oral cancer, early diagnosis and accurate prediction of prognosis are important. To this end, artificial intelligence (AI), and its subfield machine learning in particular, has been touted for its potential to revolutionize cancer management through improved diagnostic precision and outcome prediction. Yet, to date, it has made only a few contributions to actual medical practice or patient care. Objectives: This study provides a systematic review of diagnostic and prognostic applications of machine learning in oral squamous cell carcinoma (OSCC) and highlights some of the limitations and concerns clinicians have about implementing machine learning-based models in daily clinical practice. Data sources: We searched OvidMedline, PubMed, Scopus, Web of Science, and Institute of Electrical and Electronics Engineers (IEEE) databases from inception until February 2020 for articles that used machine learning for diagnostic or prognostic purposes in OSCC. Eligibility criteria: Only original studies that examined the application of machine learning models for prognostic and/or diagnostic purposes were considered. Data extraction: Two researchers (A.R. & O.Y.) independently extracted articles using predefined study selection criteria. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) in the searching and screening processes. We also used the Prediction model Risk of Bias Assessment Tool (PROBAST) to assess the risk of bias (ROB) and quality of included studies. Results: A total of 41 published studies used machine learning to aid in the diagnosis and/or prognosis of OSCC. Most of these studies used support vector machine (SVM) and artificial neural network (ANN) algorithms.
Their specificity ranged from 0.57 to 1.00, sensitivity from 0.70 to 1.00, and accuracy from 63.4% to 100.0%. The main limitations and concerns can be grouped into challenges inherent to the science of machine learning and challenges relating to clinical implementation. Conclusion: Machine learning models have been reported to show promising performance for diagnostic and prognostic analyses in studies of oral cancer. These models should be developed further to enhance explainability and interpretability, and should be externally validated for generalizability, before they can be safely integrated into daily clinical practice. Regulatory frameworks for adopting these models in clinical practice are also necessary.
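The review reports specificity and sensitivity as proportions and accuracy as a percentage. For reference, these are the standard confusion-matrix definitions; the sketch below is a generic illustration with hypothetical counts, not a reconstruction of any reviewed study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a 100-patient test set (not from any reviewed study).
sens, spec, acc = diagnostic_metrics(tp=35, fp=10, tn=40, fn=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# prints: sensitivity=0.70 specificity=0.80 accuracy=0.75
```

Because accuracy pools both classes, a model can reach high accuracy on an imbalanced test set while its sensitivity stays low, which is one reason the three measures are reported separately.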
Quantitative Analysis of Immune Infiltrates in Primary Melanoma.
Novel methods to analyze the tumor microenvironment (TME) are urgently needed to stratify melanoma patients for adjuvant immunotherapy. Tumor-infiltrating lymphocyte (TIL) analysis by conventional pathologic methods is predictive but insufficiently precise for clinical application. Quantitative multiplex immunofluorescence (qmIF) allows evaluation of the TME using multiparameter phenotyping, tissue segmentation, and quantitative spatial analysis (qSA). Given that CD3+CD8+ cytotoxic lymphocytes (CTLs) promote antitumor immunity, whereas CD68+ macrophages impair immunity, we hypothesized that quantification and spatial analysis of macrophages and CTLs would correlate with clinical outcome. We applied qmIF to 104 primary stage II to III melanoma tumors and found that CTLs were closer to activated (CD68+HLA-DR+) macrophages than to nonactivated (CD68+HLA-DR-) macrophages (P < 0.0001). CTLs were farther from proliferating SOX10+ melanoma cells than from nonproliferating ones (P < 0.0001). In 64 patients with known cause of death, high CTL density and low macrophage density in the stroma (P = 0.0038 and P = 0.0006, respectively) correlated with disease-specific survival (DSS), whereas the correlation was less significant for CTL and macrophage density in the tumor (P = 0.0147 and P = 0.0426, respectively). DSS correlation was strongest for stromal HLA-DR+ CTLs (P = 0.0005). CTL distance to HLA-DR- macrophages was associated with poor DSS (P = 0.0016), whereas distance to Ki67- tumor cells was inversely associated with DSS (P = 0.0006). A low CTL/macrophage ratio in the stroma conferred a hazard ratio (HR) of 3.719 for death from melanoma and correlated with shortened overall survival (OS) in the complete 104-patient cohort by Cox analysis (P = 0.009); it merits further development as a biomarker for clinical application.
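The abstract does not specify how distances or ratios were computed. As a minimal sketch, assuming each segmented cell is reduced to a centroid coordinate and that "distance to X" means the distance from each CTL to its nearest cell of type X, the two spatial quantities could be computed as follows; the cell coordinates, units, and function names are hypothetical.

```python
import math


def nearest_distances(sources, targets):
    """Distance from each source cell to its nearest target cell (brute force)."""
    return [min(math.dist(s, t) for t in targets) for s in sources]


def mean_nearest_distance(sources, targets):
    """Average nearest-neighbour distance from the source population to the targets."""
    d = nearest_distances(sources, targets)
    return sum(d) / len(d)


# Hypothetical cell centroids in micrometres, not data from the study.
ctls = [(0.0, 0.0), (10.0, 0.0)]
activated_macrophages = [(1.0, 0.0)]     # CD68+HLA-DR+
nonactivated_macrophages = [(0.0, 5.0)]  # CD68+HLA-DR-

print(mean_nearest_distance(ctls, activated_macrophages))               # prints: 5.0
print(round(mean_nearest_distance(ctls, nonactivated_macrophages), 2))  # prints: 8.09

# CTL/macrophage ratio for one region: a simple count ratio.
stromal_ratio = len(ctls) / (len(activated_macrophages) + len(nonactivated_macrophages))
```

The brute-force search is quadratic in the number of cells; for whole-slide data, a spatial index such as SciPy's `cKDTree` would be the usual choice.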
Software as a Service (SaaS) based Machine Learning for Digital Image Recognition
Machine learning methods and algorithms now vary widely in their capabilities and tasks, which makes it almost impossible to understand each algorithm in detail or to determine which method is appropriate for a given application. For these reasons, we propose an application system that runs in the cloud following the Software as a Service (SaaS) model, so that it is accessible to multiple users and built on open-source data mining. A wide range of machine learning algorithms from the Waikato Environment for Knowledge Analysis (WEKA) are considered, including support vector machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes, C4.5 Decision Tree, Logistic Regression, and Random Forest. Through the SaaS model, the application supports image recognition researchers flexibly across research purposes, such as algorithm analysis, finding optimal training results in digital image recognition, and implementing application systems. In addition, the application can be accessed at any time through a web browser, without any installation.
A machine learning-based model for predicting distant metastasis in patients with rectal cancer
Background: Distant metastasis from rectal cancer usually results in poorer survival and quality of life, so early identification of patients at high risk of distant metastasis is essential. Method: The study used eight machine learning algorithms to construct models of the risk of distant metastasis from rectal cancer. We developed the models using 23,867 patients with rectal cancer from the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2017. In addition, 1,178 rectal cancer patients from Chinese hospitals were selected to validate model performance and generalizability. We tuned the hyperparameters by random search with tenfold cross-validation. We evaluated the models using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis, calibration curves, and the precision and accuracy on the internal test set and external validation cohort. In addition, SHapley Additive exPlanations (SHAP) were used to interpret the models. Finally, the best model was used to develop a web calculator for predicting the risk of distant metastasis in rectal cancer. Result: The study included 23,867 rectal cancer patients, including 2,840 with distant metastasis. Multiple logistic regression analysis showed that age, differentiation grade, T-stage, N-stage, preoperative carcinoembryonic antigen (CEA), tumor deposits, perineural invasion, tumor size, radiation, and chemotherapy were independent risk factors for distant metastasis in rectal cancer. The mean AUC of the extreme gradient boosting (XGB) model in tenfold cross-validation on the training set was 0.859. The XGB model performed best on both the internal test set and the external validation set. On the internal test set, the XGB model had an AUC of 0.855, an AUPRC of 0.510, an accuracy of 0.900, and a precision of 0.880.
On the external validation set, the XGB model had an AUC of 0.814, an AUPRC of 0.609, an accuracy of 0.800, and a precision of 0.810. Finally, we constructed a web calculator using the XGB model for distant metastasis of rectal cancer. Conclusion: The study developed and validated an XGB model based on clinicopathological information for predicting the risk of distant metastasis in patients with rectal cancer, which may help physicians make clinical decisions. Keywords: rectal cancer, distant metastasis, web calculator, machine learning algorithm, external validation
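AUC and AUPRC are the headline metrics above. As a generic sketch (not the study's evaluation code), AUC can be computed as the probability that a randomly chosen positive outscores a randomly chosen negative, and AUPRC can be approximated by step-wise average precision. The function names and the five toy predictions are hypothetical; scikit-learn's `roc_auc_score` and `average_precision_score` are the usual production choices.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC: mean precision at each rank where a positive is retrieved.
    Tied scores are ordered arbitrarily here."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical predicted risks for five patients (1 = distant metastasis).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))            # prints: 0.833
print(round(average_precision(labels, scores), 3))  # prints: 0.833
```

The two metrics have different baselines: a classifier that ranks at random has an expected AUC of 0.5, but its expected AUPRC is close to the positive prevalence, which is about 0.12 in the development data (2,840 of 23,867 patients). This is why AUPRC values near 0.5 to 0.6 can still reflect useful discrimination.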
From Correlation to Causality: Does Network Information improve Cancer Outcome Prediction?
Motivation:
Disease progression in cancer can vary substantially between patients, yet patients often receive the same treatment. Recently, there has been much work on predicting disease progression and patient outcome variables from gene expression in order to personalize treatment options. A widely used approach relies on high-throughput experiments that aim to identify predictive signature genes indicating the clinical outcome of disease. Microarray data analysis helps to reveal the biological mechanisms underlying tumor progression, metastasis, and drug resistance in cancer studies. Although the first diagnostic kits are already on the market, open problems remain, such as random gene signatures and noisy expression data. Experimental or computational noise in the data and the limited number of tissue samples collected from patients can further reduce the predictive power and biological interpretability of such signature genes. Moreover, signature genes predicted by different studies generally show little similarity, even for the same type of cancer.
Integrating network information with gene expression data could provide more efficient signatures for outcome prediction in cancer studies. One approach to these problems uses gene-gene relationships and ranks genes with the random surfer model of Google's PageRank algorithm. Unfortunately, most published network-based approaches were tested on only a small number of datasets, which calls into question the general applicability of network-based methods for outcome prediction.
Methods:
In this thesis, I provide a comprehensive and systematic evaluation of a network-based outcome prediction approach, NetRank (a PageRank derivative), applied to several types of gene expression cancer data and four different types of networks. The algorithm identifies a signature gene set for a specific cancer type by combining gene network information with the given expression data. To assess the performance of NetRank, I created a benchmark collection comprising 25 cancer outcome prediction datasets from the literature and one in-house dataset.
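To make the random-surfer idea concrete, here is a minimal sketch of a PageRank-style gene ranking in the spirit of NetRank. The update rule r_j = (1 - d) * s_j + d * (sum over neighbours i of r_i / deg_i), the use of a per-gene prior score s_j such as the absolute correlation of expression with outcome, and the damping value d = 0.3 are assumptions for illustration rather than details taken from this thesis; the gene names and network are hypothetical.

```python
def netrank(adjacency, prior, d=0.3, max_iter=200, tol=1e-10):
    """PageRank-style gene ranking in the spirit of NetRank (hypothetical sketch).

    adjacency: gene -> set of neighbouring genes (undirected, symmetric).
    prior: gene -> non-negative score, e.g. |correlation of expression with outcome|.
    Update: r_j = (1 - d) * prior_j + d * sum over neighbours i of r_i / degree(i).
    """
    rank = dict(prior)
    for _ in range(max_iter):
        new_rank = {}
        for gene in prior:
            # Each neighbour passes on an equal share of its current rank.
            spread = sum(rank[i] / len(adjacency[i]) for i in adjacency.get(gene, ()))
            new_rank[gene] = (1 - d) * prior[gene] + d * spread
        converged = max(abs(new_rank[g] - rank[g]) for g in prior) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical star network: one hub linked to three genes, all with equal priors.
adjacency = {"HUB": {"X", "Y", "Z"}, "X": {"HUB"}, "Y": {"HUB"}, "Z": {"HUB"}}
prior = {"HUB": 0.5, "X": 0.5, "Y": 0.5, "Z": 0.5}
ranks = netrank(adjacency, prior)
signature = sorted(ranks, key=ranks.get, reverse=True)[:2]  # top-k genes as the signature
print(round(ranks["HUB"], 3), round(ranks["X"], 3))  # prints: 0.731 0.423
```

With d = 0 the ranking reduces to the prior (a purely expression-based ranking); as d grows, genes connected to many well-scoring neighbours are promoted, which is why the hub outranks the other genes here despite identical priors.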
Results:
NetRank performs significantly better than classical methods such as fold change or the t-test, improving prediction performance by 7% on average. Moreover, it approaches the accuracy of the authors' published signatures while using a relatively unbiased, fully automated process for biomarker discovery. Despite an order-of-magnitude difference in network size, a regulatory network, a protein-protein interaction network, and two predicted networks perform equally well.
Signatures as published by the authors and signatures generated with classical methods do not overlap, not even for the same cancer type, whereas the network-based signatures overlap strongly. I analyze and discuss these overlapping genes in terms of the Hallmarks of Cancer, singling out six transcription factors and seven proteins and discussing their specific roles in cancer progression. Furthermore, several tests are conducted to identify a Universal Cancer Signature. No Universal Cancer Signature has been identified so far, but a cancer-specific combination of general master regulators with specific cancer genes was discovered that achieves the best results for all cancer types.
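The abstract does not state which similarity measure the thesis uses for signature overlap. As an illustration only, assuming the Jaccard index (size of the intersection divided by size of the union), pairwise overlap between signatures could be quantified like this; the signatures and gene names are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(signatures):
    """Average Jaccard index over all pairs of signatures (needs at least two)."""
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical signatures from three datasets; gene names are placeholders.
signatures = [{"G1", "G2", "G3"}, {"G1", "G2", "G4"}, {"G1", "G4", "G5"}]
print(round(mean_pairwise_overlap(signatures), 2))  # prints: 0.4
```

A value near 0 corresponds to the non-overlapping published and classical signatures described above, while strongly overlapping network-based signatures would score closer to 1.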
As NetRank offers great value for cancer outcome prediction, first steps toward using NetRank securely in a public cloud are described.
Conclusion:
Experimental evaluation of network-based methods on a gene expression benchmark dataset suggests that these methods are especially suited for outcome prediction, as they overcome the problems of random gene signatures and noisy expression data. By combining network information with gene expression data, network-based methods identify highly similar signatures across all cancer types, in contrast to classical methods, which fail to identify highly common gene sets even within the same cancer type.
In general, integrating additional information into gene expression analysis allows the identification of more reliable, accurate, and reproducible biomarkers and provides a deeper understanding of the processes occurring in cancer development and progression.
1 Definition of Open Problems
2 Introduction
2.1 Problems in cancer outcome prediction
2.2 Network-based cancer outcome prediction
2.3 Universal Cancer Signature
3 Methods
3.1 NetRank algorithm
3.2 Preprocessing and filtering of the microarray data
3.3 Accuracy
3.4 Signature similarity
3.5 Classical approaches
3.6 Random signatures
3.7 Networks
3.8 Direct neighbor method
3.9 Dataset extraction
4 Performance of NetRank
4.1 Benchmark dataset for evaluation
4.2 The influence of NetRank parameters
4.3 Evaluation of NetRank
4.4 General findings
4.5 Computational complexity of NetRank
4.6 Discussion
5 Universal Cancer Signature
5.1 Signature overlap – a sign for Universal Cancer Signature
5.2 NetRank genes are highly connected and confirmed in literature
5.3 Hallmarks of Cancer
5.4 Testing possible Universal Cancer Signatures
5.5 Conclusion
6 Cloud-based Biomarker Discovery
6.1 Introduction to secure Cloud computing
6.2 Cancer outcome prediction
6.3 Security analysis
6.4 Conclusion
7 Contributions and Conclusion
- …