
    Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms

    Colorectal cancer (CRC) affects the colon or rectum and is a common global health issue, with 1.1 million new cases occurring yearly. This study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760 datasets were pre-processed and subjected to feature selection using the LASSO method in combination with five ML algorithms: Adaboost, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). The important features were further examined through gene expression, correlation, and survival analyses. Validation on the external dataset GSE142279 was also performed. The RF model had the best classification accuracy for both datasets. The feature selection process identified 12 candidate genes, which were subsequently reduced to 3 (CA2, CA7, and ITM2C) through gene expression and correlation analyses. These three genes achieved 100% accuracy in the external dataset. The AUC values for these genes were 99.24%, 100%, and 99.5%, respectively. The survival analysis showed a significant logrank p-value of 0.044 for the final gene signatures. The analysis of tumor immunocyte infiltration showed a weak correlation with the expression of the gene signatures. CA2, CA7, and ITM2C can serve as gene signatures for the early detection of CRC and may provide valuable information for prognostic and therapeutic decision making. Further research is needed to fully understand the potential of these genes in the context of CRC.
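
The per-gene AUC values quoted in this abstract can be read as the probability that a randomly chosen sample of one class has a higher expression value than a randomly chosen sample of the other class. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `auc_single_feature` and the toy expression values are hypothetical, and ties are counted as half a win (the usual Mann-Whitney convention).

```python
def auc_single_feature(values, labels):
    """AUC of a single feature, computed by pairwise comparison.

    values: expression values, one per sample (toy numbers here).
    labels: 1 for the positive class, 0 for the negative class.
    Returns the fraction of (positive, negative) pairs in which the
    positive sample has the higher value; ties count as 0.5.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene; the last three samples are positives.
expression = [1.0, 3.0, 2.0, 4.0, 5.0]
is_positive = [0, 0, 1, 1, 1]
print(auc_single_feature(expression, is_positive))
```

An AUC below 0.5 simply means the gene is lower in the positive class; `max(a, 1 - a)` gives a direction-free score in that case.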

    Deconvolute brain tumor genomic alterations based on DNA methylation

    Molecular classification based on mutations, expression subtypes, and copy number variants has improved diagnosis and treatment decision-making for patients with brain tumors, particularly malignant gliomas. However, the association between epigenetic signature and genetic alterations is poorly understood. For example, mutation of isocitrate dehydrogenase (IDH) is associated with genome-wide hypermethylation of CpG islands in gliomas. But other subtype-associated alterations, including telomerase reverse transcriptase (TERT) promoter mutation, alpha thalassemia/mental retardation syndrome X-linked (ATRX) mutation, chromosome 1p19q co-deletion (chr1p19q codel), and gene expression subtypes, have yet to be associated with any epigenetic signature. Therefore, we hypothesized that DNA methylation signatures can classify gliomas based on these alterations and give insight into subgroup characteristics. Machine learning models, including elastic net and random forest, were used to predict somatic mutations of IDH, TERTp, and ATRX, chr1p19q codel, and gene expression subtype of gliomas. Data from the NOA-04 randomized phase III trial were used for external validation. In total, 926 cases from The Cancer Genome Atlas were included in this study. Prediction accuracies for IDH, TERTp, and ATRX mutations, and chr1p19q codel were 100%, 98.3%, 90.48%, and 99.21%, respectively, in the test set. Accuracy for gene expression subtype prediction was 72.2%. The methylation-based prediction models for both ATRX and chr1p19q codel statuses proved superior to conventional assays for these biomarkers. Similarly, characteristic alterations associated with gene expression subtypes were better discriminated using methylation compared to transcriptome-based classification. DNA methylation signatures accurately predicted somatic alterations and improved over existing classifiers. The established Unified Diagnostic Pipeline (UniD) is a rapid and cost-effective diagnostic platform for genomic alterations and gene expression subtypes at initial clinical diagnosis and improves over individual assays currently in clinical use. The significant relationship between genetic alterations and epigenetic signatures indicates the broad applicability of our approach to other malignancies.

    Multi-test Decision Tree and its Application to Microarray Data Classification

    Objective: A desirable property of tools used to investigate biological data is that their models and predictive decisions are easy to understand. Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. Methods: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Results: Experimental validation was performed on several real-life gene expression datasets. Comparison with eight classifiers shows that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6 percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. Conclusion: This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
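
The idea of applying several univariate tests in one non-terminal node can be illustrated with a small sketch. The abstract does not say how the individual test outcomes are combined, so the majority vote used here is an assumption, and the class names `UnivariateTest` and `MultiTestNode`, the feature indices, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class UnivariateTest:
    """A single threshold test on one feature (e.g. one gene's expression)."""
    feature: int
    threshold: float

    def passes(self, sample: Sequence[float]) -> bool:
        return sample[self.feature] <= self.threshold


@dataclass
class MultiTestNode:
    """A non-terminal node holding several univariate tests.

    Assumption: the branch is chosen by majority vote over the tests.
    A tie (possible with an even number of tests) sends the sample right.
    """
    tests: List[UnivariateTest]

    def goes_left(self, sample: Sequence[float]) -> bool:
        votes = sum(t.passes(sample) for t in self.tests)
        return 2 * votes > len(self.tests)


# Hypothetical node with three tests on three different features.
node = MultiTestNode([
    UnivariateTest(feature=0, threshold=2.0),
    UnivariateTest(feature=1, threshold=3.0),
    UnivariateTest(feature=2, threshold=2.5),
])
print(node.goes_left([1.0, 5.0, 2.0]))  # two of three tests pass
print(node.goes_left([3.0, 5.0, 2.0]))  # one of three tests passes
```

Under this assumed rule, a single noisy feature cannot change the branch on its own, which is one way such a node could give more stable predictions than a single-test split.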