2,025 research outputs found

    Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes

    Magister Scientiae - MSc. INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence, but the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies aided by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have improved researchers' and clinicians' understanding of the complex molecular portraits of malignant breast tumours. Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a robust machine-learning-based breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples. 
A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting for genes with 85% stable absence or presence within a tissue type, and differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis that selected the top 100 genes as informative features for classification. The accuracy and discriminatory capability of both the microarray-based and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and Hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and 346-probe informative set for two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established. 
The universal application of the microarray signatures and the validity of the z-score-to-barcode method were proven with 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature. The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests
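The barcoding and stability-based feature selection described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the per-gene z-score across samples, the cut-off of 0, and the function names `zscore_barcode` and `stable_genes` are all assumptions made for illustration. Only the 85% stability figure comes from the abstract.

```python
from statistics import mean, pstdev


def zscore_barcode(expr, threshold=0.0):
    """Convert expression values to 0/1 barcodes, gene by gene.

    expr: dict mapping gene -> list of expression values (one per sample).
    A value is called 1 ("expressed") when its z-score across samples
    exceeds `threshold` (assumed cut-off, not taken from the source).
    """
    barcodes = {}
    for gene, values in expr.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # Constant gene: no sample stands out, so call everything 0.
            barcodes[gene] = [0] * len(values)
            continue
        barcodes[gene] = [1 if (v - mu) / sigma > threshold else 0 for v in values]
    return barcodes


def stable_genes(barcodes, labels, stability=0.85):
    """Keep genes that are stably present or absent within every class
    (at least `stability` of samples agree) and whose stable call differs
    between at least two classes."""
    classes = sorted(set(labels))
    selected = []
    for gene, bits in barcodes.items():
        calls = []
        for cls in classes:
            cls_bits = [b for b, lab in zip(bits, labels) if lab == cls]
            frac_present = sum(cls_bits) / len(cls_bits)
            if frac_present >= stability:
                calls.append(1)       # stably present in this class
            elif frac_present <= 1 - stability:
                calls.append(0)       # stably absent in this class
            else:
                calls.append(None)    # unstable within this class
        if None not in calls and len(set(calls)) > 1:
            selected.append(gene)
    return selected


# Toy data (invented): three normal and three tumour samples.
expr = {"A": [1, 1, 1, 9, 9, 9], "B": [5, 5, 5, 5, 5, 5], "C": [1, 9, 1, 9, 1, 9]}
labels = ["normal"] * 3 + ["tumour"] * 3
barcodes = zscore_barcode(expr)
informative = stable_genes(barcodes, labels)
```

In this toy example only gene A is stably absent in one class and stably present in the other, so it is the only gene retained as a feature.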

    An Investigation Of Gene Networks Influenced By Low Dose Ionizing Radiation Using Statistical And Graph Theoretical Algorithms

    Increased application of radiation in health and security sectors has raised concerns about its deleterious effects. Ionizing radiation (IR) of less than 10 cGy is considered low dose ionizing radiation (LDIR) by the National Research Committee to assess health risks from exposure to low levels of IR. It is hard to extract the effects of a mild stimulus such as LDIR on gene expression profiles using simple differential expression. We hypothesized that differential correlation instead would capture the effects of LDIR on mutual relationships between genes. We tested this hypothesis on expression profiles from five inbred strains of mice treated with LDIR. Whereas ANOVA detected little effect of LDIR on gene expression, a differential correlation graph generated by a two-stage statistical filter revealed gene networks enriched with genes implicated in radiation response, DNA damage repair, apoptosis, cancer and the immune system. To mimic the effects of radiation on human populations, we profiled baseline expression of recombinant inbred strains of BXD mice derived from a cross between C57BL/6J and DBA/2J standard inbred strains. To establish a threshold for extraction of gene networks from the baseline expression profiles, we compared gene enrichment in paracliques obtained at different absolute Pearson correlations (APC) using graph algorithms. Gene networks extracted at statistically significant APC (r≈0.41) exhibited even better enrichment of genes participating in common biological processes than networks extracted at higher APCs from 0.6 to 0.875. Since immune response is influenced by LDIR, we investigated the effects of genetic background on variability of the immune system in a population of BXD mice. Considering immune response as a complex trait, we identified significant QTLs explaining the ratio of CD8+ and CD4+ T-cells. 
Multiple regression modeling of genes neighboring statistically significant QTLs identified three candidate genes (Ptprk, Acp1 and Lamb1-1) explaining 61% of the variance in the ratio of CD4+ and CD8+ T cells. Expression profiling of parental strains of BXD mice also revealed effects of LDIR and LDIR*strain on expression of genes related to immune response. Thus, using an integrated approach involving transcriptomic, SNP and immunological data, we have developed novel methods to pinpoint candidate gene networks putatively influenced by LDIR
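The idea of differential correlation in this abstract can be sketched as follows. This is a simplified illustration, not the two-stage statistical filter the authors used: it swaps in a plain threshold on the change in Pearson correlation between conditions. The function names, the `min_delta` parameter and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def differential_correlation_edges(control, treated, min_delta=1.0):
    """Return gene pairs whose correlation changes by at least `min_delta`
    between conditions. Both arguments map gene -> expression values.
    The simple threshold is an assumed stand-in for a statistical test."""
    edges = []
    for g1, g2 in combinations(sorted(control), 2):
        r_control = pearson(control[g1], control[g2])
        r_treated = pearson(treated[g1], treated[g2])
        if abs(r_treated - r_control) >= min_delta:
            edges.append((g1, g2, r_control, r_treated))
    return edges


# Toy data (invented): g2 reverses its relationship to g1 and g3 after treatment,
# although no gene changes its mean expression.
control = {"g1": [1, 2, 3, 4], "g2": [1, 2, 3, 4], "g3": [2, 1, 4, 3]}
treated = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [2, 1, 4, 3]}
edges = differential_correlation_edges(control, treated)
edge_pairs = [(a, b) for a, b, _, _ in edges]
```

The toy data show why this can pick up effects that differential expression misses: every gene has the same mean in both conditions, yet two gene pairs change their correlation and appear as edges in the resulting graph.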

    Role of deregulated microRNAs in breast cancer progression Using FFPE tissue

    MicroRNAs (miRNAs) contribute to cancer initiation and progression by silencing the expression of their target genes, causing either mRNA degradation or translational inhibition. Intraductal epithelial proliferations of the breast are histologically and clinically classified into normal, atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). To better understand the progression of ductal breast cancer development, we attempted to identify deregulated miRNAs in this process using Formalin-Fixed, Paraffin-Embedded (FFPE) tissues from breast cancer patients. Following tissue microdissection, we obtained 8 normal, 4 ADH, 6 DCIS and 7 IDC samples, which were subjected to RNA isolation and miRNA expression profiling analysis. We found that miR-21, miR-200b/c, miR-141, and miR-183 were consistently up-regulated in ADH, DCIS and IDC compared to normal, while miR-557 was uniquely down-regulated in DCIS. Interestingly, the most significant miRNA deregulations occurred during the transition from normal to ADH. However, the data did not reveal a step-wise miRNA alteration among discrete steps along tumor progression, which is in accordance with previous reports of mRNA profiling of different stages of breast cancer. Furthermore, the expression of MSH2 and SMAD7, two important molecules involved in the TGF-β pathway, was restored following miR-21 knockdown in both MCF-7 and Hs578T breast cancer cells. In this study, we have not only identified a number of potential candidate miRNAs for breast cancer, but also found that deregulation of miRNA expression during breast tumorigenesis might be an early event, since it occurred significantly during the normal to ADH transition. Consequently, we have demonstrated the feasibility of miRNA expression profiling analysis using archived FFPE tissues, typically with rich clinical information, as a means of miRNA biomarker discovery

    Independent Weighted Feature Set with Linked Feature Reduction Model for Lung Cancer Stage Detection using Machine Learning Model

    Lung cancer is a potentially fatal disease that affects 18% of the population every year. Finding the exact location of a cancer and identifying the lung cancer stage continue to be difficult for medical professionals. The true cause of cancer and a comprehensive cure are still unknown. Treatment for cancer is possible if it is detected at an early stage with accurate stage detection. Finding areas of the lung that have been impacted by cancer requires the use of image processing techniques like noise reduction, highlight filtration, identification of affected lung regions, and perhaps a comparison with data on the curative history of lung cancer. This research investigates whether technology enabled by machine learning algorithms and image processing can correctly classify and predict lung cancer. For images, the dimensional feature channel is used in the preliminary processing stage. The proposed model considers Magnetic Resonance Imaging (MRI) images for detection of lung cancer. This research proposes an Independent Weighted Feature Set with Linked Feature Reduction (IWFS-LFR) model for accurate lung cancer stage detection based on the size of the tumour. The tumour stage can be accurately predicted using the feature attribute similarity calculation, supporting proper diagnosis. The proposed model, when contrasted with the traditional model, exhibits better performance

    KIR+ CD8+ T Lymphocytes in Cancer Immunosurveillance and Patient Survival: Gene Expression Profiling

    Killer-cell immunoglobulin-like receptors (KIR) are molecules expressed by the most important cells of the immune system for cancer immune vigilance, natural killer (NK) and effector T cells. In this manuscript we study the role that cytotoxic CD8+ T cells expressing KIR receptors could play in cancer immune surveillance. With this objective, frequencies of different KIR+ CD8+ T cell subsets are correlated with the overall survival of patients with melanoma, ovarian and bladder carcinomas. In addition, the gene expression profile of KIR+ CD8+ T cell subsets related to the survival of patients is studied with the aim of discovering new therapeutic targets, so that the outcome of patients with cancer can be improved. Killer-cell immunoglobulin-like receptors (KIR) are expressed by natural killer (NK) and effector T cells. Although KIR+ T cells accumulate in oncologic patients, their role in cancer immune response remains elusive. This study explored the role of KIR+ CD8+ T cells in cancer immunosurveillance by analyzing their frequency at diagnosis in the blood of 249 patients (80 melanomas, 80 bladder cancers, and 89 ovarian cancers), their relationship with overall survival (OS) of patients, and their gene expression profiles. KIR2DL1+ CD8+ T cells expanded in the presence of HLA-C2-ligands in patients who survived, but not in patients who died. In contrast, the presence of HLA-C1-ligands was associated with dose-dependent expansions of KIR2DL2/S2+ CD8+ T cells and with shorter OS. KIR interactions with their specific ligands profoundly impacted CD8+ T cell expression profiles, involving multiple signaling pathways, effector functions, the secretome, and consequently, the cellular microenvironment, which could impact their cancer immunosurveillance capacities. 
KIR2DL1/S1+ CD8+ T cells showed a gene expression signature related to efficient tumor immunosurveillance, whereas KIR2DL2/L3/S2+CD8+ T cells showed transcriptomic profiles related to suppressive anti-tumor responses. These results could be the basis for the discovery of new therapeutic targets so that the outcome of patients with cancer can be improved

    Developing statistical and bioinformatic analysis of genomic data from tumours

    Previous prognostic signatures for melanoma based on tumour transcriptomic data were developed predominantly on cohorts of AJCC (American Joint Committee on Cancer) stages III and IV melanoma. Since 92% of melanoma patients are diagnosed at AJCC stages I and II, there is an urgent need for better prognostic biomarkers to allow patient stratification for receiving early adjuvant therapies. This study uses genome-wide tumour gene expression levels and clinico-histopathological characteristics of patients from the Leeds Melanoma Cohort (LMC). Several unsupervised and supervised classification approaches were applied to the transcriptomic data, to identify biological classes of melanoma, and to develop prognostic classification models respectively. Unsupervised clustering identified six biologically distinct primary melanoma classes (LMC classes). Unlike previous molecular classes of melanoma, the LMC classes were prognostic in both the whole LMC dataset and in stage I tumours. The prognostic value of the LMC classes was replicated in an independent dataset, but insufficient data were available to replicate in an AJCC stage I subset. Supervised classification using the Random Forest (RF) approach provided improved performances when adjustments were made to deal with class imbalance, while this did not improve performance of the Support Vector Machine (SVM). However, RF and SVM had similar results overall, with RF only marginally better. Combining clinical and transcriptomic information in the RF further improved the performance of the prediction model in comparison to using clinical information alone. Finally, the agnostically derived LMC classes and the supervised RF model showed convergence in their association with outcome in some groups of patients, but not in others. 
In conclusion, this study reports six molecular classes of primary melanoma with prognostic value in stage I disease and overall, and a prognostic classification model that predicts outcome in primary melanoma
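The abstract does not say which class-imbalance adjustment was applied to the Random Forest. One common option, shown here purely as an assumed illustration, is to weight each class inversely to its frequency so that the rarer outcome counts as much as the common one during training. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    This is one widely used heuristic, not necessarily the adjustment
    used in the study summarised above.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy outcome labels (invented): 8 patients with good outcome, 2 with poor outcome.
labels = ["good"] * 8 + ["poor"] * 2
weights = balanced_class_weights(labels)
```

With these toy labels the minority class receives four times the weight of the majority class, which offsets the 8:2 imbalance.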

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool and methods to interrogate the cancer patient cohort in an immersive virtual environment, namely Virtual Reality to Observe Oncology data Models (VROOM). The overall objective is to develop an immersive analytics platform, which includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms utilising a game engine. Unity3D has been used to implement the visualisation. Work in this thesis could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them to drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patients’ prognoses, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is understanding transcriptomics in cancer samples. The thesis specifically investigates gene expression data to determine the biological similarity revealed by the transcriptomic profiles of patients' tumour samples, which show the active genes in different patients. 
In summary, the thesis contributes to i) a novel immersive analytics platform for patient cohort data interrogation in similarity space where the similarity space is based on the patient's biological and genomic similarity; ii) an effective immersive environment optimisation design based on the usability study of exocentric and egocentric visualisation, audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) novel use of the game theory as the decision-making system engine to help the analytics process, and application of the optimal transport theory in missing data imputation to ensure the preservation of data distribution; and v) case studies to showcase the real-world application of the visualisation and its effectiveness

    The consensus molecular subtypes of colorectal cancer

    Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-beta activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions

    An Optimal Time for Treatment-Predicting Circadian Time by Machine Learning and Mathematical Modelling

    Tailoring medical interventions to a particular patient and pathology has been termed personalized medicine. The outcome of cancer treatments is improved when the intervention is timed in accordance with the patient's internal time. Yet, one challenge of personalized medicine is how to consider the biological time of the patient. A prerequisite for this so-called chronotherapy is an accurate characterization of the internal circadian time of the patient. As an alternative to time-consuming measurements in a sleep laboratory, recent studies in chronobiology predict circadian time by applying machine learning approaches and mathematical modelling to more easily accessible observables such as gene expression. Embedding these results into the mathematical dynamics between clock and cancer in mammals, we review the precision of predictions and their potential usage with respect to cancer treatment, and discuss whether the patient's internal time and circadian observables may provide an additional indication for individualized treatment timing. Besides the health improvement, timing treatment may imply financial advantages by ameliorating side effects of treatments, thus reducing costs. Summarizing the advances of recent years, this review brings together the current clinical standard for measuring biological time, the general assessment of circadian rhythmicity, the usage of rhythmic variables to predict biological time and models of circadian rhythmicity