
    DACH1: its role as a classifier of long term good prognosis in luminal breast cancer

    Background: Oestrogen receptor (ER) positive (luminal) tumours account for the largest proportion of breast cancers in females. Theirs is a heterogeneous disease presenting clinical challenges in managing their treatment. Three main biological luminal groups have been identified but clinically these can be distilled into two prognostic groups in which Luminal A are accorded good prognosis and Luminal B correlate with poor prognosis. Further biomarkers are needed to attain classification consensus. Machine learning approaches like Artificial Neural Networks (ANNs) have been used for classification and identification of biomarkers in breast cancer using high throughput data. In this study, we have used an ANN approach to identify DACH1 as a candidate luminal marker, and its role in predicting clinical outcome in breast cancer is assessed. Materials and methods: A reiterative ANN approach incorporating a network inferencing algorithm was used to identify ER-associated biomarkers in a publicly available cDNA microarray dataset. DACH1 was identified as having a strong influence on ER-associated markers and a positive association with ER. Its clinical relevance in predicting breast cancer specific survival was investigated by statistically assessing protein expression levels after immunohistochemistry in a series of unselected breast cancers, formatted as a tissue microarray. Results: Strong nuclear DACH1 staining is more prevalent in tubular and lobular breast cancer. Its expression correlated with ER-alpha positive tumours expressing PgR, epithelial cytokeratins (CK)18/19 and 'luminal-like' markers of good prognosis including FOXA1 and RERG (p < 0.05). DACH1 is increased in patients showing longer cancer specific survival and disease free interval and reduced metastasis formation (p < 0.001). Nuclear DACH1 showed a negative association with markers of aggressive growth and poor prognosis.
Conclusion: Nuclear DACH1 expression appears to be a Luminal A biomarker predictive of good prognosis, but is not independent of clinical stage, tumour size, NPI status or systemic therapy.

    NELFE-Dependent MYC Signature Identifies a Unique Cancer Subtype in Hepatocellular Carcinoma.

    The MYC oncogene is dysregulated in approximately 30% of liver cancer. In an effort to exploit MYC as a therapeutic target, including in hepatocellular carcinoma (HCC), strategies have been developed on the basis of MYC amplification or gene translocation. Due to the failure of these strategies to provide accurate diagnostics and prognostic value, we have developed a Negative Elongation Factor E (NELFE)-Dependent MYC Target (NDMT) gene signature. This signature, which consists of genes regulated by MYC and NELFE, an RNA binding protein that enhances MYC-induced hepatocarcinogenesis, is predictive of NELFE/MYC-driven tumors that would otherwise not be identified by gene amplification or translocation alone. We demonstrate the utility of the NDMT gene signature to predict a unique subtype of HCC, which is associated with a poor prognosis in three independent cohorts encompassing diverse etiologies, demographics, and viral status. The application of gene signatures, such as the NDMT signature, offers patients access to personalized risk assessments, which may be utilized to direct future care

    Towards generalizable machine learning models for computer-aided diagnosis in medicine

    Hidden stratification is a phenomenon in which a training dataset contains unlabeled (hidden) subsets of cases that may affect machine learning model performance. Machine learning models that ignore hidden stratification, despite promising overall performance measured as accuracy and sensitivity, often fail at predicting low-prevalence cases, yet those cases remain important. In the medical domain, patients with diseases are often less common than healthy patients, and a misdiagnosis of a patient with a disease can have significant clinical impacts. Therefore, to build a robust and trustworthy CAD system and a reliable treatment effect prediction model, we cannot pursue only machine learning models with high overall accuracy; we also need to discover any hidden stratification in the data and evaluate the proposed machine learning models with respect to both overall performance and the performance on certain subsets (groups) of the data, such as the ‘worst group’. In this study, I investigated three approaches for data stratification: a novel algorithmic deep learning (DL) approach that learns similarities among cases and two schema completion approaches that utilize domain expert knowledge. I further proposed an innovative way to integrate the discovered latent groups into the loss functions of DL models to allow for better model generalizability under the domain shift scenario caused by data heterogeneity. My results on lung nodule Computed Tomography (CT) images and breast cancer histopathology images demonstrate that learning homogeneous groups within heterogeneous data significantly improves the performance of the computer-aided diagnosis (CAD) system, particularly for low-prevalence or worst-performing cases. This study emphasizes the importance of discovering and learning the latent stratification within the data, as it is a critical step towards building ML models that are generalizable and reliable.
Ultimately, this discovery can have a profound impact on clinical decision-making, particularly for low-prevalence cases
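
    The idea of evaluating both overall and 'worst group' performance, as described in the abstract above, can be illustrated with a minimal sketch. The group names, labels and predictions below are hypothetical toy values, not data from the study, and plain accuracy stands in for whatever metric the author actually used.

```python
from collections import defaultdict

def group_accuracies(y_true, y_pred, groups):
    """Return accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def worst_group_accuracy(y_true, y_pred, groups):
    """Return (group, accuracy) for the group with the lowest accuracy."""
    accs = group_accuracies(y_true, y_pred, groups)
    worst = min(accs, key=accs.get)
    return worst, accs[worst]

# Hypothetical toy data: a common subgroup and a rare (hidden) subgroup.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
groups = ["common"] * 6 + ["rare"] * 4

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
worst = worst_group_accuracy(y_true, y_pred, groups)
print(overall, worst)
```

    In this toy case the overall accuracy looks acceptable while the rare subgroup is classified no better than chance, which is the kind of gap that a single aggregate metric hides.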

    Breast cancer data analysis for survivability studies and prediction

    © 2017 Elsevier B.V. Background: Breast cancer is the most common cancer affecting females worldwide. Breast cancer survivability prediction is a challenging and complex research task. Existing approaches engage statistical methods or supervised machine learning to assess/predict the survival prospects of patients. Objective: The main objective of this paper is to develop a robust data analytical model which can assist in (i) a better understanding of breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties. Methods: Unsupervised data mining methods, viz. the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), are used to create patient cohort clusters. These clusters, with associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset available from the SEER program is used in this study to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for the purpose of variable selection. All of these methods are data-driven and require little (if any) input from users or experts. Results: SOM consolidated patients into cohorts of patients with similar properties. From this, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. The separation of patients into clusters improved the overall survival prediction accuracy based on MLP and revealed intricate conditions that affect the accuracy of a prediction. Conclusions: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with the survivability of patients.
The results of the analysis can be used to segment the historical patient data into clusters or subsets, which share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by using identified patient cohorts as opposed to using raw historical data. Analysis of variable values in each cohort provides better insights into the survivability of a particular subgroup of breast cancer patients.
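
    The abstract mentions that information gain was computed for variable selection. A minimal sketch of that quantity for categorical variables follows; the variable names and values are hypothetical and are not taken from the SEER data, and the authors' exact procedure (for example, how continuous or missing values were handled) is not stated in the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(s) / n * entropy(s) for s in by_value.values())
    return entropy(labels) - conditional

# Hypothetical toy records: one informative and one uninformative variable.
outcome = ["survived", "survived", "died", "died"]
stage = ["I", "I", "IV", "IV"]        # perfectly separates the outcome
laterality = ["L", "R", "L", "R"]     # carries no information here

gain_stage = information_gain(stage, outcome)
gain_laterality = information_gain(laterality, outcome)
print(gain_stage, gain_laterality)
```

    Variables would then be ranked by gain and the highest-ranked ones kept; the cut-off is a design choice the abstract does not specify.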

    Supervised Machine Learning Model for Microrna Expression Data in Cancer

    Cancer cell gene expression data generally has a very large number of features and requires analysis to find which genes strongly influence a specific disease, for diagnosis and drug discovery. In this paper, several supervised learning methods (decision tree, naïve Bayes, neural network, and deep learning) are used to classify cancer cells based on microRNA gene expression, in order to find the best method for gene analysis. In this study the algorithms are not optimized or tuned, so as to test the ability of general algorithms. There are 1881 microRNA gene expression features across 25 cancer classes based on tissue location. A simple feature selection method is used to compare the algorithms. Experiments were conducted with various scenarios to test the accuracy of the classification.

    Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

    Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine. Comment: C.F. and R.M. share senior authorship
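
    The clustering results above are reported as entropy and purity scores. The sketch below shows one common way to compute both from cluster assignments and reference labels. It assumes the standard definitions (size-weighted per-cluster entropy in bits, and the fraction of cases belonging to their cluster's majority class); the abstract does not state which exact variants the authors used, and the labels here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cluster_purity_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels."""
    members = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        members[c].append(y)
    n = len(true_labels)

    # Purity: share of cases that belong to the majority class of their cluster.
    purity = sum(max(Counter(ys).values()) for ys in members.values()) / n

    # Entropy: per-cluster label entropy, weighted by cluster size.
    entropy = 0.0
    for ys in members.values():
        h = -sum((k / len(ys)) * math.log2(k / len(ys))
                 for k in Counter(ys).values())
        entropy += len(ys) / n * h
    return purity, entropy

# Hypothetical toy example: two clusters, one of them slightly mixed.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
diagnoses = ["A", "A", "A", "B", "B", "B", "B", "B"]
purity, entropy = cluster_purity_entropy(clusters, diagnoses)
print(purity, entropy)
```

    Higher purity and lower entropy both indicate clusters that align more closely with the reference conditions.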

    Artificial intelligence for breast cancer precision pathology

    Breast cancer is the most common cancer type in women globally but is associated with a continuous decline in mortality rates. The improved prognosis can be partially attributed to effective treatments developed for subgroups of patients. However, it remains challenging to optimise treatment plans for each individual. To improve disease outcome and to decrease the burden associated with unnecessary treatment and adverse drug effects, the current thesis aimed to develop artificial intelligence based tools to improve individualised medicine for breast cancer patients. In study I, we developed a deep learning based model (DeepGrade) to stratify patients associated with intermediate risk. The model was optimised with haematoxylin and eosin (HE) stained whole slide images (WSIs) of grade 1 and 3 tumours and applied to stratify grade 2 tumours into grade 1-like (DG2-low) and grade 3-like (DG2-high) subgroups. The efficacy of the DeepGrade model was validated using recurrence free survival, where the dichotomised groups exhibited an adjusted hazard ratio (HR) of 2.94 (95% confidence interval [CI] 1.24-6.97, P = 0.015). The observation was further confirmed in the external test cohort with an adjusted HR of 1.91 (95% CI: 1.11-3.29, P = 0.019). In study II, we investigated whether deep learning models were capable of predicting gene expression levels using the morphological patterns from tumours. We optimised convolutional neural networks (CNNs) to predict mRNA expression for 17,695 genes using HE stained WSIs from the training set. An initial evaluation on the validation set showed a significant correlation between the RNA-seq measurements and model predictions for 52.75% of the genes. The models were further tested in the internal and external test sets. In addition, we compared the models' efficacy in predicting RNA-seq based proliferation scores.
Lastly, the ability of the optimised CNNs to capture spatial gene expression variation was evaluated and confirmed using spatial transcriptomics profiling. In study III, we investigated the relationship between intra-tumour gene expression heterogeneity and patient survival outcomes. Deep learning models optimised in study II were applied to generate spatial gene expression predictions for the PAM50 gene panel. A set of 11 texture based features and one slide average gene expression feature per gene were extracted as input to train a Cox proportional hazards regression model with elastic net regularisation to predict patient risk of recurrence. Through nested cross-validation, the model dichotomised the training cohort into low and high risk groups with an adjusted HR of 2.1 (95% CI: 1.30-3.30, P = 0.002). The model was further validated on two external cohorts. In study IV, we investigated the agreement between Stratipath Breast, which is the modified, commercialised DeepGrade model developed in study I, and the Prosigna® test. Both tests sought to stratify patients with distinct prognoses. The outputs from Stratipath Breast comprise a risk score and a two-level risk stratification, whereas the outputs from Prosigna® include the risk of recurrence score and a three-tier risk stratification. By comparing the number of patients assigned to ‘low’ or ‘high’ risk groups, we found an overall moderate agreement (76.09%) between the two tests. In addition, the risk scores from the two tests showed a good correlation (Spearman's rho = 0.59, P = 1.16E-08). A good correlation was also observed between the risk score from each test and the Ki67 index. The comparison was also carried out in the subgroup of patients with grade 2 tumours, where similar but slightly lower correlations were found.
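
    Study IV reports a percentage agreement between risk groups and a Spearman correlation between risk scores. The sketch below shows how these two statistics can be computed in plain Python. The scores and group labels are hypothetical, and the p-value calculation and the handling of the intermediate Prosigna® tier are left out because the abstract does not describe them.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def percent_agreement(a, b):
    """Percentage of cases assigned to the same category by both tests."""
    return 100.0 * sum(p == q for p, q in zip(a, b)) / len(a)

# Hypothetical scores and risk groups from two tests on the same patients.
score_a = [0.10, 0.35, 0.50, 0.80]
score_b = [12, 40, 38, 75]
group_a = ["low", "low", "high", "high"]
group_b = ["low", "high", "high", "high"]

rho = spearman_rho(score_a, score_b)
agreement = percent_agreement(group_a, group_b)
print(rho, agreement)
```

    Spearman's rho depends only on the rank order of the scores, so it suits comparisons between tests whose scores are on different scales.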