22 research outputs found

    Deep Recurrent Survival Analysis

    Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, and it has been widely used in many applications such as clinical research, information systems, and other fields with survivorship bias. Many works have been proposed for survival analysis, ranging from traditional statistical methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data or make a prior assumption about the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at a fine-grained level of the data with survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over previous works in fitting various sophisticated data distributions. In the experiments on three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics. Comment: AAAI 2019. Supplemental material, slides, code: https://github.com/rk2900/drs
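
    The relationship the abstract describes between per-interval conditional event probabilities and the survival rate can be sketched as follows. This is a minimal illustration, not the authors' model: the recurrent network that would produce the conditional probabilities is omitted, and the names `hazards`, `survival_curve`, and `event_probability` are assumptions introduced here.

```python
def survival_curve(hazards):
    """Survival rate after each interval.

    hazards[l] is assumed to be the conditional probability that the
    event occurs in interval l, given that it has not occurred before.
    S(t) = prod_{l <= t} (1 - hazards[l]).
    """
    curve = []
    surviving = 1.0
    for h in hazards:
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


def event_probability(hazards, z):
    """Probability that the event occurs exactly in interval z (0-based).

    p(z) = hazards[z] * prod_{l < z} (1 - hazards[l]).
    """
    p = hazards[z]
    for h in hazards[:z]:
        p *= 1.0 - h
    return p


if __name__ == "__main__":
    hazards = [0.5, 0.5, 0.5]  # hypothetical model outputs
    print(survival_curve(hazards))
    print([event_probability(hazards, z) for z in range(len(hazards))])
```

    For an uncensored sample, a likelihood would use `event_probability` at the observed interval; for a censored sample, only the survival rate up to the censoring time is known, so the corresponding entry of `survival_curve` would be used instead.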

    Deep Learning Causal Attributions of Breast Cancer

    In this paper, a deep learning-based approach is applied to high-dimensional, high-volume, and high-sparsity medical data to identify critical causal attributions that might affect the survival of a breast cancer patient. The Surveillance, Epidemiology, and End Results (SEER) breast cancer data is explored in this study. The SEER data set contains accumulated patient-level and treatment-level information, such as cancer site, cancer stage, treatment received, and cause of death. Restricted Boltzmann machines (RBMs) are proposed for dimensionality reduction in the analysis. The RBM is a popular paradigm of deep learning networks and can be used to extract features from a given data set and transform data in a non-linear manner into a lower-dimensional space for further modelling. In this study, a group of RBMs has been trained to sequentially transform the original data into a very low-dimensional space, and k-means clustering is then conducted in this space. Furthermore, the results obtained about the cluster membership of the data samples are mapped back to the original sample space for interpretation and insight creation. The analysis has demonstrated that essential features relating to breast cancer survival can be effectively extracted and brought forward into a much lower-dimensional space formed by RBMs.

    Clinical data analysis using artificial neural networks (ANN) and principal component analysis (PCA) of patients with breast cancer after mastectomy

    Background: Exploitation of the several types of information on patient, disease and treatment variables ranging from sociological to genetic ones by means of chemometric analysis was considered and evaluated. Aim: Performance of modern data processing methods, namely principal component analysis (PCA) and artificial neural network (ANN) analysis, is demonstrated for predictions of the recurrence of breast cancer in patients treated previously with mastectomy. Materials/Methods: The data on 718 patients were retrospectively evaluated. 11 subject and treatment variables were determined for each patient. A matrix of 718×11 data points was subjected to PCA and ANN processing. The properly trained ANN was used to predict the patients with recurrence and without recurrence within a 10-year period after mastectomy. Results: It was found that the prognostic potency of the trained and validated ANN was reasonably high. Additionally, using the principal component analysis (PCA) method two principal components, PC1 and PC2, were extracted from the input data. They accounted cumulatively for 37.5% of the variance of the data analyzed. An apparent clustering of the variables and patients was observed; these have been interpreted in terms of their similarities and dissimilarities. Conclusions: It has been concluded that ANN analysis offers a promising implementation to established methods of statistical analysis of multivariable data on cancer patients. On the other hand, PCA has been recommended as an alternative to classical regression analysis of multivariable clinical data. By means of ANN and PCA practically useful systematic information may be extracted from large sets of data, which can be of value for prognosis and appropriate adjustment of the treatment of breast cancer.

    e-Science and artificial neural networks in cancer management

    SUMMARY: We describe the origins of this project, its aims and its relevance to e-Science research. Particle physicists at the University of Manchester with experience of artificial neural networks (ANNs) have collaborated with clinicians at the University of Dundee to produce an ANN that is intended to predict survival rates and to indicate management profiles for cancer patients. Comparisons are made between typical data handling problems in particle physics and health care. The problems associated with data procurement, namely reliability and censoring, are described, together with a discussion of how these problems were addressed. The inputs to the ANN and its decision output are discussed. The reliability of the ANN is assessed quantitatively. The prototype secure Web-based interface, which allows clinicians to input new patient data to the central node at the University of Manchester and to obtain prognoses from anywhere in the world, is presented. For each topic, the e-Science relevance is described and underlined.

    On the study of the Beran estimator for generalized censoring indicators

    In the analysis of time-to-event data, it is common to assume that only partial information is available. In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also referred to as the Beran estimator) is known to provide a consistent estimate of the conditional survival function of the lifetimes. However, a necessary condition is clear knowledge of whether each individual is censored or not, although this information might be incomplete or even totally absent in practice. We thus propose a study of the Beran estimator when the censoring indicator is not clearly specified. From this, we provide a new estimator for the conditional survival function and establish its asymptotic normality under mild conditions. We further study the supervised learning problem where the conditional survival function is to be predicted with no censorship indicators. To this aim, we investigate various approaches to estimating the conditional expectation of the censoring indicator. Along with the theoretical results, we illustrate how the estimators work for small samples by means of a simulation study and show their practical applicability with the analysis of synthetic data and the study of real data for the prognosis of monoclonal gammopathy.
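
    For orientation, the classical Beran (conditional Kaplan-Meier) estimator that the abstract builds on can be sketched as below. This covers only the standard case in which every censoring indicator is observed, not the new estimator proposed in the paper. The Gaussian kernel, the scalar covariate, the default bandwidth, and the function name `beran_survival` are all assumptions made for illustration.

```python
import math


def beran_survival(t, x, times, events, covariates, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier estimate of S(t | x).

    times      -- observed times (event or censoring time)
    events     -- 1 if the event was observed, 0 if right-censored
    covariates -- one scalar covariate per individual (an assumption)
    """
    # Nadaraya-Watson weights: individuals with a covariate close to x
    # contribute more to the estimate.
    raw = [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in covariates]
    total = sum(raw)
    if total == 0.0:
        return 1.0  # no individual carries any weight near x
    weights = [w / total for w in raw]

    survival = 1.0
    for i, (y_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1 and y_i <= t:
            # Weighted size of the risk set at time y_i.
            at_risk = sum(w for w, y_j in zip(weights, times) if y_j >= y_i)
            if at_risk > 0.0:
                survival *= 1.0 - weights[i] / at_risk
    return survival


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    events = [1, 0, 1, 1]
    covariates = [0.0, 0.0, 0.0, 0.0]
    print(beran_survival(3.0, 0.0, times, events, covariates))
```

    When all weights are equal, as in the usage example where every covariate coincides with `x`, the estimate reduces to the ordinary Kaplan-Meier estimator.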

    A methodology for exploring biomarker – phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations

    BACKGROUND: This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome through the identification of significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD), a systemic sclerosis (SSc, or scleroderma) clinical phenotype which is the leading cause of morbidity and mortality in SSc. A specific aim of the work involves developing a clinically useful screening tool that could yield accurate assessments of disease state such as the risk or presence of SSc-ILD, the activity of lung involvement and the likelihood to respond to therapeutic intervention. Ultimately this instrument could facilitate a refined stratification of SSc patients into clinically relevant subsets at the time of diagnosis and subsequently during the course of the disease and thus help in preventing bad outcomes from disease progression or unnecessary treatment side effects. The methods utilized in the work involve: (1) clinical and peripheral blood flow cytometry data (Immune Response In Scleroderma, IRIS) from consented patients followed at the Johns Hopkins Scleroderma Center; (2) machine learning (Conditional Random Forests - CRF) coupled with Gene Set Enrichment Analysis (GSEA) to identify subsets of FC variables that are highly effective in classifying ILD patients; and (3) stochastic simulation to design, train and validate ILD risk screening tools. RESULTS: Our hybrid analysis approach (CRF-GSEA) proved successful in predicting SSc patient ILD status with a high degree of success (>82% correct classification in validation; 79 patients in the training data set, 40 patients in the validation data set). CONCLUSIONS: IRIS flow cytometry data provides useful information in assessing the ILD status of SSc patients. Our new approach combining Conditional Random Forests and Gene Set Enrichment Analysis was successful in identifying a subset of flow cytometry variables to create a screening tool that proved effective in correctly identifying ILD patients in the training and validation data sets. From a somewhat broader perspective, the identification of subsets of flow cytometry variables that exhibit coordinated movement (i.e., multi-variable up or down regulation) may lead to insights into possible effector pathways and thereby improve the state of knowledge of systemic sclerosis pathogenesis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0722-x) contains supplementary material, which is available to authorized users.