
    Prognostic modelling of breast cancer patients: a benchmark of predictive models with external validation

    Dissertation presented for the degree of Doctor in Electrical and Computer Engineering – Digital and Perceptional Systems at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.

    There are several clinical prognostic models in the medical field. Prior to clinical use, outcome models built on longitudinal cohort data need to undergo a multi-centre evaluation of their predictive accuracy. This thesis evaluates the possible gain in predictive accuracy from multicentre evaluation of a flexible model with Bayesian regularisation, the Partial Logistic Artificial Neural Network with Automatic Relevance Determination (PLANN-ARD), using a reference data set for breast cancer comprising 4016 records from patients diagnosed during 1989-93 and reported by the British Columbia Cancer Agency (BCCA), Canada, with follow-up of 10 years. The method is compared with the widely used Cox regression model. Both methods were fitted to routinely acquired data from 743 patients diagnosed during 1990-94 at the Christie Hospital, UK, with follow-up of 5 years following surgery. Methodological advances developed to support the external validation of this neural network with clinical data include: imputation of missing data in both the training and validation data sets; and a prognostic index for stratification of patients into risk groups that can be extended to non-linear models. Predictive accuracy was measured empirically with a standard discrimination index, Ctd, and with a calibration measure based on the Hosmer-Lemeshow test statistic. Cox regression and the PLANN-ARD model were found to have similar discrimination, but the neural network showed marginally better predictive accuracy over the 5-year follow-up period. In addition, the regularised neural network has the substantial advantage of being suited to making predictions of hazard rates and survival for individual patients. Four different approaches to stratifying patients into risk groups are also proposed, each with a different foundation. While the four methodologies were found to broadly agree, there are important differences between them. Rule sets were extracted and compared for the two stratification methods, the bootstrapped log-rank test and the direct application of regression trees, using two rule extraction methodologies, OSRE and CART, respectively. In addition, widely used clinical breast cancer prognostic indices, such as the NPI, TNM staging and the St. Gallen consensus rules, were compared with the proposed prognostic models expressed as regression trees, concluding that the suggested approaches may enhance current practice. Finally, a web-based clinical decision support system is proposed for clinical oncologists and for breast cancer patients making prognostic assessments, tailored to the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the NPI, Cox regression and PLANN-ARD. For a given patient, all three models yield a generally consistent but not identical set of prognostic indices that can be analysed together in order to obtain a consensus and so achieve a more robust prognostic assessment of the expected patient outcome.
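
    The validation pipeline described above, fitting a proportional hazards model, turning its linear predictor into a prognostic index, and stratifying patients into risk groups checked with a log-rank test, can be illustrated in Python with lifelines. The sketch below is a minimal illustration under assumed column names and an assumed three-group split, not the thesis's actual cohort or variables:

    ```python
    import pandas as pd
    from lifelines import CoxPHFitter
    from lifelines.statistics import multivariate_logrank_test

    # Hypothetical cohort file with duration/event columns and covariates.
    df = pd.read_csv("breast_cancer_cohort.csv")

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time_months", event_col="event")

    # Harrell's concordance on the fitted data; the thesis uses the
    # time-dependent variant Ctd for discrimination.
    print("C-index:", cph.concordance_index_)

    # Prognostic index from the linear predictor; split into three risk groups.
    df["pi"] = cph.predict_log_partial_hazard(df)
    df["risk_group"] = pd.qcut(df["pi"], q=3, labels=["low", "mid", "high"])

    # Log-rank test for separation between the resulting strata.
    res = multivariate_logrank_test(df["time_months"], df["risk_group"], df["event"])
    print(res.test_statistic, res.p_value)
    ```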

    Deep Learning for Survival Analysis: A Review

    The influx of deep learning (DL) techniques into the field of survival analysis in recent years, coupled with the increasing availability of high-dimensional omics data and unstructured data such as images or text, has led to substantial methodological progress, for instance in learning from such high-dimensional or unstructured data. Numerous modern DL-based survival methods have been developed since the mid-2010s; however, they often address only a small subset of scenarios in the time-to-event data setting, e.g. single-risk right-censored survival tasks, and neglect to incorporate more complex (and common) settings. Partially, this is due to a lack of exchange between experts in the respective fields. In this work, we provide a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In doing so, we hope to provide a helpful overview to practitioners who are interested in DL techniques applicable to their specific use case, as well as to enable researchers from both fields to identify directions for future investigation. We provide a detailed characterization of the methods included in this review as an open-source, interactive table: https://survival-org.github.io/DL4Survival. As this research area is advancing rapidly, we encourage the research community to contribute to keeping the information up to date.

    Comment: 24 pages, 6 figures, 2 tables, 1 interactive table
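
    Many of the single-risk, right-censored methods this review covers share one core idea: replace the Cox model's linear predictor with a neural network and train it on the negative log partial likelihood. As a hedged sketch (PyTorch assumed; this is the generic DeepSurv-style loss, not any specific method from the review):

    ```python
    import torch

    def neg_cox_partial_log_likelihood(log_risk, time, event):
        """Breslow-approximated negative Cox partial log-likelihood.

        log_risk: (n,) network outputs g(x_i); time: (n,) follow-up times;
        event: (n,) 1 if the event was observed, 0 if right-censored.
        """
        # Sorting by descending time makes each cumulative sum range over
        # exactly the risk set {j : t_j >= t_i} of subject i.
        order = torch.argsort(time, descending=True)
        log_risk, event = log_risk[order], event[order].float()
        log_risk_set = torch.logcumsumexp(log_risk, dim=0)
        # Only uncensored subjects contribute terms to the partial likelihood.
        return -((log_risk - log_risk_set) * event).sum() / event.sum().clamp(min=1.0)
    ```

    Competing risks, recurrent events, or interval censoring, the more complex settings the review flags as neglected, require different likelihoods entirely.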

    The risk of re-intervention after endovascular aortic aneurysm repair

    This thesis studies survival analysis techniques that deal with censoring to produce predictive tools for the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring means that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. This thesis therefore presents a new solution to high censoring by modifying an approach that was previously incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated under censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLCs) are often preferred. The few methods that have adopted MLCs for survival FS cannot be used with high censoring. This thesis proposes two FS methods that use MLCs to evaluate features, both relying on the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second combines support vector machine, neural network, and k-nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It introduces a new hybrid FS process that uses the MCS as a wrapper method and merges it with an iterated feature ranking filter method to further reduce the feature set. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the log-rank test's p-values, sensitivity, and concordance. This demonstrates that the proposed techniques are more powerful in correctly predicting the risk of re-intervention, and consequently they enable doctors to set an appropriate future observation plan for each patient.
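
    Once censoring is handled and the outcome recast as a class label, a voting-based MCS of this kind can be approximated with off-the-shelf components. A minimal sketch with scikit-learn, where the estimator settings and voting weights are placeholders rather than the thesis's tuned values:

    ```python
    from sklearn.ensemble import VotingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Base learners mirroring the thesis's MCS: SVM, neural network, k-NN.
    svm = make_pipeline(StandardScaler(), SVC(probability=True))
    nn = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000))
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

    # voting="hard" gives simple majority voting; "soft" with weights gives
    # weighted voting over predicted class probabilities.
    mcs = VotingClassifier(
        estimators=[("svm", svm), ("nn", nn), ("knn", knn)],
        voting="soft",
        weights=[1, 1, 1],  # illustrative weights, not those of the thesis
    )
    # mcs.fit(X_train, y_train) then scores each candidate feature subset when
    # the MCS is used as the wrapper inside the greedy stepwise search.
    ```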

    A theoretical and methodological framework for machine learning in survival analysis: Enabling transparent and accessible predictive modelling on right-censored time-to-event data

    Survival analysis is an important field of Statistics concerned with making time-to-event predictions with ‘censored’ data. Machine learning, specifically supervised learning, is the field of Statistics concerned with using state-of-the-art algorithms in order to make predictions on unseen data. This thesis looks at unifying these two fields, as current research into the two is still disjoint, with ‘classical survival’ on one side and supervised learning (primarily classification and regression) on the other. This PhD aims to improve the quality of machine learning research in survival analysis by focusing on transparency, accessibility, and predictive performance in model building and evaluation. This is achieved by examining historic and current proposals and implementations for models and measures (both classical and machine learning) in survival analysis and making novel contributions. In particular this includes: i) a survey of survival models, including a critical and technical survey of almost all supervised learning model classes currently utilised in survival analysis, as well as novel adaptations; ii) a survey of evaluation measures for survival models, including key definitions, proofs and theorems for survival scoring rules that had previously been missing from the literature; iii) introduction and formalisation of composition and reduction in survival analysis, with a view to increasing the transparency of modelling strategies and improving predictive performance; iv) implementation of several R software packages, in particular mlr3proba for machine learning in survival analysis; and v) the first large-scale benchmark experiment on right-censored time-to-event data with 24 survival models and 66 datasets. Survival analysis has many important applications in medical statistics, engineering and finance, and as such requires the same level of rigour as other machine learning fields such as regression and classification; this thesis aims to make this clear by describing a framework from prediction and evaluation to implementation.
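
    The thesis's own tooling is the R package mlr3proba; to keep the code examples in this document in one language, here is a comparable two-model comparison on right-censored data sketched with Python's scikit-survival instead (dataset and models chosen for brevity, not taken from the 24-model benchmark):

    ```python
    from sklearn.model_selection import train_test_split
    from sksurv.column import encode_categorical
    from sksurv.datasets import load_whas500
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.linear_model import CoxPHSurvivalAnalysis
    from sksurv.metrics import concordance_index_censored

    X, y = load_whas500()          # y is a structured array of (event, time)
    X = encode_categorical(X)      # one-hot encode the categorical covariates
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for model in (CoxPHSurvivalAnalysis(), RandomSurvivalForest(random_state=0)):
        model.fit(X_tr, y_tr)
        risk = model.predict(X_te)  # higher score = higher predicted risk
        cindex = concordance_index_censored(y_te["fstat"], y_te["lenfol"], risk)[0]
        print(type(model).__name__, round(cindex, 3))
    ```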

    Deep Learning Applications for Biomedical Data and Natural Language Processing

    The human brain can be seen as an ensemble of interconnected neurons, more or less specialized to solve different cognitive and motor tasks. In computer science, the term deep learning is often applied to signify sets of interconnected nodes, where deep means that they have several computational layers. The development of deep learning is essentially a quest to mimic, at least partially, how the human brain operates.

    In this thesis, I use machine learning techniques to tackle two different problem domains. The first is a problem in natural language processing: we improved classification of relations within images using text associated with the pictures. The second domain concerns heart transplantation: we created models for pre- and post-transplant survival and simulated a whole transplantation queue, in order to assess the impact of different allocation policies. We used deep learning models to solve these problems.

    As an introduction to these problems, I present the basic concepts of machine learning: how to represent data, how to evaluate prediction results, and how to create different models to predict values from data. Following that, I also introduce the field of heart transplantation and some information about simulation.
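
    The transplantation-queue experiment is, at its core, a discrete-event simulation whose allocation policy can be swapped out. A toy sketch of that idea, with all rates, urgency scores, and mortality numbers invented purely for illustration:

    ```python
    import random

    random.seed(0)

    def simulate(policy, n_steps=5000, organ_prob=0.6):
        """Toy waiting-list simulation; returns deaths on the waiting list.

        policy picks one patient from the list whenever an organ arrives.
        Patients are (urgency, arrival_order) tuples; all numbers are invented.
        """
        waiting, deaths = [], 0
        for i in range(n_steps):
            waiting.append((random.random(), i))   # one new patient per step
            if random.random() < organ_prob and waiting:
                waiting.remove(policy(waiting))    # transplant the chosen patient
            # Sicker (higher-urgency) patients die on the list more often.
            alive = [p for p in waiting if random.random() > 0.02 * p[0]]
            deaths += len(waiting) - len(alive)
            waiting = alive
        return deaths

    # Compare two allocation policies under identical toy dynamics.
    print("sickest-first:", simulate(lambda w: max(w)))
    print("first-come:", simulate(lambda w: min(w, key=lambda p: p[1])))
    ```

    In the thesis, survival on and after the list comes from the learned pre- and post-transplant models rather than the fixed constants used here.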

    An evolution of statistical pipe failure models for drinking water networks: a targeted review

    The use of statistical models to predict pipe failures has become an important tool for the proactive management of drinking water networks. This targeted review provides an overview of the evolution of existing statistical models, grouped into three categories: deterministic, probabilistic and machine learning. The main advantage of deterministic models is their simplicity and relatively minimal data requirements. Deterministic models predicting failure rates for the network or large groups of pipes perform well and are useful for shorter prediction intervals that capture the influence of seasonality. Probabilistic models can accommodate randomness and are useful for predicting time to failure, interarrival times and the probability of failure; they are also well suited to modelling individual pipes. Generally, machine learning models describe large, complex data more accurately and can improve predictions for individual pipes, yet they are complex and require expert knowledge. Non-parametric models are better suited to the non-linear relationships between pipe failure variables. The use of census and socio-economic data requires further research. Choosing the most appropriate statistical model requires careful consideration of the variable types, prediction interval, spatial level, response type and level of inference required.

    Natural Environment Research Council (NERC): NE/M009009/1
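
    As a concrete instance of the probabilistic category, interarrival times between failures are commonly fitted with a Weibull model. A small sketch with SciPy, where the data array is fabricated purely for illustration:

    ```python
    import numpy as np
    from scipy import stats

    # Fabricated interarrival times between failures for a pipe cohort (years).
    interarrival = np.array([1.2, 0.8, 2.5, 3.1, 0.4, 1.9, 2.2, 0.7, 1.5, 2.8])

    # Fit a two-parameter Weibull by pinning the location at zero.
    shape, loc, scale = stats.weibull_min.fit(interarrival, floc=0)
    print(f"shape={shape:.2f}, scale={scale:.2f}")
    # shape > 1 implies an increasing hazard (ageing assets); shape < 1 a
    # decreasing one, e.g. early "burn-in" failures.

    # Probability that the next failure occurs within two years.
    print("P(failure within 2 yr):", stats.weibull_min.cdf(2.0, shape, loc, scale))
    ```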