
    Prognostic modelling of breast cancer patients: a benchmark of predictive models with external validation

    Dissertation presented to obtain the degree of Doctor in Electrical and Computer Engineering – Digital and Perceptional Systems at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. There are several clinical prognostic models in the medical field. Prior to clinical use, outcome models built from longitudinal cohort data need to undergo a multi-centre evaluation of their predictive accuracy. This thesis evaluates the possible gain in predictive accuracy from multi-centre evaluation of a flexible model with Bayesian regularisation (PLANN-ARD), using a reference data set for breast cancer comprising 4016 records from patients diagnosed during 1989-93 and reported by the BCCA, Canada, with follow-up of 10 years. The method is compared with the widely used Cox regression model. Both methods were fitted to routinely acquired data from 743 patients diagnosed during 1990-94 at the Christie Hospital, UK, with follow-up of 5 years following surgery. Methodological advances developed to support the external validation of this neural network with clinical data include: imputation of missing data in both the training and validation data sets; and a prognostic index for stratification of patients into risk groups that can be extended to non-linear models. Predictive accuracy was measured empirically with a standard discrimination index, Ctd, and with a calibration measure based on the Hosmer-Lemeshow test statistic. Cox regression and the PLANN-ARD model are found to have similar discrimination, but the neural network showed marginally better predictive accuracy over the 5-year follow-up period. In addition, the regularised neural network has the substantial advantage of being suited to making predictions of hazard rates and survival for individual patients. Four different approaches to stratifying patients into risk groups are also proposed, each with a different foundation.
    While the four methodologies were found to broadly agree, there are important differences between them. Rule sets were extracted and compared for the two stratification methods, the log-rank bootstrap and direct application of regression trees, with two rule extraction methodologies, OSRE and CART, respectively. In addition, widely used clinical breast cancer prognostic indexes, such as the NPI, TNM and St. Gallen consensus rules, were compared with the proposed prognostic models expressed as regression trees, concluding that the suggested approaches may enhance current practice. Finally, a Web clinical decision support system is proposed for clinical oncologists and breast cancer patients making prognostic assessments, tailored to the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the NPI, Cox regression and PLANN-ARD. For a given patient, all three models yield a generally consistent but not identical set of prognostic indices that can be analysed together to obtain a consensus and so achieve a more robust prognostic assessment of the expected patient outcome.
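    The prognostic-index stratification described above can be sketched as follows. This is a minimal illustration, not the thesis's method: a Cox-style linear predictor PI = β·x is computed per patient, and patients are cut into risk groups at quantiles of PI. The coefficients, covariates, and cut-point rule are all illustrative assumptions.

```python
# Hedged sketch: risk-group stratification from a Cox-style prognostic index.
# Coefficients and toy cohort are illustrative, not the thesis's values.

def prognostic_index(covariates, coefficients):
    """Linear predictor PI = sum_j beta_j * x_j for one patient."""
    return sum(b * x for b, x in zip(coefficients, covariates))

def stratify(indices, n_groups=3):
    """Assign each patient to a risk group by quantile cut-points of PI."""
    ranked = sorted(indices)
    cuts = [ranked[int(len(ranked) * k / n_groups)] for k in range(1, n_groups)]
    groups = []
    for pi in indices:
        g = sum(pi >= c for c in cuts)  # 0 = low risk ... n_groups-1 = high
        groups.append(g)
    return groups

# Toy cohort: (tumour size in mm, node status), illustrative coefficients
beta = [0.05, 0.9]
patients = [(12, 0), (25, 1), (40, 1), (8, 0), (30, 0), (55, 1)]
pis = [prognostic_index(p, beta) for p in patients]
risk_groups = stratify(pis, n_groups=3)
```

    A non-linear model such as PLANN-ARD slots into the same scheme by replacing the linear predictor with the model's predicted risk score, which is what makes the index extensible beyond Cox regression.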

    Is a Calvo price setting model consistent with micro price data?

    This paper shows that the standard Calvo model clearly fails to account for the distribution of price durations found in micro data. We propose a novel price setting model that fully captures heterogeneity in individual pricing behavior. Specifically, we assume that there is a continuum of firms that set prices according to a Calvo mechanism, each of them with a possibly different price adjustment parameter. The model is estimated by maximum likelihood and closely matches individual consumer and producer price data. Incorporating estimated price setting rules into a standard DSGE model shows that fully accounting for pricing heterogeneity is crucial to understanding inflation and output dynamics. The standard calibration that assumes within-sector homogeneity, as in Carvalho (2006), is at odds with micro data evidence and leads to a substantial distortion of estimates of the real impact of monetary policy.
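    The duration argument can be made concrete with a toy calculation. Under a Calvo mechanism with adjustment probability θ, a price spell of length k has probability θ(1-θ)^(k-1), a geometric distribution; mixing firms with different θ produces a fatter-tailed duration distribution than any single θ can. The two-type mixture below is an illustrative assumption, not the paper's estimated continuum.

```python
# Hedged sketch: price-spell durations under Calvo pricing are geometric,
# P(T = k) = theta * (1 - theta)**(k - 1). A toy two-type mixture shows why
# heterogeneity in theta matters for matching micro-data durations.

def calvo_duration_pmf(theta, k):
    """P(spell length = k) for one firm with Calvo parameter theta."""
    return theta * (1.0 - theta) ** (k - 1)

def mixture_duration_pmf(thetas, weights, k):
    """Duration pmf for a population of Calvo firms with mixed thetas."""
    return sum(w * calvo_duration_pmf(t, k) for t, w in zip(thetas, weights))

# Flexible vs sticky firms, and a single-theta model matched to the
# mixture's mean adjustment frequency
thetas, weights = [0.8, 0.2], [0.5, 0.5]
theta_bar = sum(w * t for w, t in zip(weights, thetas))  # 0.5

# Long spells (k = 6) are far more likely under the mixture:
p_mix = mixture_duration_pmf(thetas, weights, 6)
p_single = calvo_duration_pmf(theta_bar, 6)
```

    The sticky half of the mixture dominates the tail, which is the distributional feature the homogeneous Calvo model cannot reproduce.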

    SurvSHAP(t): Time-dependent explanations of machine learning survival models

    Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods, yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations, which have solid theoretical foundations and broad adoption among machine learning practitioners. The proposed method aims to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at http://github.com/MI2DataLab/survshap
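    The core idea of a time-dependent explanation of a functional output can be sketched with exact Shapley values on a toy survival model: for each time point on a grid, each feature's Shapley value is computed against the model's survival function S(t | x), yielding one attribution curve per feature. The exponential proportional-hazards model, coefficients, and baseline below are toy assumptions, not SurvSHAP(t)'s estimation procedure.

```python
from itertools import combinations
from math import comb, exp

# Hedged sketch: exact Shapley attribution of a survival function over a
# time grid (one curve per feature). Toy model, not the paper's method.

def survival(x, t, beta=(0.8, 0.4, 0.1), base_hazard=0.05):
    """Toy proportional-hazards model: S(t|x) = exp(-h0 * exp(beta.x) * t)."""
    return exp(-base_hazard * exp(sum(b * xi for b, xi in zip(beta, x))) * t)

def shapley_curves(x, baseline, time_grid):
    """Exact Shapley value of each feature, at each time point."""
    d = len(x)
    def value(subset, t):   # features in `subset` take x, rest the baseline
        z = [x[j] if j in subset else baseline[j] for j in range(d)]
        return survival(z, t)
    curves = [[0.0] * len(time_grid) for _ in range(d)]
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            w = 1.0 / (d * comb(d - 1, size))   # Shapley coalition weight
            for subset in combinations(others, size):
                for k, t in enumerate(time_grid):
                    gain = value(set(subset) | {i}, t) - value(set(subset), t)
                    curves[i][k] += w * gain
    return curves

x_patient, x_baseline = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
grid = [1.0, 5.0, 10.0]
phi = shapley_curves(x_patient, x_baseline, grid)
```

    By the efficiency property, the curves sum at every time point to S(t | x_patient) - S(t | x_baseline), so the whole shape of the survival-function difference is decomposed across features.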

    Penalized regressions for variable selection model, single index model and an analysis of mass spectrometry data.

    The focus of this dissertation is to develop statistical methods, under the framework of penalized regressions, to handle three different problems. The first research topic addresses the missing data problem for variable selection models, including the elastic net (ENet) method and sparse partial least squares (SPLS). I propose a multiple imputation (MI) based weighted ENet (MI-WENet) method built on stacked MI data and a weighting scheme for each observation. Numerical simulations were implemented to examine the performance of the MI-WENet method and compare it with competing alternatives. I then applied the MI-WENet method to examine the predictors of endothelial function, characterized by median effective dose and maximum effect, in an ex-vivo experiment. The second topic is to develop monotonic single-index models for assessing drug interactions. In single-index models, the link function f is not necessarily monotonic; in combination drug studies, however, a monotonic link function f is desired. I propose to estimate f using penalized splines with an I-spline basis. An algorithm for estimating f and the parameter a in the index was developed. Simulation studies were conducted to examine the performance of the proposed models in terms of accuracy in estimating f and a. Moreover, I applied the proposed method to examine the interaction of two drugs in a real case study. The third topic focuses on SPLS and ENet based accelerated failure time (AFT) models for predicting patient survival time with mass spectrometry (MS) data. A typical MS data set contains a limited number of spectra, while each spectrum contains tens of thousands of intensity measurements representing an unknown number of peptide peaks as the key features of interest. Due to the high dimension and high correlations among features, traditional linear regression modeling is not applicable.
    A semi-parametric AFT model with an unspecified error distribution is a well-accepted approach in survival analysis. To reduce the bias introduced in the denoising step, we propose a nonparametric imputation approach based on the Kaplan-Meier estimator. Numerical simulations and a real case study were conducted under the proposed method.
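    The Kaplan-Meier estimator underlying the imputation step can be sketched in a few lines: S(t) is the running product of (1 - d_i / n_i) over event times t_i ≤ t, where d_i is the number of events and n_i the number still at risk. The follow-up data below are illustrative, not from the dissertation's MS study.

```python
# Hedged sketch of the Kaplan-Meier estimator: S(t) = prod_{t_i <= t}
# (1 - d_i / n_i). Toy right-censored data, handling ties at event times.

def kaplan_meier(times, events):
    """Return [(event_time, S(t))] from right-censored data.
    events[i] = 1 if the i-th time is an observed event, 0 if censored."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = censored = 0
        while i < len(order) and times[order[i]] == t:  # group ties at t
            if events[order[i]]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# Toy follow-up times (months); 1 = event observed, 0 = censored
t = [6, 7, 10, 15, 19, 25]
e = [1, 0, 1, 1, 0, 1]
km = kaplan_meier(t, e)
```

    An imputation scheme in this spirit would replace a censored observation by an expectation taken under the estimated curve restricted to times beyond the censoring point.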

    Data-Driven Modeling For Decision Support Systems And Treatment Management In Personalized Healthcare

    The massive amount of electronic medical records (EMRs) accumulating from patients and populations motivates clinicians and data scientists to collaborate on advanced analytics, creating the knowledge needed to deliver personalized insights to patients, clinicians, providers, scientists, and health policy makers. Learning from large and complicated data is used extensively in marketing and commercial enterprises to generate personalized recommendations, and the medical research community has recently begun to adopt big data analytic approaches and move toward personalized (precision) medicine. Healthcare and medicine are therefore at a significant point of transition to a new paradigm. There is a notable opportunity to implement a learning health care system and data-driven healthcare that enables better medical decisions, better personalized predictions, and more precise discovery of risk factors and their interactions. In this research we focus on data-driven approaches for personalized medicine. We propose a research framework with three main phases: 1) predictive modeling, 2) patient subgroup analysis and 3) treatment recommendation. Our goal is to develop novel methods for each phase and apply them in real-world applications. In the first phase, we develop a new predictive approach based on feature representation using deep feature learning and word embedding techniques. Our method uses different deep architectures (stacked autoencoders, deep belief networks and variational autoencoders) to represent features at higher levels of abstraction, obtaining effective and more robust features from EMRs, and then builds prediction models on top of them. Our approach is particularly useful when unlabeled data are abundant and labeled data are scarce. We investigate the performance of representation learning through a supervised approach, applying our method to several small and large datasets.
    Finally, we provide a comparative study and show that our predictive approach leads to better results than the alternatives. In the second phase, we propose a novel patient subgroup detection method, called Supervised Biclustering (SUBIC), using convex optimization, and apply it to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-Americans). Our approach not only finds patient subgroups under the guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variables. Finally, in the third phase, we introduce a new survival analysis framework using deep learning and active learning with a novel sampling strategy. Our approach first learns a lower-dimensional representation of the clinical features from labeled (time-to-event) and unlabeled (censored) instances, and then actively trains the survival model by labeling the censored data using an oracle. As a clinical assistive tool, we propose a simple yet effective treatment recommendation approach based on our survival model. In the experimental study, we apply our approach to SEER-Medicare data on prostate cancer among African-American and white patients. The results indicate that our approach significantly outperforms baseline models.
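    The active-learning loop in the third phase can be sketched with the simplest sampling strategy, uncertainty sampling: repeatedly pick the unlabeled instance on which the model is least certain and query the oracle for its label. The logistic scorer, toy pool, and oracle below are illustrative stand-ins for the deep survival model, the censored instances, and the human labeler; they are not the framework's actual components.

```python
from math import exp

# Hedged sketch of an active-learning query loop with uncertainty sampling.
# All model and data choices here are toy assumptions.

def predict(x, w, b):
    """Toy probability model p(y=1|x) = sigmoid(w*x + b)."""
    return 1.0 / (1.0 + exp(-(w * x + b)))

def most_uncertain(pool, w, b):
    """Index of the pool instance whose prediction is closest to 0.5."""
    return min(range(len(pool)), key=lambda i: abs(predict(pool[i], w, b) - 0.5))

pool = [-3.0, -0.2, 0.1, 2.5, 4.0]       # unlabeled (censored) instances
oracle = lambda x: 1 if x > 0 else 0     # stands in for the human labeler

labeled = []
w, b = 1.0, 0.0
for _ in range(2):                       # two query rounds
    i = most_uncertain(pool, w, b)
    x = pool.pop(i)
    labeled.append((x, oracle(x)))
```

    In a full implementation the model would be retrained after each round, so later queries reflect what the earlier labels have already resolved; the paper's novelty lies in the sampling strategy, for which this is only the baseline.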