422 research outputs found

    Bounds in Competing Risks Models and the War on Cancer

    In 1971 President Nixon declared war on cancer and increased the federal funds allocated to cancer research dramatically. Thirty years later, many have declared this war a failure. Overall cancer statistics confirm this view: age-adjusted mortality in 2000 was essentially unchanged from the early 1970s. At the same time, age-adjusted mortality rates from cardiovascular disease have fallen quite dramatically. Since the causes underlying cancer and cardiovascular disease are likely to be correlated, the decline in mortality rates from cardiovascular disease may be somewhat responsible for the rise in cancer mortality. It is natural to model mortality with more than one cause of death as a competing risks model. Such models are fundamentally unidentified, and it is therefore difficult to get a clear picture of the progress in cancer. This paper derives bounds for aspects of the underlying distributions under a number of different assumptions. Most importantly, we do not assume that the underlying risks are independent, and impose weak parametric assumptions in order to obtain identification. The theoretical contribution of the paper is to provide a framework to estimate competing risk models with interval data and discrete explanatory variables, both of which are common in empirical applications. We use our method to estimate changes in cancer and cardiovascular mortality since 1970. The estimated bounds for the effect of time on the duration until death for either cause are fairly tight and we find that trends in cancer show much larger improvements than previously estimated. For example, we find that time until death from cancer increased by about 10% for white males and 20% for white women.
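To illustrate why competing risks models are unidentified without an independence assumption, the sketch below computes the classical worst-case (Peterson-type) bounds on the marginal survival function of one cause. This is not the paper's estimator, which handles interval data and covariates under weak parametric assumptions. The function name `peterson_bounds`, the simulated data, and the no-censoring simplification are all assumptions made for illustration.

```python
import numpy as np

def peterson_bounds(times, causes, cause, grid):
    """Worst-case bounds on P(T_cause > t) without assuming independent risks.

    times  : observed time of death (minimum of the latent cause-specific times)
    causes : integer label of the cause actually observed
    cause  : label of the cause of interest
    grid   : time points at which to evaluate the bounds
    Assumes no censoring, to keep the sketch short.
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    grid = np.asarray(grid, dtype=float)
    # Lower bound: overall survival, P(min(T_1, T_2) > t).
    overall_surv = (times[None, :] > grid[:, None]).mean(axis=1)
    # Upper bound: 1 minus the cumulative incidence of the cause of interest.
    died_of_cause = (times[None, :] <= grid[:, None]) & (causes[None, :] == cause)
    return overall_surv, 1.0 - died_of_cause.mean(axis=1)

# Hypothetical data: two latent durations made dependent by a shared factor.
rng = np.random.default_rng(0)
n = 20_000
shared = rng.gamma(shape=2.0, scale=0.5, size=n)
t_cancer = rng.exponential(1.0 / (0.05 * shared))
t_cardio = rng.exponential(1.0 / (0.08 * shared))

observed = np.minimum(t_cancer, t_cardio)
cause = np.where(t_cancer <= t_cardio, 1, 2)

grid = np.array([5.0, 10.0, 20.0])
lower, upper = peterson_bounds(observed, cause, 1, grid)
# Only available in a simulation: the true marginal survival for cause 1.
true_marginal = (t_cancer[None, :] > grid[:, None]).mean(axis=1)
```

In real data only `observed` and `cause` are available, so `true_marginal` cannot be computed. The gap between `lower` and `upper` shows how much the data alone leave undetermined, which is the gap that additional assumptions such as those in the abstract aim to narrow.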

    Non-parametric competing risks with multivariate frailty models

    This research focuses on two theories: (i) competing risks and (ii) random effect (frailty) models. The theory of competing risks provides a structure for inference in problems where cases are subject to several types of failure. Random effects in competing risk models consist of two underlying distributions: the conditional distribution of the response variables, given the random effect, depending on the explanatory variables, each with a failure-type-specific random effect; and the distribution of the random effect. In this situation, the distribution of interest is the unconditional distribution of the response variable, which may or may not have a tractable form. The parametric competing risk model, in which the failure times are assumed to come from a known distribution such as the Weibull or Gamma, is widely used. The Gamma distribution has been widely used as a frailty distribution, perhaps due to its simplicity, since it has a closed-form expression for the unconditional hazard function. However, it is unrealistic to believe that a few parametric models are suitable for all types of failure time. This research focuses on a distribution-free form of the multivariate frailty models. Another approach used to overcome this problem is a finite mixture of parametric frailties, especially those that have a closed-form unconditional survival function. In addition, the advantages and disadvantages of parametric competing risk models with multivariate parametric and/or non-parametric frailty (correlated random effects) are investigated. In this research, four main models are proposed. First, an application of a new computation and analysis of a multivariate frailty with competing risk model using the Cholesky decomposition of the Lognormal frailty. Second, a correlated Inverse Gaussian frailty in the presence of a competing risks model. Third, a non-parametric multivariate frailty with a parametric competing risk model. Finally, a simulation study of a finite mixture of Inverse Gaussian frailties showed the ability of this model to fit different frailty distributions. One main issue in multivariate analysis is the time needed to fit the model. The proposed non-parametric model showed a significant decrease in the time needed to estimate the model parameters (about 80% less time compared with the Log-Normal frailty with nested loops). Real data on the recurrence of breast cancer are used as the application of these models.
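The closed form that makes the Gamma frailty convenient can be sketched as follows. With a frailty Z of mean 1 and variance theta multiplying the baseline hazard, the unconditional survival is (1 + theta * H0(t))^(-1/theta) and the unconditional hazard is h0(t) / (1 + theta * H0(t)). The Weibull baseline, the parameter values, and the function names below are assumptions chosen for illustration, not taken from the thesis.

```python
import numpy as np

def gamma_frailty_survival(cum_hazard, theta):
    """Unconditional survival under a Gamma frailty with mean 1 and variance theta."""
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)

def gamma_frailty_hazard(base_hazard, cum_hazard, theta):
    """Unconditional hazard implied by the survival function above."""
    return base_hazard / (1.0 + theta * cum_hazard)

# Hypothetical Weibull baseline: H0(t) = (t / scale) ** shape.
shape, scale, theta = 1.5, 10.0, 0.8
t = np.array([1.0, 5.0, 10.0, 20.0])
H0 = (t / scale) ** shape
h0 = (shape / scale) * (t / scale) ** (shape - 1.0)

closed_form = gamma_frailty_survival(H0, theta)
marginal_hazard = gamma_frailty_hazard(h0, H0, theta)

# Monte Carlo check: average the conditional survival exp(-Z * H0(t)) over Z.
rng = np.random.default_rng(1)
z = rng.gamma(shape=1.0 / theta, scale=theta, size=200_000)
monte_carlo = np.exp(-np.outer(H0, z)).mean(axis=1)
```

The Monte Carlo average should agree with `closed_form` up to simulation error. For frailty distributions without such a closed form, an average or numerical integral of this kind is needed at every evaluation, which is one reason model fitting time becomes an issue.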

    Bayesian non-linear methods for survival analysis and structural equation models

    "July 2014."Dissertation Co-adviser: Dr. Sounak Chakraborty.Dissertation Co-adviser: Dr. (Tony) Jianguo Sun.Includes vita.High dimensional data are more common nowadays, because the collection of such data becomes larger and more complex due to the technology advance of the computer science, biology, etc. The analysis of high dimensional data is different from traditional data analysis, and variable selection for high dimensional data becomes very challenging. Structural equation modeling (SEM) analyzes the relationship between manifest variables and latent variables. The structural equation focuses on analyzing the relationship between latent variables. New proposed methods of these topics are discussed in the dissertation. In the first chapter, we review the basic concept of survival analysis, SEM, and current method of variable selection in those two scenarios. We also introduce the available software package for current methods and relevant data set. In the second chapter, we develop a Bayesian kernel machine model with incorporating existing information on pathways and gene networks in the analysis of DNA microarray data. Each pathway is modeled nonparametrically using reproducing kernel Hilbert space. The pathways and the genes are selected via assigning mixture priors on the pathway indicator variable and the gene indicator variable. This approach helped us in flexible modeling of the pathway effects, which can capture both linear and non-linear effect. Moreover, the model can also pinpoint the important pathways and the important active genes within each pathway. We have also developed an efficient Markov Chain Monte Carlo (MCMC) algorithm to fit our model. We used simulations and a real data analysis, [van 't Veer et al., 2002] breast cancer microarray data, to illustrate the proposed method. 
In the third chapter, we extend the idea of semiparametric structural equation model where the nonlinear functional relationships are approximated using basis expansions [Guo et al., 2012]. Many basis expansion methods, including cubic splines, are known to induce correlations. In this chapter we compare standard Lasso, Fused Lasso anIncludes bibliographical references (pages 115-122)

    On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility

    In classical survival analysis, it is assumed that all the individuals will experience the event of interest. However, if there is a proportion of subjects who will never experience the event, then a standard survival approach is not appropriate, and cure models should be considered instead. This paper deals with the problem of adapting a machine learning approach for classical survival analysis to a situation when cure (i.e., not suffering the event) is a possibility. Specifically, a brief review of cure models and recent machine learning methodologies is presented, and an adaptation of machine learning approaches to account for cured individuals is introduced. In order to validate the proposed methods, we present an extensive simulation study in which we compare the performance of the adapted machine learning algorithms with existing cure models. The results show the good behavior of the semiparametric or the nonparametric approaches, depending on the simulated scenario. The practical utility of the methodology is showcased through two real-world dataset illustrations. In the first one, the results show the gain of using the nonparametric mixture cure model approach. In the second example, the results show the poor performance of some machine learning methods for small sample sizes.
    This project was funded by the Xunta de Galicia (Axencia Galega de Innovación) Research projects COVID-19 presented in ISCIII IN845D 2020/26, Operational Program FEDER Galicia 2014–2020; by the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union European Regional Development Fund (ERDF)-Galicia 2014–2020 Program, by grant ED431G 2019/01; and by the Spanish Ministerio de Economía y Competitividad (research projects PID2019-109238GB-C22 and PID2021-128045OA-I00). ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish Grant from MICINN (Ministerio de Ciencia e Innovación) with code BGP18/00154. ALC was partially supported by the MICINN Grant PID2020-113578RB-I00 and partial support of Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C-2020-14. Xunta de Galicia; IN845D 2020/2
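The mixture cure idea referred to in the abstract above can be sketched with the standard formulation, in which the population survival is the cure probability plus the uncured share times the survival of the uncured. The abstract does not state this formula. The function names, the exponential latency distribution, and the parameter values below are assumptions for illustration and do not reproduce the paper's estimators.

```python
import numpy as np

def mixture_cure_survival(t, cure_prob, uncured_survival):
    """Population survival when a fraction cure_prob never experiences the event."""
    t = np.asarray(t, dtype=float)
    return cure_prob + (1.0 - cure_prob) * uncured_survival(t)

def exp_survival(t, rate=0.3):
    """Hypothetical survival function of the uncured (exponential latency)."""
    return np.exp(-rate * t)

t = np.array([0.0, 1.0, 5.0, 50.0])
population = mixture_cure_survival(t, cure_prob=0.4, uncured_survival=exp_survival)
```

Unlike a classical survival curve, `population` does not tend to zero. It levels off at the cure probability, which is why a standard survival approach is not appropriate when cure is possible.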