A Dirichlet process mixture regression model for the analysis of competing risk events
We develop a regression model for the analysis of competing risk events. The joint distribution of the times to these events is flexibly characterized by a random effect that follows a discrete probability distribution drawn from a Dirichlet process, explaining their variability. This adds a layer of flexibility to the joint model, whose inference is robust to misspecification of the random-effect distribution. The model is analysed in a fully Bayesian setting, yielding a flexible Dirichlet process mixture model for the joint distribution of the times to events. An efficient MCMC sampler is developed for inference. The modelling approach is applied to the empirical analysis of surrender risk in a US life insurance portfolio previously analysed by Milhaud and Dutang (2018), yielding improved predictive performance for the surrender rates.
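As a rough illustration of the kind of discrete random-effect distribution a Dirichlet process induces, the sketch below draws a truncated stick-breaking realization and samples subject-level effects from it. This is a minimal sketch with illustrative parameters and a standard normal base measure; it is not the authors' model or MCMC sampler.

```python
# Truncated stick-breaking sketch of a Dirichlet Process prior; all
# settings (alpha, base measure, truncation level) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, K):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining
    return weights / weights.sum()          # renormalize the truncated weights

K = 50                                      # truncation level
weights = stick_breaking(1.0, K)
atoms = rng.normal(0.0, 1.0, size=K)        # atoms drawn from base measure N(0, 1)

# Subject-level random effects are draws from the realized discrete measure,
# so distinct subjects can share the same atom (clustering).
n = 200
labels = rng.choice(K, size=n, p=weights)
random_effects = atoms[labels]
```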
Missing Data Imputation with High-Dimensional Data
Imputation of missing data in high-dimensional datasets with more variables P than samples N (P > N) is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill-conditioned and cannot be properly estimated. For fully conditional imputation, the regression models used for imputation cannot include all the variables. The high dimension therefore requires special imputation approaches. In this article, we provide an overview and realistic comparison of imputation approaches for high-dimensional data applied within a linear mixed modeling (LMM) framework. Using simulation studies, we examine approaches from three different classes: multiple imputation with penalized regression, multiple imputation with recursive partitioning and predictive mean matching, and multiple imputation with principal component analysis (PCA). We illustrate the methods on a real case study in which a multivariate outcome (an extracted set of correlated biomarkers from human urine samples) was collected and monitored over time, and we compare the proposed methods with more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error and coverage of the LMM parameter estimates, relative to those obtained from an analysis of the data without missingness, although this comes at the expense of high computational cost. Much faster methods, such as the one relying on PCA, therefore remain worth considering.
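Since predictive mean matching is central to the best-performing method here, a bare-bones single-variable PMM step is sketched below. This is illustrative only: the article's comparisons run full multiple-imputation pipelines, and the function and variable names are hypothetical.

```python
# Minimal predictive mean matching (PMM) step for one incomplete variable.
import numpy as np

rng = np.random.default_rng(1)

def pmm_impute(y, X, k=5):
    """Regress y on X among observed cases; for each missing case, donate
    the observed y of one of the k cases with the closest predicted values."""
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        dist = np.abs(pred[obs] - pred[i])
        donors = np.argsort(dist)[:k]            # k nearest observed predictions
        y_imp[i] = y[obs][rng.choice(donors)]    # draw one donor's observed value
    return y_imp

# Example: impute an artificially punched-out column from two covariates.
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
y[rng.choice(100, size=20, replace=False)] = np.nan
y_completed = pmm_impute(y, X)
```

Because imputed values are always donated observed values, PMM cannot produce implausible entries, which is part of its appeal in the comparisons above.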
Kendall's tau estimator for bivariate zero-inflated count data
This paper extends the work of Pimentel et al. (2015), presenting an estimator of Kendall's τ for bivariate zero-inflated count data. We provide achievable bounds for our proposed estimator and suggest how to estimate them, thereby making the estimator useful in practice.
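To see why zero inflation calls for a corrected estimator, one can compute the naive Kendall's τ on simulated zero-inflated counts: the excess ties at zero pull the estimate far below the τ of the underlying dependence. The settings below are arbitrary and the paper's estimator is not reproduced.

```python
# Naive Kendall's tau on zero-inflated counts, compared with the tau of
# the underlying Gaussian dependence; simulation settings are arbitrary.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
n, rho = 2000, 0.6
latent = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
counts = rng.poisson(np.exp(latent))       # positively dependent counts
counts[rng.random((n, 2)) < 0.4] = 0       # 40% structural zeros per margin

tau, _ = kendalltau(counts[:, 0], counts[:, 1])
print(f"naive tau on zero-inflated counts: {tau:.3f}")
print(f"tau of the latent dependence, (2/pi)*arcsin(rho): "
      f"{2 / np.pi * np.arcsin(rho):.3f}")
```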
The built-in selection bias of hazard ratios formalized using structural causal models
It is known that the hazard ratio lacks a useful causal interpretation. Even for data from a randomized controlled trial, the hazard ratio suffers from so-called built-in selection bias because, over time, the individuals at risk among the exposed and unexposed are no longer exchangeable. In this paper, we formalize how the expectation of the observed hazard ratio evolves and deviates from the causal effect of interest in the presence of heterogeneity of the hazard rate of unexposed individuals (frailty) and heterogeneity in effect (individual effect modification). For the case of effect heterogeneity, we define the causal hazard ratio. We show that the expected observed hazard ratio equals the ratio of the expectations of the latent variables (frailty and modifier) conditional on survival in the world with and without exposure, respectively. Examples with gamma, inverse Gaussian and compound Poisson distributed frailty, and with categorical (harming, beneficial or neutral) effect modifiers, are presented for illustration. This set of examples shows that an observed hazard ratio with a particular value can arise for all values of the causal hazard ratio. The hazard ratio therefore cannot be used as a measure of the causal effect without making untestable assumptions, stressing the importance of more appropriate estimands, such as contrasts of survival probabilities.
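The gamma-frailty case mentioned above is straightforward to reproduce by simulation: give every subject the same causal hazard ratio and a mean-one gamma frailty, and the hazard ratio among survivors drifts away from the causal value over time. A minimal sketch with illustrative distributions and parameters:

```python
# Built-in selection bias under gamma frailty: the marginal hazard at t is
# the baseline hazard times E[Z | T > t], so the observed hazard ratio is
# the causal ratio times the ratio of mean frailties among survivors.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
theta = 1.0                                    # frailty variance
frailty = rng.gamma(1 / theta, theta, n)       # mean-one gamma frailty Z
hr_causal = 2.0                                # constant individual hazard ratio

t_unexp = rng.exponential(1.0 / frailty)               # baseline hazard = Z
t_exp = rng.exponential(1.0 / (hr_causal * frailty))   # hazard = 2 * Z

for t in (0.1, 0.5, 1.0, 2.0):
    hr_obs = hr_causal * frailty[t_exp > t].mean() / frailty[t_unexp > t].mean()
    print(f"t = {t:3.1f}: observed HR ~ {hr_obs:.2f} (causal HR = {hr_causal})")
```

Even though every individual's hazard ratio is exactly 2, the printed observed ratio shrinks toward 1 as the high-frailty exposed subjects are removed first.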
Bias of the additive hazard model in the presence of causal effect heterogeneity
Hazard ratios are prone to selection bias, compromising their use as causal estimands. If Aalen's additive hazard model applies, on the other hand, the hazard difference has been shown to remain unaffected by the selection of frailty factors over time; in the absence of confounding, observed hazard differences then equal the causal hazard differences in expectation. In the presence of heterogeneity of the effect on the hazard, however, the observed hazard difference is also affected by the selection of survivors. In this work, we formalize how the observed hazard difference (from a randomized controlled trial) evolves by selecting favourable levels of effect modifiers in the exposed group, and thus deviates from the causal effect of interest. Such selection may result in a non-linear integrated hazard difference curve even when the individual causal effects are time-invariant. A homogeneous time-varying causal additive effect on the hazard therefore cannot be distinguished from a time-invariant but heterogeneous causal effect. We illustrate this causal issue by studying the effect of chemotherapy on the survival time of patients suffering from carcinoma of the oropharynx, using data from a clinical trial. The hazard difference thus cannot be used as an appropriate measure of the causal effect without making untestable assumptions.
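The same selection mechanism can be sketched for the hazard difference: give each subject a time-invariant additive effect drawn from a two-point distribution, and the observed hazard difference among exposed survivors declines over time even though every individual effect is constant. Illustrative parameters only:

```python
# Under an additive hazard with heterogeneous subject-specific effect beta
# and no frailty, the observed hazard difference at t in a randomized trial
# is E[beta | T_exposed > t], which shrinks as large-beta subjects die early.
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
h0 = 1.0                                     # baseline hazard
beta = rng.choice([0.0, 2.0], size=n)        # heterogeneous additive effects

t_exp = rng.exponential(1.0 / (h0 + beta))   # exposed-arm survival times
for t in (0.1, 0.5, 1.0, 2.0):
    hd_obs = beta[t_exp > t].mean()          # observed hazard difference at t
    print(f"t = {t:3.1f}: observed HD ~ {hd_obs:.2f} "
          f"(mean causal HD = {beta.mean():.2f})")
```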
Joint modeling with time-dependent treatment and heteroskedasticity: Bayesian analysis with application to the Framingham Heart Study
Medical studies of chronic disease are often interested in the relation between longitudinal risk factor profiles and individuals' later-life disease outcomes. These profiles may be subject to intermediate structural changes due to treatment or environmental influences. Such studies may be analysed within the joint model framework. However, current joint models account neither for structural changes in the residual variability of the risk profile nor for the influence of subject-specific residual variability on the time-to-event outcome. In the present paper, we extend the joint model framework to address these two heterogeneous intra-individual variabilities. A Bayesian approach is used to estimate the unknown parameters, and simulation studies are conducted to investigate the performance of the method. The proposed joint model is applied to the Framingham Heart Study to investigate the influence of anti-hypertensive medication on systolic blood pressure variability together with its effect on the risk of developing cardiovascular disease. We show that anti-hypertensive medication is associated with elevated systolic blood pressure variability and that increased variability elevates the risk of developing cardiovascular disease.
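Schematically, a location-scale joint model of the kind described might be written as below; the notation is ours and illustrative, not the authors' exact specification.

```latex
% Illustrative location-scale joint model (notation not the authors'):
\begin{align*}
  y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
    \qquad \varepsilon_{ij} \sim N\!\bigl(0,\, \sigma_i^2(t_{ij})\bigr), \\
  \log \sigma_i^2(t_{ij}) &= \mathbf{w}_{ij}^{\top}\boldsymbol{\tau}
    + \gamma\,\mathrm{Treat}_i(t_{ij}) + v_i, \\
  h_i(t) &= h_0(t)\,
    \exp\!\bigl\{\alpha_1\, \mu_i(t) + \alpha_2\, \sigma_i^2(t)\bigr\},
\end{align*}
```

so that treatment (entering through the hypothetical indicator Treat) can structurally shift the residual variance, the subject-specific term v_i captures heterogeneous intra-individual variability, and that variability enters the hazard alongside the current mean profile μ_i(t).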
Model stability of COVID-19 mortality prediction with biomarkers
Coronavirus disease 2019 (COVID-19) is an unprecedented and fast-evolving pandemic, which has caused a large number of critically ill patients and deaths globally. It is an acute public health crisis leading to overloaded critical care capacity. Timely prediction of the clinical outcome (death/survival) of hospital-admitted COVID-19 patients can provide early warnings to clinicians, allowing improved allocation of medical resources. In a recently published paper, an interpretable machine learning model was presented to predict the mortality of COVID-19 patients from blood biomarkers, where the model was trained and tested on relatively small data sets. However, the stability of the model and of its performance was not explored or assessed. By re-analyzing the data, we reveal that the reported mortality prediction performance was likely over-optimistic and that its uncertainty was underestimated or overlooked, with a large variability in predicting deaths.
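One way to probe such stability, sketched below, is to repeat the train/test split many times and report the spread of the test metric rather than a single number. The data and model here are placeholders, not the re-analysis itself.

```python
# Stability check via repeated random train/test splits on placeholder data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=350, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
aucs = []
for seed in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

print(f"test AUC over 100 splits: mean {np.mean(aucs):.3f}, "
      f"sd {np.std(aucs):.3f}, range [{min(aucs):.3f}, {max(aucs):.3f}]")
```

A wide range across splits on a data set of this size is exactly the kind of underestimated uncertainty the re-analysis points to.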