Efficiently analyzing large patient registries with Bayesian joint models for longitudinal and time-to-event data

Abstract

The joint modeling of longitudinal and time-to-event outcomes has become a popular tool in follow-up studies. However, fitting Bayesian joint models to large datasets, such as patient registries, can require extended computing times. To speed up sampling, we divided a patient registry dataset into subsamples, analyzed them in parallel, and combined the resulting Markov chain Monte Carlo draws into a consensus distribution. We used a simulation study to investigate how different consensus strategies perform with joint models. In particular, we compared grouping all draws together with using equal- and precision-weighted averages. We considered scenarios reflecting different sample sizes, numbers of data splits, and processor characteristics. Parallelization of the sampling process substantially decreased the time required to run the model. We found that the weighted-average consensus distributions for large sample sizes were nearly identical to the target posterior distribution. The proposed algorithm has been made available in an R package for joint models, JMbayes2. This work was motivated by the clinical interest in investigating the association between ppFEV1, a commonly measured marker of lung function, and the risk of lung transplant or death, using data from the US Cystic Fibrosis Foundation Patient Registry (35,153 individuals with 372,366 years of cumulative follow-up). Splitting the registry into five subsamples resulted in an 85% decrease in computing time, from 9.22 to 1.39 hours. Splitting the data and finding a consensus distribution by precision-weighted averaging proved to be a computationally efficient and robust approach to handling large datasets under the joint modeling framework.
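As a rough illustration of the precision-weighted consensus step described above, the R sketch below combines sub-posterior MCMC draws from several data splits into a single set of consensus draws, weighting each split by the inverse of its per-parameter sample variance. This is a minimal sketch under simplifying assumptions (independent per-parameter weights, equal numbers of draws per split); the function and object names are illustrative and are not the JMbayes2 implementation.

```r
# Combine sub-posterior draws by precision-weighted averaging (consensus Monte Carlo).
# `draws` is assumed to be a list of S matrices, one per data split, each with the
# same number of rows (iterations) and columns (parameters).
consensus_draws <- function(draws) {
  # Per-split precision weights: inverse sample variance of each parameter's draws.
  weights <- lapply(draws, function(d) 1 / apply(d, 2, var))
  total_w <- Reduce(`+`, weights)
  # Precision-weighted sum of the t-th draw across splits, then normalize.
  combined <- Reduce(`+`, Map(function(d, w) sweep(d, 2, w, `*`), draws, weights))
  sweep(combined, 2, total_w, `/`)
}

# Toy usage with simulated sub-posterior draws from five splits.
set.seed(1)
draws <- lapply(1:5, function(s) matrix(rnorm(2000, mean = 0.5, sd = 0.2), ncol = 2))
head(consensus_draws(draws))
```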
