4 research outputs found

    Generating synthetic data from administrative health records for drug safety and effectiveness studies

    Introduction: Administrative health records (AHRs) are used to conduct population-based post-market drug safety and comparative effectiveness studies to inform healthcare decision making. However, the cost of data extraction, together with the difficulty of addressing privacy requirements and securing approvals, can prevent researchers from conducting methodological research with real data in a timely manner. Generating synthetic AHRs that reasonably represent real-world data is beneficial for developing analytic methods and for training analysts to rapidly implement study protocols. We generated synthetic AHRs using two methods and compared them to real-world AHRs. We also described the challenges associated with using synthetic AHRs for real-world studies.
    Methods: The real-world AHRs comprised prescription drug records for individuals with healthcare insurance coverage in the Population Research Data Repository (PRDR) from Manitoba, Canada, for the 10-year period from 2008 to 2017. Synthetic data were generated using the Observational Medical Dataset Simulator II (OSIM2) and a modification of it (ModOSIM). Synthetic and real-world data were described using frequencies and percentages. Agreement of prescription drug use measures among PRDR, OSIM2 and ModOSIM was estimated with the concordance coefficient.
    Results: The PRDR cohort included 169,586,633 drug records and 1,395 drug types for 1,604,734 individuals. Synthetic data for 1,000,000 individuals were generated using OSIM2 and ModOSIM. Sex and age group distributions were similar in the real-world and synthetic AHRs. However, the number of drug records and the number of unique drugs per person differed significantly between PRDR and both OSIM2 and ModOSIM. For the average number of days of drug use, concordance with the PRDR was 16% (95% confidence interval [CI]: 12%-19%) for OSIM2 and 88% (95% CI: 87%-90%) for ModOSIM.
    Conclusions: ModOSIM data were more similar to PRDR data than OSIM2 data were on many measures. Synthetic AHRs consistent with those found in real-world settings can be generated using ModOSIM. Synthetic data will benefit rapid implementation of methodological studies and data analyst training.
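
The abstract above does not say which concordance coefficient was used. As a minimal sketch, assuming Lin's concordance correlation coefficient, agreement between a real-world measure and its synthetic counterpart could be computed as follows. The function name and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired sequences.

    Assumption: this is one common 'concordance coefficient'; the study
    may have used a different estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) moments, as in Lin's definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalises both weak correlation and a shift in location or scale.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical per-drug average days of use (made-up numbers).
real = [30.0, 45.0, 60.0, 90.0]
synthetic = [28.0, 50.0, 58.0, 95.0]
print(round(concordance_correlation(real, synthetic), 3))
```

Unlike the Pearson correlation, this coefficient drops below 1 when the synthetic values are systematically shifted or rescaled relative to the real ones, which is why it suits a real-versus-synthetic comparison.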

    Machine learning methods for precision medicine

    In precision medicine, predicting the risk of an event during a specific period may help, for example, to identify patients who need early preventive treatment. Modern machine learning (ML) techniques are well suited to building these predictions. However, medical datasets often suffer from right-censoring of the outcome of interest, which is an obstacle to the direct application of ML algorithms. The aim of this thesis work is to develop and advance methods for prediction in settings of right-censoring and, in some settings, competing risks. Specifically, in Project I, we developed an approach that combines inverse probability of censoring weighting (IPCW) with bagging as a pre-processing step to enable the application of all existing ML methods for classification in settings of right-censoring and competing risks, and we propose a procedure to optimally combine a set of single IPCW bagged methods. In Project II, we extended Project I to optimally combine not only ML procedures for the same outcome but also models for different outcome types, such as survival outcomes (e.g., the Cox regression model) and continuous outcomes (e.g., pseudo-observation-based regression). In Project III, we integrated pseudo-observations into a convolutional neural network to predict the cumulative incidence using images and structured clinical data. In Project IV, we applied the methods developed in Projects I and II to build a flexible risk prediction model that predicts the risk of any cancer diagnosis among sarcoidosis patients using a Swedish population-based register. In the last project, Project V, we explored the utility of a dynamic prediction model in a setting of complete data as a decision support tool for public health to manage future pandemics. Specifically, we applied two state-of-the-art batch reinforcement learning algorithms to learn the best face covering policy response at the national level, with the goal of reducing the spread of COVID-19.
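
The IPCW-plus-bagging idea in Project I can be illustrated with a generic sketch. It is not the thesis's actual procedure: it assumes a single event type (no competing risks), a Kaplan-Meier estimate of the censoring distribution, and a fixed prediction horizon. All function names and the toy data are hypothetical.

```python
import random


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), where censoring (event == 0)
    plays the role of the 'event'. Returns a step function G(t, left_limit)."""
    steps = []  # (time, value of G just after that time)
    surv = 1.0
    for t in sorted(set(times)):
        # Events are assumed to precede censorings at tied times, so subjects
        # with an event at t are no longer at risk of being censored at t.
        at_risk = sum(1 for u, e in zip(times, events)
                      if u > t or (u == t and e == 0))
        censored = sum(1 for u, e in zip(times, events) if u == t and e == 0)
        if censored:
            surv *= 1.0 - censored / at_risk
            steps.append((t, surv))

    def G(t, left_limit=False):
        value = 1.0
        for s, g in steps:
            if s < t or (s == t and not left_limit):
                value = g
        return value

    return G


def ipcw_weights(times, events, horizon):
    """Binary labels (event by `horizon`) and IPCW weights for each subject."""
    G = censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        if t <= horizon and e == 1:      # event observed before the horizon
            labels.append(1)
            weights.append(1.0 / G(t, left_limit=True))
        elif t > horizon:                # known to be event-free at the horizon
            labels.append(0)
            weights.append(1.0 / G(horizon))
        else:                            # censored early: status unknown
            labels.append(0)             # placeholder label, never sampled
            weights.append(0.0)
    return labels, weights


def weighted_bootstrap(weights, rng):
    """One bagging resample: indices drawn with probability proportional to weight."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


# Toy data (made up): follow-up times and event indicators (1 = event, 0 = censored).
times = [2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 1]
labels, weights = ipcw_weights(times, events, horizon=4.5)
sample = weighted_bootstrap(weights, random.Random(0))
print(labels, [round(w, 3) for w in weights], sample)
```

Each resample contains only subjects whose status at the horizon is known, so an off-the-shelf classifier can be trained on it without weights; repeating the resampling and averaging the predictions gives the bagged predictor.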

    Bayesian adjustment for confounding in Bayesian propensity score estimation

    Variable selection for propensity score (PS) models is a central issue that researchers face. Joint Bayesian PS methods for variable selection in PS models were recently proposed by Zigler and Dominici (2014, Journal of the American Statistical Association, 109, 95-107). However, these methods are not exempt from a known limitation of confounder selection strategies for PS models: they tend to include variables that are associated with the exposure even if they are not associated with the outcome. Building upon this work and the work of Wang et al. (2012, Biometrics, 68, 661-671), we propose a new approach, which we call Bayesian Adjustment for Confounding (BAC) in Bayesian PS. The objective of this work is to estimate the average causal effect as a weighted average over different PS models using this new approach, in order to mitigate the variable selection limitation that the previous methods exhibit. Our approach is a two-stage procedure based on three models: (1) the outcome as a function of the potential confounders (the prognostic score model); (2) the exposure as a function of the potential confounders (the PS model); and (3) the outcome as a function of the exposure, the potential confounders and the PS (the outcome model). The key to our approach is the incorporation, in the second stage, of an informative prior distribution on the PS models that links the prognostic score model with the PS model. The informative prior makes it possible to rule out instrumental variables (IVs) from the PS model by assigning less prior probability to models that include IVs and favoring those that contain the set of confounders and predictors of the outcome. We illustrate features of our proposed approach through a simulation study.
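
The three models and the model-averaged effect described in this abstract can be written schematically as follows. The notation is an assumption made for illustration and is not taken from the paper: $Y$ is the outcome, $X$ the exposure, $C$ the potential confounders, $\alpha^{Y}$ and $\alpha^{X}$ are variable-inclusion indicators, $f_1, f_2, f_3$ are unspecified regression functions, and $D$ is the observed data. The block requires the `amsmath` package.

```latex
% Schematic only; all symbols are assumed notation, not the authors' own.
\begin{align}
  \text{(1) prognostic score model:} \quad
    & E[Y \mid C] = f_1\bigl(C;\ \alpha^{Y}\bigr) \\
  \text{(2) PS model:} \quad
    & P(X = 1 \mid C) = f_2\bigl(C;\ \alpha^{X}\bigr) \\
  \text{(3) outcome model:} \quad
    & E[Y \mid X, C, \mathrm{PS}] = f_3\bigl(X, C, \mathrm{PS}\bigr) \\
  \text{model-averaged effect:} \quad
    & \hat{\Delta} = \sum_{\alpha^{X}} \hat{\Delta}_{\alpha^{X}}\,
      P\bigl(\alpha^{X} \mid D\bigr)
\end{align}
```

In this reading, the informative prior acts on $P(\alpha^{X})$: PS models that include variables not supported by the prognostic score model (candidate instruments) receive less prior mass and therefore less weight in the average.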