Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Abstract

Because label scarcity and covariate shift arise frequently in real-world studies, transfer learning has become an essential technique for training models that generalize to a target population using existing labeled source data. Most existing transfer learning research has focused on model estimation, while there is a paucity of literature on transfer inference for model accuracy despite its importance. We propose a novel Doubly Robust Augmented Model Accuracy Transfer InferenCe (DRAMATIC) method for point and interval estimation of commonly used classification performance measures in an unlabeled target population using labeled source data. Specifically, DRAMATIC derives and evaluates the risk model for a binary response $Y$ against some low-dimensional predictors $\mathbf{A}$ on the target population, leveraging $Y$ from the source data only and high-dimensional adjustment features $\mathbf{X}$ from both the source and target data. The proposed estimators are doubly robust in the sense that they are $n^{1/2}$-consistent when at least one model is correctly specified and certain model sparsity assumptions hold. Simulation results demonstrate that the point estimators have negligible bias and that the confidence intervals derived by DRAMATIC attain satisfactory empirical coverage levels. We further illustrate the utility of our method by transferring a genetic risk prediction model for type II diabetes, together with its accuracy evaluation, across two patient cohorts in Mass General Brigham (MGB) that were collected using different sampling mechanisms and at different time points.
