Accurate cancer risk estimation is crucial to clinical decision-making, such
as identifying high-risk people for screening. However, most existing cancer
risk models rely on data from epidemiologic studies, which are usually not
representative of the target population. While population-based health surveys are
ideal for making inference to the target population, they typically do not
collect time-to-cancer incidence data. Instead, time to cancer-specific
mortality is often readily available in surveys through linkage to vital
statistics. We develop calibrated pseudoweighting methods that integrate
individual-level data from a cohort and a survey, and summary statistics of
cancer incidence from national cancer registries. By leveraging
individual-level cancer mortality data in the survey, the proposed methods
impute time-to-cancer incidence for sampled survey individuals and apply survey
calibration, using influence functions from Cox regression as auxiliary
variables, to improve the robustness and efficiency of the inverse-propensity
pseudoweighting method in estimating pure risks. We develop a lung cancer
incidence pure risk model from the Prostate, Lung, Colorectal, and Ovarian
(PLCO) Cancer Screening Trial using our proposed methods by integrating data
from the National Health Interview Survey (NHIS) and cancer registries.
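
As a rough illustration of the calibration step mentioned above, the following minimal sketch applies generic linear (GREG-type) calibration to a set of initial weights so that the weighted totals of some auxiliary variables match known targets. It is not the authors' implementation: the function name `linear_calibrate`, the simulated auxiliary variables, the initial weights, and the target totals are all hypothetical. In the proposed methods the auxiliary variables would be influence functions from Cox regression and the starting weights would be inverse-propensity pseudoweights.

```python
import numpy as np

def linear_calibrate(base_weights, aux, targets):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so
    that the calibrated totals sum_i w_i x_i equal the target totals."""
    d = np.asarray(base_weights, dtype=float)
    X = np.asarray(aux, dtype=float)
    T = np.asarray(targets, dtype=float)
    # Weighted cross-product matrix: sum_i d_i x_i x_i'
    M = X.T @ (d[:, None] * X)
    # Gap between the targets and the totals under the initial weights
    gap = T - X.T @ d
    lam = np.linalg.solve(M, gap)
    return d * (1.0 + X @ lam)

# Hypothetical data: 200 cohort members, an intercept plus two auxiliaries.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
d = rng.uniform(50, 150, size=n)              # stand-in for pseudoweights
targets = np.array([20000.0, 500.0, -300.0])  # stand-in for survey totals

w = linear_calibrate(d, X, targets)
calibrated_totals = X.T @ w
```

By construction, the calibrated totals reproduce the targets exactly. Linear calibration can yield negative weights when the targets are far from the initial totals; bounded or raking-type distance functions are common alternatives in that case.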