Difficulty in identifying cancer stage in health care claims data has limited
oncology quality of care and health outcomes research. We fit prediction
algorithms for classifying lung cancer stage into three classes (stages I/II,
stage III, and stage IV) using claims data, and then demonstrate a method for
incorporating the classification uncertainty into outcomes estimation. Leveraging
set-valued classification and split conformal inference, we show how a fixed
algorithm developed in one cohort of data may be deployed in another, while
rigorously accounting for uncertainty from the initial classification step. We
demonstrate this process using SEER cancer registry data linked with Medicare
claims data.

Comment: Code available at: https://github.com/sl-bergquist/cancer_classificatio
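
To make the set-valued classification step concrete, the following is a minimal sketch of generic split conformal inference for a three-class stage classifier. It is not the authors' code. The function names, the toy probabilities, the integer encoding of stages (0 = I/II, 1 = III, 2 = IV), and the nonconformity score (one minus the predicted probability of the true class) are all assumptions chosen for illustration. The fixed classifier is represented only by its predicted class probabilities. A calibration sample with known stage yields a threshold, and each new patient then receives the set of stages whose score falls at or below that threshold.

```python
import math

import numpy as np

# Assumed label encoding for illustration: 0 = stages I/II, 1 = stage III, 2 = stage IV.
STAGES = ("I/II", "III", "IV")


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal threshold from a calibration sample with known stage.

    cal_probs:  (n, K) predicted class probabilities from an already fitted classifier.
    cal_labels: (n,) true class indices.
    alpha:      target miscoverage level, so sets aim for coverage >= 1 - alpha.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = cal_labels.shape[0]
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class is retained.
        return math.inf
    return float(np.sort(scores)[k - 1])


def prediction_sets(probs, qhat):
    """Boolean (m, K) matrix; entry [i, k] is True if class k is in patient i's set."""
    probs = np.asarray(probs, dtype=float)
    return (1.0 - probs) <= qhat


def stage_sets(probs, qhat):
    """Same sets as prediction_sets, returned as tuples of stage names."""
    mask = prediction_sets(probs, qhat)
    return [tuple(s for s, keep in zip(STAGES, row) if keep) for row in mask]


if __name__ == "__main__":
    # Toy calibration data (hypothetical values, not from SEER-Medicare).
    cal_probs = np.array([
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.2, 0.6],
        [0.5, 0.3, 0.2],
    ])
    cal_labels = np.array([0, 1, 2, 1])
    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

    # Two new patients: one ambiguous between I/II and III, one clearly I/II.
    new_probs = np.array([[0.45, 0.40, 0.15], [0.90, 0.06, 0.04]])
    print(stage_sets(new_probs, qhat))
```

A set containing more than one stage signals that the claims data do not separate those stages for that patient. Under exchangeability of calibration and deployment data, the standard split conformal argument gives marginal coverage of at least 1 - alpha for the returned sets.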