Phenotyping with Partially Labeled, Partially Observed Data

Abstract

Identifying a group of individuals that share a common set of characteristics is a conceptually simple task, which is often difficult in practice. Such phenotyping problems emerge in various settings, including the analysis of clinical data. In this setting, phenotyping is often stymied by persistent data quality issues. These include a lack of reliable labels to indicate the presence of absence of characteristics of interest, and significant missingness in observed variables. This dissertation introduces methods for learning phenotypes when the data contain missing values (partially observed) and labels are scarce (partially labeled). Aim 1 utilizes an unsupervised probabilistic graphical model to learn phenotypes from partially observed data. Aim 2 introduces a related semi-supervised probabilistic graphical model for learning phenotypes from partially labeled clinical data. Finally, Aim 3 describes a method for training deep generative models when the training data contain missing values. The algorithm is then applied in a semi-supervised setting where it accounts for partially labeled data as well

    Similar works