Phenotyping with Partially Labeled, Partially Observed Data

Rodriguez, Victor Alfonso

Authors: Victor Alfonso Rodriguez
Publication date: 1 January 2023

Abstract

Identifying a group of individuals who share a common set of characteristics is conceptually simple but often difficult in practice. Such phenotyping problems emerge in various settings, including the analysis of clinical data. In this setting, phenotyping is often stymied by persistent data quality issues, including a lack of reliable labels indicating the presence or absence of characteristics of interest and significant missingness in observed variables. This dissertation introduces methods for learning phenotypes when the data contain missing values (partially observed) and labels are scarce (partially labeled). Aim 1 utilizes an unsupervised probabilistic graphical model to learn phenotypes from partially observed data. Aim 2 introduces a related semi-supervised probabilistic graphical model for learning phenotypes from partially labeled clinical data. Finally, Aim 3 describes a method for training deep generative models when the training data contain missing values. The algorithm is then applied in a semi-supervised setting, where it also accounts for partially labeled data.

Available Versions

Columbia University Academic Commons

oai:academiccommons.columbia.e...

Last time updated on 05/11/2023