Getting the most from medical VOC data using Bayesian feature learning

Skinner, J. R.

thesis

Getting the most from medical VOC data using Bayesian feature learning

Authors: J. R. Skinner
Publication date: 1 July 2019
Publisher

Abstract

The metabolic processes in the body naturally produce a diverse set of Volatile Organic Compounds (VOCs), which are excreted in breath, urine, stool and other biological samples. The VOCs produced are odorous and influenced by disease, meaning olfaction can provide information on a person’s disease state. A variety of instruments exist for performing “artificial olfaction”: measuring a sample, such as patient breath, and producing a high dimensional output representing the odour. Such instruments may be paired with machine learning techniques to identify properties of interest, such as the presence of a given disease. Research shows good disease-predictive ability of artificial olfaction instrumentation. However, the statistical methods employed are typically off-the-shelf, and do not take advantage of prior knowledge of the structure of the high dimensional data. Since sample sizes are also typically small, this can lead to suboptimal results due to a poorly-learned model. In this thesis we explore ways to get more out of artificial olfaction data. We perform statistical analyses in a medical setting, investigating disease diagnosis from breath, urine and vaginal swab measurements, and illustrating both successful identification and failure cases. We then introduce two new latent variable models constructed for dimension reduction of artificial olfaction data, but which are widely applicable. These models place a Gaussian Process (GP) prior on the mapping from latent variables to observations. Specifying a covariance function for the GP prior is an intuitive way for a user to describe their prior knowledge of the data covariance structure. We also enable an approximate posterior and marginal likelihood to be computed, and introduce a sparse variant. Both models have been made available in the R package stpca hosted at https://github.com/JimSkinner/stpca. In experiments with artificial olfaction data, these models outperform standard feature learning methods in a predictive pipeline

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

Warwick Research Archives Portal Repository

oai:wrap.warwick.ac.uk:134226

Last time updated on 11/03/2020