Quantitative bio-analytical techniques that enable parallel measurements of large
numbers of biomolecules generate vast amounts of information for studying and
characterising biological systems. These analytical methods are commonly referred
to as omics technologies and can be used to measure, for example, mRNA transcript,
protein or metabolite abundances in a biological sample.
The work presented in this thesis focuses on the application of multivariate prediction
models to the modelling and analysis of biological data generated by omics
technologies. Omics data commonly contain up to tens of thousands of variables,
which are often both noisy and multicollinear. Multivariate statistical methods have
previously been shown to be valuable for visualisation and predictive modelling of
biological and chemical data with properties similar to those of omics data. In this thesis,
currently available multivariate modelling methods are used in new applications,
and new methods are developed to address some of the specific challenges associated
with modelling of biological data.
Three closely related areas of multivariate modelling of biological data are described
and demonstrated in this thesis. First, a multivariate projection method is
used in a novel application for predictive modelling between omics data sets, demonstrating
how data from two analytical sources can be integrated and modelled
together by exploring covariation patterns between the data sets. This approach is
exemplified by modelling of data from two studies, the first containing proteomic
and metabolic profiling data and the second containing transcriptomic and metabolic
profiling data. Second, a method for piecewise multivariate modelling of short time-series
data is developed and demonstrated on simulated data as well
as on metabolic profiling data from a toxicity study, providing a new approach for characterising
multivariate bio-analytical time-series data. Third, a kernel-based
method is developed and applied for non-linear multivariate prediction modelling
of omics data, addressing the specific challenge of modelling non-linear variation in
biological data.
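The abstract does not specify how the kernel-based method works, so the following is only a generic illustration of the underlying idea, not the method developed in the thesis. It uses kernel ridge regression with a Gaussian kernel as a stand-in technique; the function names, the kernel width `sigma`, the regularisation `lam` and the toy data are all hypothetical choices made for this sketch. It compares a kernel model with an ordinary linear least-squares model on a response that depends quadratically on the input.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def fit_kernel_ridge(X, y, sigma=1.0, lam=1e-3):
    """Hypothetical helper: dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), y)


def predict_kernel_ridge(X_train, alpha, X_new, sigma=1.0):
    """Predictions are kernel-weighted sums over the training samples."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


def fit_linear(X, y):
    """Ordinary least squares with an intercept, used as a linear baseline."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X_new):
    """Apply the linear baseline (intercept + slopes) to new samples."""
    return np.column_stack([np.ones(X_new.shape[0]), X_new]) @ coef


# Toy data (illustrative only): the response depends quadratically on a
# single variable, so a linear model cannot capture the variation.
X_train = np.linspace(-2.0, 2.0, 41)[:, None]
y_train = X_train[:, 0] ** 2
X_test = np.linspace(-1.95, 1.95, 30)[:, None]
y_test = X_test[:, 0] ** 2

alpha = fit_kernel_ridge(X_train, y_train)
kernel_mse = np.mean((predict_kernel_ridge(X_train, alpha, X_test) - y_test) ** 2)
linear_mse = np.mean((predict_linear(fit_linear(X_train, y_train), X_test) - y_test) ** 2)
print(f"kernel MSE: {kernel_mse:.4f}, linear MSE: {linear_mse:.4f}")
```

One reason kernel formulations suit omics data is that the kernel matrix is n × n in the number of samples, regardless of how many variables each sample has. In practice `sigma` and `lam` would need to be selected, for example by cross-validation, rather than fixed as here.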