4 research outputs found
Multi-task learning for pKa prediction
Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n=698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
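The intrinsic model of co-regionalization mentioned in this abstract can be illustrated with a minimal sketch. It assumes a linear kernel with a bias term, a hand-fixed 2x2 task matrix B, and made-up toy data; none of the names, values, or data below come from the paper. In practice B would be learned, and the incomplete Cholesky decomposition used by the authors is not reproduced here.

```python
import numpy as np

def linear_kernel(X1, X2, bias=1.0):
    """Linear kernel k(x, x') = x . x' + bias (bias value is an assumption)."""
    return X1 @ X2.T + bias

def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: K[(x, t), (x', t')] = B[t, t'] * k(x, x')."""
    return B[np.ix_(t1, t2)] * linear_kernel(X1, X2)

def gp_predict_mean(X_train, t_train, y_train, X_test, t_test, B, noise=1e-2):
    """Posterior mean of a GP with ICM covariance and Gaussian noise."""
    K = icm_kernel(X_train, t_train, X_train, t_train, B)
    K_star = icm_kernel(X_test, t_test, X_train, t_train, B)
    # Full Cholesky factorisation; fine for a toy problem of this size.
    L = np.linalg.cholesky(K + noise * np.eye(len(y_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha

# Hypothetical toy data: task 0 has 30 points, task 1 has a single point.
# Both tasks follow y = 2x, so task 0 is informative about task 1.
X0 = np.linspace(-2.0, 2.0, 30).reshape(-1, 1)
X1 = np.array([[0.5]])
X_train = np.vstack([X0, X1])
t_train = np.array([0] * 30 + [1])
y_train = 2.0 * X_train[:, 0]

# Predict for the data-poor task 1 at x = 2 (true value is 4).
X_test = np.array([[2.0]])
t_test = np.array([1])

B_multi = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed strong task correlation
B_single = np.eye(2)                          # no sharing: independent tasks

mt_pred = gp_predict_mean(X_train, t_train, y_train, X_test, t_test, B_multi)
st_pred = gp_predict_mean(X_train, t_train, y_train, X_test, t_test, B_single)
print("multi-task prediction: ", mt_pred[0])
print("single-task prediction:", st_pred[0])
```

Setting B to the identity removes all cross-task covariance, so it corresponds to the single-task baseline. An all-ones B would instead treat the tasks as one, which is the pooling baseline.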
Transfer learning with Gaussian processes
Transfer Learning is an emerging framework for learning from data that aims at intelligently
transferring information between tasks. This is achieved by developing algorithms
that can perform multiple tasks simultaneously, as well as translating previously
acquired knowledge to novel learning problems.
In this thesis, we investigate the application of Gaussian Processes to various forms
of transfer learning with a focus on classification problems. We begin
with a thorough introduction to the framework of transfer learning, providing a clear
taxonomy of the areas of research. We then review recent
advances in multi-task learning for regression with Gaussian processes, and compare
the performance of some of these methods on a real data set. This review gives insights
into the strengths and weaknesses of each method, and serves as a point of reference
for applying these methods to other forms of transfer learning.
The main contributions of this thesis are reported in the three following chapters.
The third chapter investigates the application of Multi-task Gaussian processes to classification
problems. We extend a previously proposed model to the classification scenario,
providing three inference methods due to the non-Gaussian likelihood the classification
paradigm imposes. The fourth chapter extends the multi-task scenario to the
semi-supervised case. Using labeled and unlabeled data, we construct a novel covariance
function that is able to capture the geometry of the distribution of each task. This
setup allows unlabeled data to be utilised to infer the level of correlation between the
tasks. We also discuss the potential use of this model in situations where no
labeled data are available for certain tasks. The fifth chapter investigates a novel form
of transfer learning called meta-generalising. The question at hand is whether, after training
on a sufficient number of tasks, it is possible to make predictions on a novel task. In
this situation, the predictor is embedded in an environment of multiple tasks but has no
information about the origins of the test task. This elevates the concept of generalising
from the level of data to the level of tasks. We employ a model based on a hierarchy
of Gaussian processes, in a mixture-of-experts sense, to make predictions based on the
relation between the distributions of the novel and the training tasks. Each chapter is
accompanied by a thorough experimental part giving insights into the potential
and the limits of the proposed methods.
Multi-task learning for pK(a) prediction
ISSN: 0920-654X, ISSN: 1573-495