4 research outputs found

    Multi-task learning for pKa prediction

    Many compound properties depend directly on the dissociation constants of their acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low-sample-size regime, using a published data set (n=698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
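The intrinsic model of co-regionalization (ICM) mentioned in the abstract couples tasks through a task-similarity matrix multiplied elementwise into the input kernel. Below is a minimal NumPy sketch of this idea on synthetic data, not the paper's actual model: the low-rank task matrix B = WWᵀ + diag(κ) stands in for the incomplete-Cholesky factorization, and all function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel between two input sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: B[t_i, t_j] * k_x(x_i, x_j)."""
    return B[np.ix_(t1, t2)] * rbf(X1, X2)

# Two related tasks with only 6 samples each (the low-n regime):
# task 1 is a scaled copy of task 0 plus observation noise.
X = rng.uniform(-3, 3, size=(12, 1))
tasks = np.repeat([0, 1], 6)
scale = np.where(tasks == 0, 1.0, 0.9)
y = scale * np.sin(X[:, 0]) + 0.05 * rng.standard_normal(12)

# Low-rank task covariance (rank 1 here) plus a small task-specific diagonal.
W = np.array([[1.0], [0.9]])
B = W @ W.T + 0.01 * np.eye(2)

# GP regression: solve (K + noise) alpha = y, then predict for task 1
# at a new input, borrowing strength from task 0's observations.
K = icm_kernel(X, tasks, X, tasks, B) + 0.05**2 * np.eye(12)
alpha = np.linalg.solve(K, y)

Xs = np.array([[0.5]])
ks = icm_kernel(Xs, np.array([1]), X, tasks, B)
mean = ks @ alpha
```

With highly correlated tasks, the prediction for task 1 at x = 0.5 tracks 0.9·sin(0.5) even though task 1 alone has very few observations.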

    Transfer learning with Gaussian processes

    Transfer learning is an emerging framework for learning from data that aims at intelligently transferring information between tasks. This is achieved by developing algorithms that can perform multiple tasks simultaneously, as well as translate previously acquired knowledge to novel learning problems. In this thesis, we investigate the application of Gaussian processes to various forms of transfer learning, with a focus on classification problems. The thesis begins with a thorough introduction to the framework of transfer learning, providing a clear taxonomy of the areas of research. Following that, we review recent advances in multi-task learning for regression with Gaussian processes and compare the performance of some of these methods on a real data set. This review gives insights into the strengths and weaknesses of each method, which serve as a point of reference for applying these methods to other forms of transfer learning. The main contributions of this thesis are reported in the three following chapters. The third chapter investigates the application of multi-task Gaussian processes to classification problems. We extend a previously proposed model to the classification scenario, providing three inference methods to handle the non-Gaussian likelihood that the classification paradigm imposes. The fourth chapter extends the multi-task scenario to the semi-supervised case. Using labeled and unlabeled data, we construct a novel covariance function that is able to capture the geometry of the distribution of each task. This setup allows unlabeled data to be utilised to infer the level of correlation between the tasks. Moreover, we also discuss the potential use of this model in situations where no labeled data are available for certain tasks. The fifth chapter investigates a novel form of transfer learning called meta-generalising. The question at hand is whether, after training on a sufficient number of tasks, it is possible to make predictions on a novel task. In this situation, the predictor is embedded in an environment of multiple tasks but has no information about the origins of the test task. This elevates the concept of generalising from the level of data to the level of tasks. We employ a model based on a hierarchy of Gaussian processes, in a mixture-of-experts sense, to make predictions based on the relation between the distributions of the novel and the training tasks. Each chapter is accompanied by a thorough experimental part giving insights into the potentials and limits of the proposed methods.

    Multi-task learning for pK(a) prediction

    No full text
    ISSN: 0920-654X; ISSN: 1573-495