Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference
Gaussian process latent variable models (GPLVM) are a flexible and non-linear
approach to dimensionality reduction, extending classical Gaussian processes to
an unsupervised learning context. The Bayesian incarnation of the GPLVM [Titsias
and Lawrence, 2010] uses a variational framework, where the posterior over
latent variables is approximated by a well-behaved variational family, a
factorized Gaussian yielding a tractable lower bound. However, the
non-factorisability of the lower bound prevents truly scalable inference. In
this work, we study the doubly stochastic formulation of the Bayesian GPLVM
model, which is amenable to minibatch training. We show how this framework is
compatible with different latent variable formulations and perform experiments
to compare a suite of models. Further, we demonstrate how we can train in the
presence of massively missing data and obtain high-fidelity reconstructions. We
demonstrate the model's performance by benchmarking against the canonical
sparse GPLVM for high-dimensional data examples.
Comment: AISTATS 202
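The link between a factorising lower bound and minibatch training can be illustrated with a minimal sketch. It assumes only that the data-dependent part of the bound is a sum of per-data-point terms; the values in `terms` are made-up placeholders rather than quantities from the paper, and the function name `minibatch_estimate` is hypothetical.

```python
import random

def minibatch_estimate(terms, batch_idx):
    """Estimate sum(terms) from a subset of indices, rescaled by N / B."""
    n, b = len(terms), len(batch_idx)
    return (n / b) * sum(terms[i] for i in batch_idx)

# Placeholder per-point contributions to a factorised bound (illustrative only).
random.seed(0)
terms = [random.gauss(-1.0, 0.5) for _ in range(12)]
full_sum = sum(terms)

# Split the 12 points into 3 equal minibatches of 4 points each.
batches = [list(range(start, start + 4)) for start in range(0, 12, 4)]
estimates = [minibatch_estimate(terms, batch) for batch in batches]

# Averaged over the equal-sized batches, the rescaled estimates recover the full sum.
mean_estimate = sum(estimates) / len(estimates)
```

When the bound does not decompose into such a sum, no single batch yields an estimate of this form, which is the scalability obstacle the abstract refers to.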
Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs
Single-cell RNA-seq datasets are growing in size and complexity, enabling the
study of cellular composition changes in various biological/clinical contexts.
Scalable dimensionality reduction techniques are needed to disentangle
biological variation in them, while accounting for technical and biological
confounders. In this work, we extend a popular approach for probabilistic
non-linear dimensionality reduction, the Gaussian process latent variable
model, to scale to massive single-cell datasets while explicitly accounting for
technical and biological confounders. The key idea is to use an augmented
kernel which preserves the factorisability of the lower bound allowing for fast
stochastic variational inference. We demonstrate its ability to reconstruct
latent signatures of innate immunity recovered in Kumasaka et al. (2021) with
9x lower training time. We further analyze a COVID dataset and demonstrate,
across a cohort of 130 individuals, that this framework enables data
integration while capturing interpretable signatures of infection.
Specifically, we explore COVID severity as a latent dimension to refine patient
stratification and capture disease-specific gene expression.
Comment: Machine Learning and Computational Biology Symposium (Oral), 202
Predicting ruthenium catalysed hydrogenation of esters using machine learning
AK thanks the Leverhulme Trust for an early career fellowship (ECF-2019-161). AK and CNB thank the UKRI Future Leaders Fellowship (MR/W007460/1). NVW and EB thank the IdEx Université de Paris (ANR-18-IDEX-0001) for funding. CM is supported by a Fellowship by the Accelerate Program for Scientific Discovery at the Computer Laboratory, University of Cambridge. The authors acknowledge the GENCI-CINES center for HPC resources (Projects A0080810359, A0100810359, and AD010812061R1).
Catalytic hydrogenation of esters is a sustainable approach for the production of fine chemicals and pharmaceutical drugs. However, the efficiency and cost of catalysts are often bottlenecks in the commercialization of such technologies. The conventional approach to catalyst discovery is based on empiricism, which makes the discovery process time-consuming and expensive. There is an urgent need to develop effective approaches to discover efficient catalysts for hydrogenation reactions. In this work, we develop a machine learning approach aided by Gaussian Processes to predict outcomes of catalytic hydrogenation of esters. Results of the Gaussian Process are compared with linear regression and neural network models. Our optimized models can predict the reaction yields with a root mean square error (RMSE) of 12.1% on unseen data and suggest that the selective use of certain chemical descriptors (e.g. electronic parameters) can result in a more accurate model. Furthermore, studies have also been carried out on the prediction of catalysts and reaction conditions such as temperature and pressure, and these predictions were validated by performing hydrogenation reactions to improve the poor yields described in the dataset.
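For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a root mean square error on reaction yields can be computed. The yield values are invented for illustration and are not taken from the paper's dataset; the helper name `rmse` is hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up yields in percent (illustrative only, not data from the paper).
observed_yields = [80.0, 55.0, 10.0, 95.0]
predicted_yields = [70.0, 65.0, 20.0, 85.0]

error = rmse(observed_yields, predicted_yields)
```

Because the yields are expressed in percent, the RMSE is in percentage points, so it can be read directly against the 0 to 100 yield scale.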
Predicting ruthenium catalysed hydrogenation of esters using machine learning
Acknowledgements: AK thanks the Leverhulme Trust for an early career fellowship (ECF-2019-161). AK and CNB thank the UKRI Future Leaders Fellowship (MR/W007460/1). NVW and EB thank the IdEx Université Paris Cité (ANR-18-IDEX-0001) for funding. CM is supported by a fellowship by the Accelerate Program for Scientific Discovery at the Computer Laboratory, University of Cambridge. The authors acknowledge the GENCI-CINES Center for HPC resources (projects A0080810359, A0100810359, and AD010812061R1).
The report describes the application of machine learning tools to predict hydrogenation of esters using molecular catalysts based on ruthenium.