4 research outputs found
Predicting the Absorption Rate of Chemicals Through Mammalian Skin Using Machine Learning Algorithms
Machine learning (ML) methods have been applied to the analysis of a range of biological
systems. This thesis evaluates the application of these methods to the problem domain of
skin permeability. ML methods offer great potential in both predictive ability and their
ability to provide mechanistic insight to, in this case, the phenomena of skin permeation.
Historically, refining mathematical models used to predict percutaneous drug absorption
has been thought of as a key factor in this field. Quantitative Structure-Activity Relationship
(QSAR) models are used extensively for this purpose. However, advanced ML
methods successfully outperform the traditional linear QSAR models. In this thesis, the
application of ML methods to percutaneous absorption is investigated and evaluated.
The major approach used in this thesis is the Gaussian process (GP) regression method.
This research seeks to enhance the prediction performance by using local non-linear models
obtained from applying clustering algorithms. In addition, to increase the model's quality,
a kernel is generated based on both numerical chemical variables and categorical experimental
descriptors. A Monte Carlo algorithm is also employed to generate reliable models
from variable data, which are inevitable in biological experiments. The datasets used for
this study are small, which may lead to over-fitting or under-fitting. In this research
I attempt to find optimal values of skin permeability using GP optimisation algorithms
within small datasets. Although these methods are applied here to the field of percutaneous
absorption, they may be applied more broadly to any biological system.
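The abstract above mentions a kernel built from both numerical chemical variables and categorical experimental descriptors, but does not give its form. The following is a minimal sketch of one common way to combine the two, assuming a squared-exponential kernel on the numeric part multiplied by a simple match/mismatch kernel on the categorical part. The function name `mixed_kernel`, the `mismatch` parameter, and the sample values are hypothetical and are not taken from the thesis.

```python
import math


def mixed_kernel(a, b, length_scale=1.0, mismatch=0.5):
    """Covariance between two samples, each a (numeric, categorical) pair.

    Assumed form (not from the thesis): an RBF kernel on the numeric
    descriptors times a factor of `mismatch` (between 0 and 1) for every
    categorical descriptor on which the two samples disagree.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    sq_dist = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    k_num = math.exp(-0.5 * sq_dist / length_scale ** 2)
    k_cat = 1.0
    for u, v in zip(cat_a, cat_b):
        if u != v:
            k_cat *= mismatch
    return k_num * k_cat


# Made-up illustrative samples: two numeric descriptors and one
# categorical descriptor (here, a diffusion cell type label).
s1 = ((2.1, 0.3), ("static",))
s2 = ((2.1, 0.3), ("flow-through",))
s3 = ((0.5, 1.8), ("static",))

print(mixed_kernel(s1, s1))  # identical samples
print(mixed_kernel(s1, s2))  # same chemistry, different cell type
print(mixed_kernel(s1, s3))  # different chemistry, same cell type
```

With this form, two measurements of the same chemical under different experimental conditions remain correlated, but less strongly than two measurements under identical conditions.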
Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression
This is the pre-peer reviewed version of the following article: Parivash Ashrafi, Yi Sun, Neil Davey, Roderick G. Adams, Simon C. Wilkinson, and Gary Patrick Moss, 'Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression', Journal of Pharmacy and Pharmacology, Vol. 70 (3): 361-373, March 2018, which has been published in final form at https://doi.org/10.1111/jphp.12863. Under embargo until 17 January 2019. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Objectives: The aim of this study was to investigate how to improve predictions from Gaussian Process models by optimising the model hyperparameters. Methods: Optimisation methods, including Grid Search, Conjugate Gradient, Random Search, Evolutionary Algorithm and Hyper-prior, were evaluated and applied to previously published data. Data sets were also reduced in size in a structured manner that retained the range, or 'chemical space', of the key descriptors, in order to assess the effect of the data range on model quality. Key findings: The Hyper-prior Smoothbox kernel resulted in the best models for the majority of data sets, and these models exhibited significantly better performance than benchmark quantitative structure-permeability relationship (QSPR) models. When the data sets were systematically reduced in size, the different optimisation methods generally retained their statistical quality, whereas benchmark QSPR models performed poorly. Conclusions: The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models, and models of excellent statistical quality could be produced from substantially smaller data sets. Peer reviewed. Final Accepted Version.
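Two of the optimisation methods named in the abstract above, Grid Search and Random Search, can be illustrated with a small sketch. It assumes a zero-mean GP with a squared-exponential kernel and scores each hyperparameter setting by the log marginal likelihood; the article's own kernels, search ranges, and scoring criterion are not stated here, so all function names, grids, and the toy sine data are illustrative assumptions rather than the authors' setup.

```python
import math
import random


def rbf_kernel(xs, length_scale, noise):
    """Squared-exponential covariance matrix with noise on the diagonal."""
    n = len(xs)
    return [[math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
             + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_marginal_likelihood(xs, ys, length_scale, noise):
    """log p(y | x, hyperparameters) for a zero-mean GP."""
    n = len(xs)
    low = cholesky(rbf_kernel(xs, length_scale, noise))
    # Forward substitution solves L z = y, so that y^T K^-1 y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return (-0.5 * sum(v * v for v in z) - 0.5 * log_det
            - 0.5 * n * math.log(2.0 * math.pi))


def grid_search(xs, ys, length_scales, noises):
    """Evaluate every grid point; return the best (score, length_scale, noise)."""
    return max((log_marginal_likelihood(xs, ys, ls, nz), ls, nz)
               for ls in length_scales for nz in noises)


def random_search(xs, ys, n_trials, rng):
    """Sample hyperparameters log-uniformly; return the best (score, ls, noise)."""
    best = None
    for _ in range(n_trials):
        ls = 10.0 ** rng.uniform(-1.0, 1.0)   # length scale in [0.1, 10]
        nz = 10.0 ** rng.uniform(-2.0, 0.0)   # noise variance in [0.01, 1]
        cand = (log_marginal_likelihood(xs, ys, ls, nz), ls, nz)
        if best is None or cand > best:
            best = cand
    return best


# Toy one-dimensional data (synthetic, not skin permeability measurements).
xs = [0.5 * i for i in range(8)]
ys = [math.sin(x) for x in xs]

LENGTH_SCALES = [0.1, 0.3, 1.0, 3.0, 10.0]
NOISES = [0.01, 0.1, 1.0]

best_grid = grid_search(xs, ys, LENGTH_SCALES, NOISES)
best_random = random_search(xs, ys, 50, random.Random(0))
print("grid search:  ", best_grid)
print("random search:", best_random)
```

Grid search is limited to the points it is given, while random search can land between them; with only a handful of training points either approach is cheap enough to run exhaustively.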
The importance of hyperparameters selection within small datasets
Parivash Ashrafi, Yi Sun, Neil Davey, Rod Adams, Marc B. Brown, Maria Prapopoulou, and Gary Moss, 'The Importance of Hyperparameters Selection within Small Datasets', in Proceedings of the 2015 International Joint Conference on Neural Networks, published in IEEE Xplore on 1 October 2015, DOI: 10.1109/IJCNN.2015.7280645. © 2015 IEEE. The Gaussian Process is a machine learning technique that has been applied to the analysis of percutaneous absorption of chemicals through human skin. The normal, automatic method of setting the hyperparameters associated with Gaussian Processes may not be suitable for small datasets. In this paper we investigate whether a handcrafted search method of determining these hyperparameters is better for such datasets.
The influence of diffusion cell type and experimental temperature on machine learning models of skin permeability
© 2019 Royal Pharmaceutical Society. This is the peer reviewed version of the following article: The influence of diffusion cell type and experimental temperature on machine learning models of skin permeability, which has been published in final form at https://doi.org/10.1111/jphp.13203. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. Objectives: The aim of this study was to use Gaussian process regression (GPR) methods to quantify the effect of experimental temperature (Texp) and choice of diffusion cell on model quality and performance. Methods: Data were collated from the literature. Static and flow-through diffusion cell data were separated, and a series of GPR experiments was conducted. The effect of Texp was assessed by comparing a range of datasets where Texp either remained constant or was varied from 22 to 45 °C. Key findings: Using data from flow-through diffusion cells resulted in poor model performance. Data from static diffusion cells resulted in significantly greater performance. Inclusion of data from flow-through cell experiments reduced overall model quality. Consideration of Texp improved model quality when the dataset used exhibited a wide range of experimental temperatures. Conclusions: This study highlights the problem of collating literature data into datasets from which models are constructed without consideration of the nature of those data. In order to optimise model quality, data from only static, Franz-type experiments should be used to construct the model, and Texp should be incorporated as a descriptor in the model if data are collated from a range of studies conducted at different temperatures. Peer reviewed.