4 research outputs found

    Predicting the Absorption Rate of Chemicals Through Mammalian Skin Using Machine Learning Algorithms

    Machine learning (ML) methods have been applied to the analysis of a range of biological systems. This thesis evaluates the application of these methods to the problem domain of skin permeability. ML methods offer great potential both in predictive ability and in their ability to provide mechanistic insight into, in this case, the phenomenon of skin permeation. Historically, refining the mathematical models used to predict percutaneous drug absorption has been thought of as a key factor in this field. Quantitative Structure-Activity Relationship (QSAR) models are used extensively for this purpose. However, advanced ML methods outperform the traditional linear QSAR models. In this thesis, the application of ML methods to percutaneous absorption is investigated and evaluated. The major approach used in this thesis is the Gaussian process (GP) regression method. This research seeks to enhance prediction performance by using local non-linear models obtained by applying clustering algorithms. In addition, to increase the model’s quality, a kernel is generated based on both numerical chemical variables and categorical experimental descriptors. A Monte Carlo algorithm is also employed to generate reliable models from variable data, which are inevitable in biological experiments. The datasets used for this study are small, which may raise over-fitting or under-fitting problems. In this research I attempt to find optimal values of skin permeability using GP optimisation algorithms within small datasets. Although these methods are applied here to the field of percutaneous absorption, they may be applied more broadly to any biological system.

    Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression

    This is the pre-peer reviewed version of the following article: Parivash Ashrafi, Yi Sun, Neil Davey, Roderick G. Adams, Simon C. Wilkinson, and Gary Patrick Moss, ‘Model fitting for small skin permeability data sets: hyperparameter optimisation in Gaussian Process Regression’, Journal of Pharmacy and Pharmacology, Vol. 70 (3): 361-373, March 2018, which has been published in final form at https://doi.org/10.1111/jphp.12863. Under embargo until 17 January 2019. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Objectives: The aim of this study was to investigate how to improve predictions from Gaussian Process models by optimising the model hyperparameters. Methods: Optimisation methods, including Grid Search, Conjugate Gradient, Random Search, Evolutionary Algorithm and Hyper-prior, were evaluated and applied to previously published data. Data sets were also reduced in size in a structured manner that retained the range, or ‘chemical space’, of the key descriptors, in order to assess the effect of the data range on model quality. Key findings: The Hyper-prior Smoothbox kernel results in the best models for the majority of data sets, and they exhibited significantly better performance than benchmark quantitative structure–permeability relationship (QSPR) models. When the data sets were systematically reduced in size, the different optimisation methods generally retained their statistical quality, whereas benchmark QSPR models performed poorly. Conclusions: The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models, and models of excellent statistical quality could be produced from substantially smaller data sets. Peer reviewed. Final Accepted Version.
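    As an illustration of the Grid Search option named in the abstract above, here is a minimal sketch of picking Gaussian Process hyperparameters on a small data set by maximising the log marginal likelihood over a grid. Everything specific in it is an assumption made for illustration and is not taken from the article: the squared-exponential kernel, the synthetic sine data, the grid ranges, and the function names (`rbf_kernel`, `log_marginal_likelihood`, `grid_search`, `gp_predict`). The article's other methods (Conjugate Gradient, Random Search, Evolutionary Algorithm, Hyper-prior) are not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def log_marginal_likelihood(x, y, length_scale, signal_var, noise_var):
    """Log marginal likelihood of a zero-mean GP with Gaussian noise."""
    n = len(y)
    k = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(n)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return (
        -0.5 * y @ alpha
        - np.sum(np.log(np.diag(chol)))
        - 0.5 * n * np.log(2.0 * np.pi)
    )


def grid_search(x, y, length_scales, noise_vars, signal_var=1.0):
    """Return (best_lml, best_length_scale, best_noise_var) over the grid."""
    best = (-np.inf, None, None)
    for ls in length_scales:
        for nv in noise_vars:
            lml = log_marginal_likelihood(x, y, ls, signal_var, nv)
            if lml > best[0]:
                best = (lml, ls, nv)
    return best


def gp_predict(x_train, y_train, x_test, length_scale, signal_var, noise_var):
    """Posterior mean of the GP at x_test."""
    k = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k += noise_var * np.eye(len(y_train))
    k_star = rbf_kernel(x_test, x_train, length_scale, signal_var)
    return k_star @ np.linalg.solve(k, y_train)


# Synthetic small data set (an assumption; not skin permeability data).
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(25, 1))
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(25)

# Log-spaced grids are a common choice for scale-type hyperparameters.
length_scales = np.logspace(-1, 1, 15)
noise_vars = np.logspace(-4, 0, 15)
best_lml, best_ls, best_nv = grid_search(x_train, y_train, length_scales, noise_vars)

x_test = np.linspace(-2.5, 2.5, 20)[:, None]
y_pred = gp_predict(x_train, y_train, x_test, best_ls, 1.0, best_nv)
rmse = np.sqrt(np.mean((y_pred - np.sin(x_test[:, 0])) ** 2))

print("best length scale:", best_ls)
print("best noise variance:", best_nv)
print("test RMSE:", rmse)
```

    With so few points, an exhaustive grid is cheap, and the full surface of marginal-likelihood values can be inspected rather than relying on a single gradient-based run.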

    The importance of hyperparameters selection within small datasets

    Parivash Ashrafi, Yi Sun, Neil Davey, Rod Adams, Marc B. Brown, Maria Prapopoulou, and Gary Moss, 'The Importance of Hyperparameters Selection within Small Datasets', in Proceedings of the 2015 International Joint Conference on Neural Networks, published in IEEE Xplore on 1 October 2015, DOI: 10.1109/IJCNN.2015.7280645. © 2015 IEEE. The Gaussian Process is a machine learning technique that has been applied to the analysis of percutaneous absorption of chemicals through human skin. The normal, automatic method of setting the hyperparameters associated with Gaussian Processes may not be suitable for small datasets. In this paper we investigate whether a handcrafted search method of determining these hyperparameters is better for such datasets.

    The influence of diffusion cell type and experimental temperature on machine learning models of skin permeability

    © 2019 Royal Pharmaceutical Society. This is the peer reviewed version of the following article: The influence of diffusion cell type and experimental temperature on machine learning models of skin permeability, which has been published in final form at https://doi.org/10.1111/jphp.13203. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. Objectives: The aim of this study was to use Gaussian process regression (GPR) methods to quantify the effect of experimental temperature (Texp) and choice of diffusion cell on model quality and performance. Methods: Data were collated from the literature. Static and flow‐through diffusion cell data were separated, and a series of GPR experiments was conducted. The effect of Texp was assessed by comparing a range of datasets where Texp either remained constant or was varied from 22 to 45 °C. Key findings: Using data from flow‐through diffusion cells results in poor model performance. Data from static diffusion cells result in significantly better performance. Inclusion of data from flow‐through cell experiments reduces overall model quality. Consideration of Texp improves model quality when the dataset used exhibits a wide range of experimental temperatures. Conclusions: This study highlights the problem of collating literature data into datasets from which models are constructed without consideration of the nature of those data. To optimise model quality, only data from static, Franz‐type experiments should be used to construct the model, and Texp should be incorporated as a descriptor in the model if data are collated from a range of studies conducted at different temperatures. Peer reviewed.