2 research outputs found

    Krein support vector machine classification of antimicrobial peptides

    Antimicrobial peptides (AMPs) represent a potential solution to the growing problem of antimicrobial resistance, yet their identification through wet-lab experiments is a costly and time-consuming process. Accurate computational predictions would allow rapid in silico screening of candidate AMPs, thereby accelerating the discovery process. Kernel methods are a class of machine learning algorithms that utilise a kernel function to transform input data into a new representation. When appropriately normalised, the kernel function can be regarded as a notion of similarity between instances. However, many expressive notions of similarity are not valid kernel functions, meaning they cannot be used with standard kernel methods such as the support-vector machine (SVM). The Kreĭn-SVM represents a generalisation of the standard SVM that admits a much larger class of similarity functions. In this study, we propose and develop Kreĭn-SVM models for AMP classification and prediction by employing the Levenshtein distance and local alignment score as sequence similarity functions. Utilising two datasets from the literature, each containing more than 3000 peptides, we train models to predict general antimicrobial activity. Our best models achieve AUCs of 0.967 and 0.863 on the test sets of the two datasets, respectively, outperforming the in-house and literature baselines in both cases. We also curate a dataset of experimentally validated peptides, measured against Staphylococcus aureus and Pseudomonas aeruginosa, in order to evaluate the applicability of our methodology in predicting microbe-specific activity. In this case, our best models achieve AUCs of 0.982 and 0.891, respectively. Models to predict both general and microbe-specific activities are made available as web applications.
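    To make the idea of a sequence similarity function concrete, the following is a minimal sketch of the Levenshtein distance and one possible way to turn it into a pairwise similarity matrix. The function names, the length normalisation, and the toy sequences are illustrative assumptions, not the construction used in the study. A matrix built this way is not guaranteed to be positive semi-definite, which is the situation the abstract describes as unsuitable for a standard SVM.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalised similarity in [0, 1] (an assumption made
    for illustration): 1 for identical strings, 0 when every position
    has to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(sequences: List[str]) -> List[List[float]]:
    """Pairwise similarities; symmetric, but not necessarily a valid
    (positive semi-definite) kernel matrix."""
    return [[similarity(s, t) for t in sequences] for s in sequences]


if __name__ == "__main__":
    # Toy strings chosen for illustration only, not real dataset entries.
    peptides = ["KLAKLAK", "KLAKLAKK", "GIGAVLKV"]
    for row in similarity_matrix(peptides):
        print([round(value, 3) for value in row])
```

    A matrix like this would be passed to a learner that accepts precomputed similarities. Whether a standard SVM can use it depends on the matrix being positive semi-definite, which an edit-distance-based similarity does not ensure.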

    Regression modelling using priors depending on Fisher information covariance kernels (I-priors)

    Regression analysis is undoubtedly an important tool to understand the relationship between one or more explanatory and independent variables of interest. In this thesis, we explore a novel methodology for fitting a wide range of parametric and nonparametric regression models, called the I-prior methodology (Bergsma, 2018). We assume that the regression function belongs to a reproducing kernel Hilbert or Kreĭn space of functions, and doing so allows us to utilise the convenient topologies of these vector spaces. This is important for the derivation of the Fisher information of the regression function, which might be infinite dimensional. Based on the principle of maximum entropy, an I-prior is an objective Gaussian process prior for the regression function with covariance function proportional to its Fisher information. Our work focusses on the statistical methodology and computational aspects of fitting I-prior models. We examine a likelihood-based approach (direct optimisation and EM algorithm) for fitting I-prior models with normally distributed errors. The culmination of this work is the R package iprior (Jamil, 2017), which has been made publicly available on CRAN. The normal I-prior methodology is subsequently extended to fit categorical response models, achieved by “squashing” the regression functions through a probit sigmoid function. Estimation of I-probit models, as we call them, proves challenging due to the intractable integral involved in computing the likelihood. We overcome this difficulty by way of variational approximations. Finally, we turn to a fully Bayesian approach of variable selection using I-priors for linear models to tackle multicollinearity. We illustrate the use of I-priors in various simulated and real-data examples. Our study advocates the I-prior methodology as being a simple, intuitive, and comparable alternative to similar leading state-of-the-art models.