13 research outputs found
Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features.
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the use of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than the alternative approaches employed for comparison (QSAR models trained exclusively on compound descriptors using all available data), with R²/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10), in agreement with the literature. For instance, the absence of a tertiary sulphonamide was identified as responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
Q.A. thanks the Islamic Development Bank and Cambridge Commonwealth Trust for funding. O.M.L. is grateful to CONACyT (No. 217442/312933) and the Cambridge Overseas Trust for funding. G.v.W. thanks EMBL 90 (EIPOD) and Marie Curie (COFUND) for funding. A.B. thanks Unilever and the ERC (Starting Grant RC-2013-StG 336159 MIXTURE) for funding. ICC thanks the Institut Pasteur and the Pasteur-Paris International PhD programme for funding. TM thanks the Institut Pasteur for funding. This is the final version of the article. It first appeared from the Royal Society of Chemistry via http://dx.doi.org/10.1039/C4IB00175
Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules.
BACKGROUND: In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. RESULTS: camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), as well as techniques to create model ensembles (using the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). CONCLUSIONS: Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application. Graphical abstract: From compounds and data to models: a complete model building workflow in one package.
Unifying view of mechanical and functional hotspots across class A GPCRs
G protein-coupled receptors (GPCRs) are the largest superfamily of signaling proteins. Their activation process is accompanied by conformational changes that have not yet been fully uncovered. Here, we carry out a novel comparative analysis of internal structural fluctuations across a variety of receptors from class A GPCRs, which currently has the richest structural coverage. We infer the local mechanical couplings underpinning the receptors' functional dynamics and finally identify those amino acids whose virtual deletion causes a significant softening of the mechanical network. The relevance of these amino acids is demonstrated by their overlap with those known to be crucial for GPCR function, based on static structural criteria. The differences with the latter set allow us to identify those sites whose functional role is more clearly detected by considering dynamical and mechanical properties. Of these sites with a genuine mechanical/dynamical character, the top ranking is amino acid 7x52, a previously unexplored, and experimentally verifiable key site for GPCR conformational response to ligand binding. © 2017 Ponzoni et al.
Predictive Resistance Models Combining Chemical and Target Similarity
Poster presented at the RSC BMCS & AgriNet Agriscience Symposium: Biological and Chemical Approaches Towards Combating Resistance in Agriculture (http://www.rsc.org/ConferencesAndEvents/conference/alldetails.cfm?evid=113227)
An Exploration Strategy Improves the Diversity of de novo Ligands Using Deep Reinforcement Learning – A Case for the Adenosine A2A Receptor
Over the last five years, deep learning has progressed tremendously in both image recognition and natural language processing. Now it is increasingly applied to other data-rich fields. In drug discovery, recurrent neural networks (RNNs) have been shown to be an effective method to generate novel chemical structures in the form of SMILES. However, ligands generated by current methods have so far provided relatively low diversity and do not fully cover the whole chemical space occupied by known ligands. Here, we propose a new method (DrugEx) to discover de novo drug-like molecules. DrugEx is an RNN model (generator) trained through reinforcement learning which was integrated with a special exploration strategy. As a case study, we applied our method to design ligands against the adenosine A2A receptor. From ChEMBL data, a machine learning model (predictor) was created to predict whether generated molecules are active or not. Based on this predictor as the reward function, the generator was trained by reinforcement learning without any further data. We then compared the performance of our method with two previously published methods, REINVENT and ORGANIC. We found that candidate molecules our model designed, and predicted to be active, had a larger chemical diversity and better covered the chemical space of known ligands compared to the state-of-the-art.
Proteochemometric modeling in a Bayesian framework
Proteochemometrics (PCM) is an approach for bioactivity predictive modeling which models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels on three diverse (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we have gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models obtained with the normalized polynomial and radial kernels provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
PKD & Machine Learning
Proposal submitted September 15, 2017 to Amazon Research Award
An automated document classifier to retrieve ChEMBL-like papers
Poster presented at the 10th ICCS (http://www.int-conf-chem-structures.org/)