26 research outputs found
Towards predictive resistance models for agrochemicals by combining chemical and protein similarity via proteochemometric modelling
Resistance to pesticides is an increasing problem in agriculture. Despite practices such as phased use and cycling of "orthogonally resistant" agents, resistance remains a major risk to national and global food security. To combat this problem, there is a need both for new approaches to pesticide design and for novel chemical entities themselves. As summarized in this opinion article, a technique termed "proteochemometric modelling" (PCM), from the field of chemoinformatics, could aid in the quantification and prediction of resistance that acts via point mutations in the target proteins of an agent. The technique combines information from both the chemical and biological domain to generate bioactivity models across large numbers of ligands as well as protein targets. PCM has previously been validated in prospective, experimental work in the medicinal chemistry area, and it draws on the growing amount of bioactivity information available in the public domain. Here, two potential applications of proteochemometric modelling to agrochemical data are described, based on previously published examples from the medicinal chemistry literature. FWN – Publicaties zonder aanstelling Universiteit Leide
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets.
Background
While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants.
Results
The amino acid descriptor sets compared here show similar performance ( 0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-Scales (3) combined with an average Z-Scale value for each target, while ProtFP (PCA8), ST-Scales, and ProtFP (Feature) rank last.
Conclusions
While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, thereby underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, both from the ligand side and from the protein side.
Advances in GPCR Modeling Evaluated by the GPCR Dock 2013 Assessment: Meeting New Challenges
Despite tremendous successes of GPCR crystallography, the receptors with available structures represent only a small fraction of human GPCRs. An important role of the modeling community is to maximize structural insights for the remaining receptors and complexes. The community-wide GPCR Dock assessment was established to stimulate and monitor the progress in molecular modeling and ligand docking for GPCRs. The four targets in the present third assessment round presented new and diverse challenges for modelers, including prediction of allosteric ligand interaction and activation states in 5-hydroxytryptamine receptors 1B and 2B, and modeling by extremely distant homology for smoothened receptor. Forty-four modeling groups participated in the assessment. State-of-the-art modeling approaches achieved close-to-experimental accuracy for small rigid orthosteric ligands and models built by close homology, and they correctly predicted protein fold for distant homology targets. Predictions of long loops and GPCR activation states remain unsolved problems.
A document classifier for medicinal chemistry publications trained on the ChEMBL corpus
Background
The large increase in the number of scientific publications has fuelled a need for semi- and fully automated text mining approaches in order to assist in the triage process, both for individual scientists and also for larger-scale data extraction and curation into public databases. Here, we introduce a document classifier, which is able to successfully distinguish between publications that are "ChEMBL-like" (i.e. related to small molecule drug discovery and likely to contain quantitative bioactivity data) and those that are not. The unprecedented size of the medicinal chemistry literature collection, coupled with the advantage of manual curation and mapping to chemistry and biology, makes the ChEMBL corpus a unique resource for text mining.
Results
The method has been implemented as a data protocol/workflow for both Pipeline Pilot (version 8.5) and KNIME (version 2.9). Both workflows and models are freely available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/text-mining. These can be readily modified to include additional keyword constraints to further focus searches.
Conclusions
Large-scale machine learning document classification was shown to be very robust and flexible for this particular application, as illustrated in four distinct text-mining-based use cases. The models are readily available on two data workflow platforms, which we believe will allow the majority of the scientific community to apply them to their own data.
Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling
Proteochemometric modeling (PCM) is a computational approach that can be considered an extension of quantitative structure–activity relationship (QSAR) modeling, where a single model incorporates information for a family of targets and all the associated ligands instead of modeling activity versus one target. This is especially useful for situations where bioactivity data exists for similar proteins but is scarce for the protein of interest. Here we demonstrate the application of PCM to identify allosteric modulators of metabotropic glutamate (mGlu) receptors. Given our long-running interest in modulating mGlu receptor function, we compiled a matrix of compound-target bioactivity data. Some members of the mGlu family are well explored both internally and in the public domain, while there are far fewer examples of ligands for other targets such as the mGlu7 receptor. Using a PCM approach, mGlu7 receptor hits were found. In comparison to conventional single-target modeling, the identified hits were more diverse, had a better confirmation rate, and provide starting points for further exploration. We conclude that the robust structure–activity relationship from well-explored target family members translated to better quality hits for PCM compared to virtual screening (VS) based on a single target.
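The core idea of PCM described above, one model trained on joint ligand and target descriptors, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the descriptor values, the activity values, the names `lig_a`, `mGlu_x` and so on, and the choice of a 1-nearest-neighbour predictor are invented and are not the descriptors, data, or learning method used in the paper.

```python
from math import dist

# Hypothetical toy descriptors (not from the paper): each ligand and each
# target is described by a short numeric vector.
LIGANDS = {"lig_a": (0.1, 0.9), "lig_b": (0.8, 0.2), "lig_c": (0.2, 0.8)}
TARGETS = {"mGlu_x": (1.0, 0.0), "mGlu_y": (0.9, 0.1)}


def pcm_features(ligand, target):
    """Concatenate ligand and target descriptors into one PCM feature vector."""
    return LIGANDS[ligand] + TARGETS[target]


def predict_nearest(training, ligand, target):
    """Predict activity from the closest known ligand-target pair (1-NN)."""
    query = pcm_features(ligand, target)
    nearest = min(
        training,
        key=lambda row: dist(pcm_features(row[0], row[1]), query),
    )
    return nearest[2]


# Invented activities: mGlu_x is well explored, mGlu_y has a single data point.
TRAINING = [
    ("lig_a", "mGlu_x", 7.5),
    ("lig_b", "mGlu_x", 5.0),
    ("lig_b", "mGlu_y", 5.2),
]

# lig_c has never been measured on any target; the prediction for mGlu_y
# borrows from the similar ligand lig_a measured on the related target mGlu_x.
prediction = predict_nearest(TRAINING, "lig_c", "mGlu_y")
```

Because ligand and target descriptors share one feature space, a data point from a well-explored family member can inform a prediction for a sparsely explored one, which a single-target model cannot do.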
Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel
Medicinal Chemistr
MyChEMBL: A Virtual Platform for Distributing Cheminformatics Tools and Open Data
MyChEMBL is an open virtual platform which provides a free, secure, standardised and easy-to-use chemoinformatics environment for bioactivity data mining, machine learning, application development, learning and teaching. The main technical features of myChEMBL along with its applications and future plans are discussed here.
Supervised selective kernel fusion for membrane protein prediction
Membrane protein prediction is a significant classification problem, requiring the integration of data derived from different sources such as protein sequences, gene expression, protein interactions etc. A generalized probabilistic approach for combining different data sources via supervised selective kernel fusion was proposed in our previous papers. It includes, as particular cases, SVM, Lasso SVM, Elastic Net SVM and others. In this paper we apply a further instantiation of this approach, the Supervised Selective Support Kernel SVM, and demonstrate that the proposed approach achieves the top-rank position among the selective kernel fusion variants on benchmark data for membrane protein prediction. The method differs from the previous approaches in that it naturally derives a subset of "support kernels" (analogous to support objects within SVMs), thereby allowing the memory-efficient exclusion of significant numbers of irrelevant kernel matrices from a decision rule in a manner particularly suited to membrane protein prediction.
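The exclusion of irrelevant kernel matrices mentioned above can be illustrated with a minimal sketch of the fusion step alone. It assumes the per-kernel weights have already been learned and that some of them are exactly zero. The function name `fuse_kernels`, the kernel names, and all numbers are hypothetical, and the sketch does not reproduce the training procedure of the Supervised Selective Support Kernel SVM.

```python
def fuse_kernels(kernels, weights):
    """Return the support kernel names and the weighted sum of their matrices.

    Kernels with zero weight are skipped entirely, so their matrices never
    need to be read when the fused kernel is built.
    """
    support = [name for name, weight in weights.items() if weight > 0.0]
    size = len(next(iter(kernels.values())))
    fused = [[0.0] * size for _ in range(size)]
    for name in support:
        for i in range(size):
            for j in range(size):
                fused[i][j] += weights[name] * kernels[name][i][j]
    return support, fused


# Invented 2x2 kernel matrices for three data sources.
KERNELS = {
    "sequence": [[1.0, 0.5], [0.5, 1.0]],
    "expression": [[1.0, 0.2], [0.2, 1.0]],
    "interaction": [[1.0, 0.9], [0.9, 1.0]],
}

# Assumed already-learned weights; "expression" received zero weight.
WEIGHTS = {"sequence": 0.5, "expression": 0.0, "interaction": 0.5}

support_kernels, fused_kernel = fuse_kernels(KERNELS, WEIGHTS)
```

In this toy case only the sequence and interaction kernels contribute to the fused matrix, which mirrors the memory saving the abstract attributes to selecting a subset of support kernels.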
A large-scale crop protection bioassay data set
Medicinal Chemistr
Comprehensive characterization of the Published Kinase Inhibitor Set
Despite the success of protein kinase inhibitors as approved therapeutics, drug discovery has focused on a small subset of kinase targets. Here we provide a thorough characterization of the Published Kinase Inhibitor Set (PKIS), a set of 367 small-molecule ATP-competitive kinase inhibitors that was recently made freely available with the aim of expanding research in this field and as an experiment in open-source target validation. We screen the set in activity assays with 224 recombinant kinases and 24 G protein-coupled receptors and in cellular assays of cancer cell proliferation and angiogenesis. We identify chemical starting points for designing new chemical probes of orphan kinases and illustrate the utility of these leads by developing a selective inhibitor for the previously untargeted kinases LOK and SLK. Our cellular screens reveal compounds that modulate cancer cell growth and angiogenesis in vitro. These reagents and associated data illustrate an efficient way forward to increasing understanding of the historically untargeted kinome.