422 research outputs found
Enhancing the effectiveness of ligand-based virtual screening using data fusion
Data fusion is being increasingly used to combine the outputs of different types of sensor. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple similarity searches, although there is also increasing interest in the combination of multiple machine learning techniques. Both approaches are reviewed here, focusing on the extent to which fusion can improve the effectiveness of searching when compared with a single screening mechanism, and on the reasons that have been suggested for the observed performance enhancement.
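As an illustration of combining multiple similarity searches, the following is a minimal sketch of rank-sum fusion, one common fusion rule. The molecule names, scores, and function names are hypothetical, and the abstract does not say which fusion rules the review favours.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank molecules by descending score; rank 1 is the best."""
    ordered = sorted(scores, key=lambda mol: (-scores[mol], mol))
    return {mol: position + 1 for position, mol in enumerate(ordered)}


def fuse_by_rank_sum(score_lists: List[Dict[str, float]]) -> List[str]:
    """Fuse several searches by summing each molecule's ranks (lower is better)."""
    rank_tables = [ranks_from_scores(scores) for scores in score_lists]
    # Only molecules scored by every search can be fused.
    molecules = set.intersection(*(set(table) for table in rank_tables))
    totals = {mol: sum(table[mol] for table in rank_tables) for mol in molecules}
    return sorted(totals, key=lambda mol: (totals[mol], mol))


# Hypothetical similarity scores from two different searches.
search_a = {"mol1": 0.91, "mol2": 0.40, "mol3": 0.75}
search_b = {"mol1": 0.30, "mol2": 0.50, "mol3": 0.85}

fused = fuse_by_rank_sum([search_a, search_b])
print(fused)
```

Here mol3 is ranked second by one search and first by the other, so it ends up at the top of the fused list even though search_a alone would have preferred mol1.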
NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN
Computer-aided drug design (CADD) has become an indispensable component in modern drug discovery projects. The prediction of physicochemical properties and pharmacological properties of candidate compounds effectively increases the probability for drug candidates to pass later phases of clinical trials. Ligand-based virtual screening exhibits advantages over structure-based drug design, in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledgebase to derive quantitative structure-activity relationship (QSAR) and structure-property relationship (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), was reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training/prediction, and project management. Besides, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented in order to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported to construct a structurally diverse screening library in order to enhance hit rates in high-throughput screening.
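To make the fingerprint similarity calculation concrete, here is a minimal CPU sketch that assumes the Tanimoto coefficient as the similarity measure and stores each dense fingerprint as a Python integer bit string. Both choices are assumptions for illustration; the abstract does not name the measure, and this is not the thesis's GPU algorithm.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints held as integers."""
    common_bits = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    union_bits = bin(fp_a | fp_b).count("1")   # bits set in either fingerprint
    return common_bits / union_bits if union_bits else 0.0


# Two hypothetical 5-bit fingerprints sharing 2 of 4 distinct set bits.
similarity = tanimoto(0b10110, 0b10011)
print(similarity)
```

Because every pair of fingerprints can be compared independently with bitwise AND, OR, and a population count, this kind of calculation maps naturally onto parallel hardware.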
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach
Virtual screening (VS) is widely used during computational drug discovery to
reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to
predict new compound-protein interactions (CPIs) from known CPI network data
using several methods, including machine learning and data mining. Although
CGBVS facilitates highly efficient and accurate CPI prediction, it has poor
performance for prediction of new compounds for which CPIs are unknown. The
pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high
accuracy for prediction of new compounds. In this study, on the basis of link
mining, we improved the PKM by combining link indicator kernel (LIK) and
chemical similarity and evaluated the accuracy of these methods. The proposed
method obtained an average area under the precision-recall curve (AUPR) value
of 0.562, which was higher than that achieved by the conventional Gaussian
interaction profile (GIP) method (0.425), and the calculation time was only
increased by a few percent.
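For readers unfamiliar with pairwise kernels, the sketch below shows the product form commonly used in chemogenomics, where the similarity of two compound-protein pairs is the compound similarity multiplied by the protein similarity. The similarity tables and identifiers are hypothetical, and the abstract does not give the exact formulation or the way the link indicator kernel is combined with it.

```python
from typing import Dict, Tuple

Similarity = Dict[str, Dict[str, float]]
Pair = Tuple[str, str]  # (compound id, protein id)


def pairwise_kernel(pair_a: Pair, pair_b: Pair,
                    compound_sim: Similarity, protein_sim: Similarity) -> float:
    """Product-form pairwise kernel: K((c, p), (c', p')) = Kc(c, c') * Kp(p, p')."""
    (compound_a, protein_a), (compound_b, protein_b) = pair_a, pair_b
    return compound_sim[compound_a][compound_b] * protein_sim[protein_a][protein_b]


# Hypothetical symmetric similarity tables for two compounds and two proteins.
compound_sim = {"c1": {"c1": 1.0, "c2": 0.6}, "c2": {"c1": 0.6, "c2": 1.0}}
protein_sim = {"p1": {"p1": 1.0, "p2": 0.5}, "p2": {"p1": 0.5, "p2": 1.0}}

value = pairwise_kernel(("c1", "p1"), ("c2", "p2"), compound_sim, protein_sim)
print(value)
```

Two pairs score highly only when both their compounds and their proteins are similar, which is what lets a kernel machine generalise across the interaction network.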
A Multiple Classifier System Identifies Novel Cannabinoid CB2 Receptor Ligands
Drugs have become an essential part of our lives due to their ability to improve people’s
health and quality of life. However, for many diseases, approved drugs are not yet available
or existing drugs have undesirable side effects, making the pharmaceutical industry strive to
discover new drugs and active compounds. The development of drugs is an expensive
process, which typically starts with the detection of candidate molecules (screening) for an
identified protein target. To this end, the use of high-performance screening techniques has
become a critical issue in order to palliate the high costs. Therefore, the popularity of
computer-based screening (often called virtual screening or in-silico screening) has rapidly
increased during the last decade. A wide variety of Machine Learning (ML) techniques has
been used in conjunction with chemical structure and physicochemical properties for
screening purposes including (i) simple classifiers, (ii) ensemble methods, and more recently
(iii) Multiple Classifier Systems (MCS). In this work, we apply an MCS for virtual screening
(D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid
CB2 ligands obtained from the ChEMBL database. The HTS collection of Enamine
(1,834,362 compounds) was virtually screened to identify 48,432 potentially active molecules
using D2-MCS. This list was subsequently clustered based on circular fingerprints and from
each cluster, the most active compound was retained. From these, the top 60 were kept,
and 21 novel compounds were purchased. Experimental validation confirmed six highly
active hits (>50% displacement at 10 μM and subsequent Ki determination) and an
additional five medium active hits (>25% displacement at 10 μM). D2-MCS hence provided a
hit rate of 29% for highly active compounds and an overall hit rate of 52%.
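The reported hit rates follow directly from the counts given in the abstract (21 purchased compounds, six highly active hits, five medium active hits). A short check, with variable names chosen here for illustration:

```python
purchased = 21      # novel compounds bought for experimental validation
highly_active = 6   # >50% displacement at 10 uM
medium_active = 5   # >25% displacement at 10 uM

high_rate = highly_active / purchased
overall_rate = (highly_active + medium_active) / purchased

print(f"highly active hit rate: {high_rate:.0%}")
print(f"overall hit rate: {overall_rate:.0%}")
```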
SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces
Batch Bayesian optimisation and Bayesian quadrature have been shown to be
sample-efficient methods of performing optimisation and quadrature where
expensive-to-evaluate objective functions can be queried in parallel. However,
current methods do not scale to large batch sizes -- a frequent desideratum in
practice (e.g. drug discovery or simulation-based inference). We present a
novel algorithm, SOBER, which permits scalable and diversified batch global
optimisation and quadrature with arbitrary acquisition functions and kernels
over discrete and mixed spaces. The key to our approach is to reformulate batch
selection for global optimisation as a quadrature problem, which relaxes
acquisition function maximisation (non-convex) to kernel recombination
(convex). Bridging global optimisation and quadrature can efficiently solve
both tasks by balancing the merits of exploitative Bayesian optimisation and
explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive
baselines on 12 synthetic and diverse real-world tasks.
Comment: 34 pages, 12 figures
Active learning of compounds activity : towards scientifically sound simulation of drug candidates identification
Virtual screening is one of the vital elements of the modern drug design process. It is aimed at the identification of potential drug candidates out of large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines among the most popular. Most commonly, performance in this task is evaluated in an offline manner, where a model is tested after training on a randomly chosen subset of data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses the next batches of compounds to test. This paper proposes to frame this problem as an active learning process, where we search for new drug candidates through exploration of the compound space simultaneously with the exploitation of current knowledge. We introduce a proof of concept of the simulation and evaluation of such a pipeline, together with novel solutions based on mixing clustering and a greedy k-batch active learning strategy.
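The iterative setting described above can be sketched as a simulated loop in which each round tests the k unlabeled compounds a model currently scores highest. The example below is a toy, purely exploitative version: compounds are integers on a line, the hidden oracle marks values of 7 or more as active, and the score is the negative distance to the nearest known active. All of these are assumptions for illustration, and the clustering component mentioned in the abstract is omitted.

```python
from typing import Callable, Dict, Iterable


def greedy_batch_active_learning(
    pool: Iterable[int],
    oracle: Callable[[int], bool],
    initial: Dict[int, bool],
    batch_size: int,
    rounds: int,
) -> Dict[int, bool]:
    """Each round, label the batch_size unlabeled compounds with the best score."""
    labeled = dict(initial)
    candidates = list(pool)
    for _ in range(rounds):
        actives = [c for c, is_active in labeled.items() if is_active]
        unlabeled = [c for c in candidates if c not in labeled]
        if not unlabeled:
            break

        def score(compound: int) -> float:
            # Toy model: compounds closer to a known active score higher.
            if not actives:
                return 0.0
            return -min(abs(compound - a) for a in actives)

        ranked = sorted(unlabeled, key=lambda c: (-score(c), c))
        for compound in ranked[:batch_size]:
            labeled[compound] = oracle(compound)  # simulated experiment
    return labeled


labeled = greedy_batch_active_learning(
    pool=range(10), oracle=lambda c: c >= 7,
    initial={7: True}, batch_size=2, rounds=2,
)
found_actives = sorted(c for c, is_active in labeled.items() if is_active)
print(found_actives)
```

A purely greedy rule like this only explores near what it already knows, which is one motivation for mixing in a diversity mechanism such as clustering.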
Lo-Hi: Practical ML Drug Discovery Benchmark
Finding new drugs is getting harder and harder. One of the hopes of drug
discovery is to use machine learning models to predict molecular properties.
That is why models for molecular property prediction are being developed and
tested on benchmarks such as MoleculeNet. However, existing benchmarks are
unrealistic and are too different from applying the models in practice. We have
created a new practical Lo-Hi benchmark consisting of two tasks: Lead
Optimization (Lo) and Hit Identification (Hi), corresponding to the real drug
discovery process. For the Hi task, we designed a novel molecular splitting
algorithm that solves the Balanced Vertex Minimum k-Cut problem. We tested
state-of-the-art and classic ML models, revealing which work better under
practical settings. We analyzed modern benchmarks and showed that they are
unrealistic and overoptimistic.
Review: https://openreview.net/forum?id=H2Yb28qGLV
Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023
Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter
Comment: 29 pages, Advances in Neural Information Processing Systems, 202