Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction
High-throughput screening, where thousands of molecules can rapidly be assessed for activity against a protein, has been the dominant approach in drug discovery for many years. However, these methods are costly and require considerable time and effort. To improve on this situation, we apply an iterative screening process in which an initial set of compounds is selected for screening based on molecular docking. The outcome of the initial screen is then used to classify the remaining compounds through a conformal predictor. The approach was retrospectively validated using 41 targets from the Directory of Useful Decoys, Enhanced (DUD-E), ensuring scaffold diversity among the active compounds. The results show that 57% of the remaining active compounds could be identified while screening only 9.4% of the database. The overall hit rate (7.6%) was also higher than when using docking alone (5.2%). When limiting the search to the top-scored compounds from docking, 39.6% of the active compounds could be identified, compared to 13.5% when screening the same number of compounds based solely on docking. The use of conformal predictors also gives a clear indication of the number of compounds to screen in the next iteration. These results indicate that iterative screening based on molecular docking and conformal prediction can be an efficient way to find active compounds while screening only a small part of the compound collection. F.S. acknowledges the Swedish Pharmaceutical Society for financial support. The research at Swetox (UN) was supported by Stockholm County Council, Knut & Alice Wallenberg Foundation, and Swedish Research Council FORMAS.
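As a rough illustration of the conformal step described above, the sketch below computes per-class conformal p-values from calibration nonconformity scores and keeps every label whose p-value clears the significance level. All scores, class names, and thresholds are invented toy values, not data or code from the study.

```python
# Toy sketch of the conformal classification step: for each class, the
# p-value is the fraction of that class's calibration nonconformity
# scores at least as large as the test compound's score. All numbers
# below are invented, not taken from the study.

def p_value(test_score, calibration_scores):
    """Conformal p-value of a test nonconformity score."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)

def conformal_label_set(test_scores, calibration_by_label, significance):
    """Keep every label whose p-value exceeds the significance level."""
    labels = {}
    for label, cal in calibration_by_label.items():
        p = p_value(test_scores[label], cal)
        if p > significance:
            labels[label] = p
    return labels

# Hypothetical per-class calibration nonconformity scores.
calibration = {"active": [0.1, 0.2, 0.3, 0.9],
               "inactive": [0.1, 0.15, 0.2, 0.25]}
labels = conformal_label_set({"active": 0.25, "inactive": 0.8},
                             calibration, significance=0.2)
print(labels)  # only "active" survives at this significance level
```

Compounds whose prediction set contains only "active" would be forwarded to the next screening iteration; compounds assigned both or neither label remain undecided at that confidence level.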
Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening.
The versatility of similarity searching and quantitative structure-activity relationships for modelling the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common early-stage drug discovery scenario where abundant inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not yet been thoroughly examined. To this end, we designed an iterative virtual screening strategy, which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds in identifying a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations needed to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only produces output values within the range covered by the training set. In addition, examination of the scaffold diversity in the data sets shows that in some cases similarity searching and RF require twice as many iterations as random selection, depending on the chemical space covered by the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets.
Overall, this study provides an approach for iterative screening when only inactive data are present in the early stages of drug discovery, in order to discover highly potent compounds, and identifies the best experimental setup in which to do so. This project has received funding from the European Union's Framework Programme for Research and Innovation Horizon 2020 (2014–2020) under the Marie Skłodowska-Curie Grant Agreement No. 703543 (I.C.-C.). A.B. thanks the European Research Council (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding. N.C.F. is funded by EPSRC (EP/M006093/1).
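The extrapolation limitation discussed above can be shown in a few lines: a similarity-search (nearest-neighbour) predictor can only return activity values already present in the training set, whereas ordinary least squares can extrapolate beyond them. The one-dimensional descriptor, the noiseless activity function, and the query value below are all hypothetical.

```python
# Hypothetical one-descriptor example: activity grows linearly with the
# descriptor, the training range is [0, 1], and we query a compound at 2.0.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=200)
y = 5.0 * X  # noiseless "activity", maximum just below 5

def nn_predict(x_new):
    """Similarity search: return the activity of the most similar
    training compound (1-nearest neighbour on the descriptor)."""
    return y[np.argmin(np.abs(X - x_new))]

# Ordinary least squares line fitted to the same data.
slope, intercept = np.polyfit(X, y, 1)

x_query = 2.0                           # outside the training range
nn_pred = nn_predict(x_query)           # capped at values seen in training
lin_pred = slope * x_query + intercept  # extrapolates to ~10
```

Random forests behave like the nearest-neighbour predictor here: their output is an average of training targets, so it can never exceed the highest activity seen during training, which is the abstract's explanation for why simple regression extrapolates better.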
Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
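A minimal sketch of the synergy idea, under the assumption (consistent with the abstract) that each federated site trains its own model and only nonconformity scores cross site boundaries: the synergy predictor averages the per-site scores before computing conformal p-values. The two-site setup and all scores below are illustrative.

```python
# Two hypothetical sites score the same calibration compounds and one
# test compound; only these nonconformity scores are shared, never the
# underlying training data.

def p_value(test_score, calibration_scores):
    """Conformal p-value of a test nonconformity score."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)

def synergy_scores(per_site_scores):
    """Average, compound by compound, the nonconformity scores that
    each site's model produced."""
    n_sites = len(per_site_scores)
    return [sum(scores) / n_sites for scores in zip(*per_site_scores)]

site_a_cal = [0.2, 0.4, 0.6, 0.8]   # invented calibration scores, site A
site_b_cal = [0.3, 0.5, 0.5, 0.9]   # invented calibration scores, site B
cal = synergy_scores([site_a_cal, site_b_cal])
test = synergy_scores([[0.5], [0.4]])[0]
p = p_value(test, cal)
```

Because only aggregated scores leave each site, this scheme fits the federated setting described above, where the data themselves cannot be pooled in one location.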
LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity – Application to the Tox21 and Mutagenicity Datasets
Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their faster speed and lower cost compared with experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application to predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherits its high predictivity but resolves its scalability and computational-time limitations by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity datasets using a Bayesian-optimization-integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrate that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while requiring significantly less computational time than the other investigated algorithms across all Tox21 and mutagenicity datasets. We recommend LightGBM for applications in in silico safety assessment and in other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity- or activity-related endpoints of large compound libraries in the pharmaceutical and chemical industry.
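The leaf-wise growth strategy mentioned above can be caricatured with a priority queue: at each step the tree splits whichever current leaf offers the largest loss reduction, instead of splitting every leaf at the same depth. The gains below are made-up numbers standing in for computed split gains; this is a conceptual sketch, not LightGBM's implementation.

```python
# Toy leaf-wise growth: a max-heap of candidate splits keyed by gain.
import heapq

def grow_leaf_wise(root_gain, child_gains, max_splits):
    """Split, in order, whichever leaf currently has the highest gain.

    `child_gains` maps a leaf to the (name, gain) pairs of the leaves
    its split would create; all gains here are made-up numbers.
    """
    heap = [(-root_gain, "root")]  # negate gains for Python's min-heap
    order = []
    while heap and len(order) < max_splits:
        _, leaf = heapq.heappop(heap)
        order.append(leaf)
        for child, gain in child_gains.get(leaf, []):
            heapq.heappush(heap, (-gain, child))
    return order

split_order = grow_leaf_wise(
    root_gain=10.0,
    child_gains={"root": [("L", 7.0), ("R", 2.0)],
                 "L": [("LL", 1.0), ("LR", 0.5)]},
    max_splits=3,
)
# "L" (gain 7.0) is split before "R" (gain 2.0): growth follows gain,
# not depth, which lets leaf-wise trees grow deep where it pays off.
```

Depth-wise growth would split "L" and "R" together before descending further; selecting by gain is what lets leaf-wise trees reach the same loss with fewer leaves, contributing to the speed advantage reported above.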
Maximizing gain in high-throughput screening using conformal prediction
Iterative screening has emerged as a promising approach to increase the efficiency of screening campaigns compared to traditional high-throughput approaches. By learning from a subset of the compound library, predictive models can infer which compounds to screen next, resulting in more efficient screening. One way to evaluate screening is to consider the cost of screening compared to the gain associated with finding an active compound. In this work, we introduce a conformal predictor coupled with a gain-cost function with the aim of maximizing gain in iterative screening. Using this setup, we show that by evaluating the predictions on the training data, we can very accurately predict which settings will produce the highest gain on the test data. We evaluate the approach on 12 bioactivity datasets from PubChem, training the models on 20% of the data. Depending on the settings of the gain-cost function, the settings generating the maximum gain were accurately identified in 8–10 out of the 12 datasets. Broadly, our approach can predict which strategy generates the highest gain based on the results of the cost-gain evaluation: screen the compounds predicted to be active, screen all the remaining data, or do not screen any additional compounds. When the algorithm indicates that the predicted active compounds should be screened, it also indicates what confidence level to apply in order to maximize gain. Hence, our approach facilitates decision-making and allocation of resources where they deliver the most value by indicating in advance the likely outcome of a screening campaign. The research at Swetox (UN) was supported by Knut and Alice Wallenberg Foundation and Swedish Research Council FORMAS.
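A hedged sketch of the kind of gain-cost comparison the abstract describes, assuming a simple linear gain function (the paper's exact function may differ): each of the three candidate strategies is scored as the value of actives found minus the cost of compounds screened, and the highest-scoring one is chosen.

```python
# Hypothetical linear gain function: each active found is worth a fixed
# value, every screened compound incurs a fixed cost.

def gain(n_screened, n_actives, cost_per_compound, value_per_active):
    return n_actives * value_per_active - n_screened * cost_per_compound

def best_strategy(n_total, n_pred_active, hits_in_pred, hits_total,
                  cost_per_compound, value_per_active):
    """Compare the three strategies the approach can recommend."""
    options = {
        "screen none": gain(0, 0, cost_per_compound, value_per_active),
        "screen predicted actives": gain(n_pred_active, hits_in_pred,
                                         cost_per_compound, value_per_active),
        "screen all": gain(n_total, hits_total,
                           cost_per_compound, value_per_active),
    }
    return max(options, key=options.get), options

# Invented numbers: 10,000 compounds remain, the model flags 500,
# 60 of those are true actives out of 80 actives in total.
choice, options = best_strategy(
    n_total=10_000, n_pred_active=500, hits_in_pred=60, hits_total=80,
    cost_per_compound=1.0, value_per_active=100.0,
)
```

With these invented numbers, screening only the predicted actives wins: it captures most of the value at a small fraction of the cost, while screening everything is a net loss. Varying the conformal confidence level changes `n_pred_active` and `hits_in_pred`, which is how the confidence level that maximizes gain can be selected.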
Bioinformatics in translational drug discovery
Bioinformatics approaches are becoming ever more essential in translational drug discovery, both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications.
KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing, as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds: machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing contains a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, which adds a calibration step and by definition yields internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified for the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could in turn be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and on de-selecting likely harmful candidate compounds early in the development process.
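The validity notion used above can be made concrete with a toy check, using the standard conformal definition: a predictor is valid at significance ε if the true label falls outside the prediction set in at most a fraction ε of cases. The prediction sets and labels below are invented, not KnowTox output.

```python
# Invented prediction sets for four compounds at significance 0.25.

def is_valid(prediction_sets, true_labels, significance):
    """True if the empirical coverage is at least 1 - significance."""
    covered = sum(1 for s, y in zip(prediction_sets, true_labels) if y in s)
    return covered / len(true_labels) >= 1 - significance

sets = [{"toxic"}, {"toxic", "nontoxic"}, {"nontoxic"}, {"toxic"}]
truth = ["toxic", "toxic", "nontoxic", "nontoxic"]
ok = is_valid(sets, truth, significance=0.25)  # 3/4 covered, 0.75 >= 0.75
```

Note that the second prediction set contains both labels: conformal predictors preserve validity by emitting larger, less efficient sets when uncertain, which is the validity/efficiency trade-off the abstract's two adaptations address.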