Using a Genetic Algorithm to Find Molecules with Good Docking Scores
A graph-based genetic algorithm (GA) is used to identify molecules (ligands) with high absolute docking scores as estimated by the Glide software package, starting from randomly chosen molecules from the ZINC database, for four different targets: Bacillus subtilis chorismate mutase (CM), human β2-adrenergic G protein-coupled receptor (β2AR), the DDR1 kinase domain (DDR1), and β-cyclodextrin (BCD). By the combined use of functional group filters and a score modifier based on a heuristic synthetic accessibility (SA) score, our approach identifies between ca. 500 and 6,000 structurally diverse molecules with scores better than known binders by screening a total of 400,000 molecules starting from 8,000 randomly selected molecules from the ZINC database. Screening 250,000 molecules from the ZINC database identifies significantly more molecules with better docking scores than known binders, with the exception of CM, where the conventional screening approach only identifies 60 compounds compared to 511 with GA+Filter+SA. In the case of β2AR and DDR1, the GA+Filter+SA approach finds significantly more molecules with docking scores lower than −9.0 and −10.0. The GA+Filter+SA docking methodology is thus effective in generating a large and diverse set of synthetically accessible molecules with very good docking scores for a particular target. An early incarnation of the GA+Filter+SA approach was used to identify potential binders to the COVID-19 main protease, which were submitted to the early stages of the COVID Moonshot project, a crowd-sourced initiative to accelerate the development of a COVID antiviral.
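The general idea of pairing a GA with hard filters and an SA-based score modifier can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' graph-based GA: molecules are replaced by bit strings, Glide by a hypothetical `toy_docking_score`, the functional-group filters by a hypothetical `passes_filter`, and the Gaussian shape of `sa_modifier` with its `mu` and `sigma` values is assumed for illustration.

```python
import math
import random

# Toy stand-ins (all hypothetical, not Glide, RDKit, or the authors' code):
# a "molecule" is a bit string, the docking score is a toy function where
# lower is better (as with Glide), and the SA score uses a 1 (easy) to
# 10 (hard) range.
GENOME_LENGTH = 20


def toy_docking_score(genome):
    # More 1-bits give a more negative (better) toy docking score.
    return -0.5 * sum(genome)


def toy_sa_score(genome):
    # Frequent bit flips are treated as "hard to synthesize" in this toy model.
    changes = sum(1 for a, b in zip(genome, genome[1:]) if a != b)
    return 1.0 + 9.0 * changes / (GENOME_LENGTH - 1)


def sa_modifier(sa, mu=2.25, sigma=0.5):
    # Assumed form: no penalty up to mu, Gaussian down-weighting beyond it.
    if sa <= mu:
        return 1.0
    return math.exp(-0.5 * ((sa - mu) / sigma) ** 2)


def passes_filter(genome):
    # Hypothetical stand-in for a functional-group filter: reject "000" motifs.
    return "000" not in "".join(map(str, genome))


def fitness(genome):
    # Reward large absolute docking scores, scaled down for poor SA scores.
    return -toy_docking_score(genome) * sa_modifier(toy_sa_score(genome))


def crossover(a, b, rng):
    # One-point crossover between two parents.
    cut = rng.randrange(1, GENOME_LENGTH)
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def run_ga(population_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Score-proportional parent selection; epsilon keeps weights non-zero.
        weights = [fitness(g) + 1e-6 for g in population]
        children = []
        for _ in range(population_size):
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = mutate(crossover(p1, p2, rng), rng)
            if passes_filter(child):  # filtered children are discarded
                children.append(child)
        # Keep the best individuals from parents and surviving children.
        population = sorted(population + children, key=fitness,
                            reverse=True)[:population_size]
    return population[0]


best = run_ga()
print(best, fitness(best))
```

Because parents and children compete for survival, the best fitness in the population never decreases between generations; the filter acts as a hard rejection, while the SA modifier only down-weights hard-to-make candidates.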
Anthropogenic reaction parameters - the missing link between chemical intuition and the available chemical space
How do skilled synthetic chemists develop such good intuitive expertise? Why can we only access such a small amount of the available chemical space, both in terms of the reactions used and the chemical scaffolds we make? We argue here that these seemingly unrelated questions have a common root and are strongly interdependent. We performed a comprehensive analysis of organic reaction parameters dating back to 1771 and discovered that there are several anthropogenic factors that limit the reaction parameters and thus the scope of synthetic chemistry. Nevertheless, many of the anthropogenic limitations, such as the narrow parameter space and the opportunity for rapid and clear feedback on the progress of reactions, appear to be crucial for the acquisition of valid and reliable chemical intuition. In parallel, however, all of these same factors represent limitations for the exploration of the available chemical space, and we argue that they are thus at least partly responsible for limited access to new chemistries. We advocate, therefore, that the present anthropogenic boundaries can be expanded by a more conscious exploration of "off-road" chemistry that would also extend the intuitive knowledge of trained chemists.
The Synthesizability of Molecules Proposed by Generative Models
The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery. One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel molecular structures intended to maximize a multi-objective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chemical space. However, the utility of these approaches is stymied by their neglect of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often molecules proposed by state-of-the-art generative models cannot be readily synthesized. Our analysis demonstrates that there are several tasks for which these models generate unrealistic molecular structures despite performing well on popular quantitative benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically tractable chemical space, although doing so necessarily detracts from the primary objective. This analysis suggests that new algorithm development is warranted to improve the utility of these models in real discovery workflows.
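The trade-off between a synthetic complexity penalty and the primary objective can be made concrete with a small Python sketch. It assumes a simple scalarized score (objective minus a weighted complexity term) and uses hypothetical candidates and numbers; the paper does not specify this formulation, and real complexity heuristics such as SA-like scores would replace the hard-coded values.

```python
# Hypothetical candidates: (name, primary objective, synthetic complexity).
CANDIDATES = [
    ("A", 0.95, 4.8),  # best on the primary objective, hardest to make
    ("B", 0.80, 2.5),
    ("C", 0.60, 1.5),  # weakest on the objective, easiest to make
]


def penalized_score(objective, complexity, weight):
    """Scalarized score: primary objective minus a weighted complexity penalty."""
    return objective - weight * complexity


def select(candidates, weight):
    """Pick the candidate with the best penalized score for a given weight."""
    return max(candidates, key=lambda c: penalized_score(c[1], c[2], weight))


for weight in (0.0, 0.1, 0.5):
    name, objective, complexity = select(CANDIDATES, weight)
    print(f"weight={weight}: picks {name} "
          f"(objective {objective}, complexity {complexity})")
```

As the penalty weight grows, the selected candidate becomes easier to make while its primary-objective value drops, which is the cost the abstract describes.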
Rethinking drug design in the artificial intelligence era
Artificial intelligence (AI) tools are increasingly being applied in drug discovery. While some protagonists point to vast opportunities potentially offered by such tools, others remain sceptical, waiting for a clear impact to be shown in drug discovery projects. The reality is probably somewhere in between these extremes, yet it is clear that AI is providing new challenges not only for the scientists involved but also for the biopharma industry and its established processes for discovering and developing new medicines. This article presents the views of a diverse group of international experts on the 'grand challenges' in small-molecule drug discovery with AI and the approaches to address them.
Computer Aided Synthesis Prediction to Enable Augmented Chemical Discovery and Chemical Space Exploration
The drug-like chemical space is estimated to contain 10^60 molecules, and the largest generated database (GDB) obtained by the Reymond group contains 165 billion molecules with up to 17 heavy atoms. Furthermore, deep learning techniques to explore regions of chemical space are becoming more popular. However, the key to realizing the generated structures experimentally lies in chemical synthesis, the planning of which was previously limited to manual effort or slow computer-assisted synthesis planning (CASP) models. Despite the 60-year history of CASP, few synthesis planning tools have been open-sourced to the community. In this thesis I co-led the development of, and investigated, one of the few fully open-source synthesis planning tools, AiZynthFinder, trained on both public and proprietary datasets consisting of up to 17.5 million reactions. This enables synthesis-guided exploration of chemical space in a high-throughput manner, bridging the gap between compound generation and experimental realisation.
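The core question a synthesis planning tool answers, whether a target can be traced back to purchasable building blocks, can be sketched as a recursive AND-OR search. This is a simplified illustration under stated assumptions: the `DISCONNECTIONS` table and `STOCK` set are hypothetical, and the search is plain depth-first, whereas AiZynthFinder itself is described as using Monte Carlo tree search guided by a neural-network expansion policy over reaction templates.

```python
# Hypothetical one-step disconnections: product -> alternative precursor sets.
DISCONNECTIONS = {
    "target": [("int1", "int2"), ("int3",)],
    "int1": [("A", "B")],
    "int2": [("C",)],
    "int3": [("X",)],  # X is neither in stock nor disconnectable
}
STOCK = {"A", "B", "C"}  # hypothetical purchasable building blocks


def find_route(molecule, max_depth=5, depth=0):
    """Return a nested route tree ending in stock molecules, or None."""
    if molecule in STOCK:
        return {"molecule": molecule, "children": []}
    if depth >= max_depth:
        return None
    # OR node: try each alternative disconnection in turn.
    for precursors in DISCONNECTIONS.get(molecule, []):
        children = []
        # AND node: every precursor must itself be solvable.
        for precursor in precursors:
            route = find_route(precursor, max_depth, depth + 1)
            if route is None:
                break
            children.append(route)
        else:
            return {"molecule": molecule, "children": children}
    return None


route = find_route("target")
print(route)
```

The depth limit plays the role of a maximum route length; a learned expansion policy would replace the hard-coded table and rank which disconnections to try first.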
I first investigate both public and proprietary reaction data, and their influence on route-finding capability. Furthermore, I develop metrics for the assessment of retrosynthetic prediction, single-step retrosynthesis models, and automated template extraction workflows. This is supplemented by a comparison of the underlying datasets and their corresponding models.
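One widely used way to assess single-step retrosynthesis models is top-k accuracy: the fraction of test reactions whose recorded precursors appear among the model's k highest-ranked suggestions. The sketch below assumes that metric and uses hypothetical data; the thesis may define its metrics differently. Reactant sets are stored as `frozenset`s so that reactant order does not affect matching.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of targets whose recorded precursors are in the top-k list."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, ground_truth)
        if truth in predictions[:k]
    )
    return hits / len(ground_truth)


# Hypothetical data: each reactant set is a frozenset so order does not matter.
predictions = [
    [frozenset({"A", "B"}), frozenset({"C"})],
    [frozenset({"D"}), frozenset({"E", "F"})],
    [frozenset({"G"}), frozenset({"H"})],
]
recorded = [frozenset({"B", "A"}), frozenset({"E", "F"}), frozenset({"Z"})]

for k in (1, 2):
    print(k, top_k_accuracy(predictions, recorded, k))
```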
Given the prevalence of ring systems in the GDB and the wider medicinal chemistry domain, I developed 'Ring Breaker', a data-driven approach to enable the prediction of ring-forming reactions. I demonstrate its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. Additionally, I highlight its potential for incorporation into CASP tools, and outline methodological improvements that increase route-finding capability.
To tackle the challenge of model throughput, I report a machine learning (ML) based classifier called the retrosynthetic accessibility score (RAscore), which assesses the likelihood of finding a synthetic route using AiZynthFinder. The RAscore computes at least 4,500 times faster than AiZynthFinder. It therefore opens the possibility of pre-screening millions of virtual molecules from enumerated databases or generative models for synthesis-informed compound prioritization.
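The pre-screening idea, running a fast classifier first and reserving the slow route search for molecules it rates as promising, can be sketched in Python. This is a generic pattern under stated assumptions: `toy_score` stands in for a classifier probability in [0, 1] and `toy_planner` for a full retrosynthetic search; neither is the RAscore or AiZynthFinder API, and the 0.5 threshold is arbitrary.

```python
def prescreen_then_plan(molecules, cheap_score, expensive_planner, threshold=0.5):
    """Run the slow planner only on molecules the fast scorer ranks as promising."""
    shortlisted = [m for m in molecules if cheap_score(m) >= threshold]
    solved = [m for m in shortlisted if expensive_planner(m)]
    return shortlisted, solved


planner_calls = 0


def toy_score(molecule):
    # Hypothetical stand-in for a classifier probability in [0, 1].
    return 0.9 if molecule.startswith("easy") else 0.1


def toy_planner(molecule):
    # Hypothetical stand-in for a full (slow) retrosynthetic search.
    global planner_calls
    planner_calls += 1
    return molecule != "easy_unsolved"


library = ["easy1", "hard1", "easy_unsolved", "hard2"]
shortlisted, solved = prescreen_then_plan(library, toy_score, toy_planner)
print(shortlisted, solved, planner_calls)
```

The threshold trades cost against recall: molecules scored below it are never passed to the planner, so any that were in fact solvable are lost as false negatives.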
Finally, I combine chemical library visualization with synthetic route prediction to facilitate experimental engagement with synthetic chemists. I enable the navigation of chemical property space by using interactive visualization to deliver associated synthetic data as endpoints. This aids in the prioritization of compounds. The ability to view synthetic route information alongside structural descriptors facilitates a feedback mechanism for the improvement of CASP tools and enables rapid hypothesis testing. I demonstrate the workflow as applied to the GDB databases to augment compound prioritization and synthetic route design.
Going Small: Using Biophysical Screening to Implement Fragment Based Drug Discovery
Screening against biochemical targets with compact chemical fragments has developed a reputation as a successful early‐stage drug discovery approach, thanks to recent drug approvals. Because fragments have weak initial target affinities, they require sensitive biophysical technologies (NMR, SPR, thermal shift, ITC, and X‐ray crystallography) to accommodate the practical limits of going smaller. Application of optimized fragment biophysical screening approaches now routinely allows for the rapid identification of fragments with high binding efficiencies. The aim of this chapter is to provide an introduction to fragment library selection and to discuss the suitability of screening approaches adapted for lower‐throughput biophysical techniques. A general description of metrics that are being used in the progression of fragment hits, the need for orthogonal assay testing, and guidance on potential pitfalls are included to assist scientists considering initiating their own fragment discovery program.
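Ligand efficiency (LE) is one widely used fragment progression metric and illustrates why weakly binding fragments can still be attractive starting points. A minimal sketch, assuming the common definition LE = −ΔG / N_heavy with ΔG = RT ln K_d, in kcal/mol per heavy atom at 298.15 K; the chapter may describe additional or differently normalized metrics, and the example affinities and atom counts are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """LE = -dG / N_heavy, with dG = RT ln(Kd), in kcal/mol per heavy atom."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("kd_molar and heavy_atoms must be positive")
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)  # negative for Kd < 1 M
    return -delta_g / heavy_atoms


# Hypothetical comparison: a 1 mM fragment with 12 heavy atoms versus
# a 100 nM lead-like compound with 30 heavy atoms.
fragment_le = ligand_efficiency(1e-3, 12)
lead_le = ligand_efficiency(1e-7, 30)
print(f"fragment LE = {fragment_le:.3f}, lead LE = {lead_le:.3f}")
```

Despite binding 10,000 times more weakly, the small fragment has the higher efficiency per heavy atom (about 0.34 versus 0.32 kcal/mol), which is the sense in which fragments with "high binding efficiencies" are prioritized.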