22 research outputs found
Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition
Inspired by natural language processing techniques, here we introduce
Mol2vec, which is an unsupervised machine learning approach to learn
vector representations of molecular substructures. Like the Word2vec
models, where vectors of closely related words are in close proximity
in the vector space, Mol2vec learns vector representations of molecular
substructures that point in similar directions for chemically related
substructures. Compounds can finally be encoded as vectors by summing
the vectors of the individual substructures and, for instance, be
fed into supervised machine learning approaches to predict compound
properties. The underlying substructure vector embeddings are obtained
by training an unsupervised machine learning approach on a so-called
corpus of compounds that consists of all available chemical matter.
The resulting Mol2vec model is pretrained once, yields dense vector
representations, and overcomes drawbacks of common compound feature
representations such as sparseness and bit collisions. The prediction
capabilities are demonstrated on several compound property and bioactivity
data sets and compared with results obtained for Morgan fingerprints
as a reference compound representation. Mol2vec can be easily combined
with ProtVec, which employs the same Word2vec concept on protein sequences,
resulting in a proteochemometric approach that is alignment-independent
and thus can also be easily used for proteins with low sequence similarities.
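The summation step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the Mol2vec implementation: the substructure identifiers, the three-dimensional vectors, and the zero vector used for unseen substructures are all invented for the example. In practice the embeddings would come from a Word2vec-style model trained on a compound corpus.

```python
# Toy substructure embeddings (hypothetical identifiers and values).
# Real embeddings would be learned by an unsupervised model on a corpus.
EMBEDDINGS = {
    "sub_a": [0.5, -1.0, 0.25],
    "sub_b": [1.0, 0.5, -0.25],
    "sub_c": [-0.5, 0.0, 1.0],
}
# Assumption: substructures missing from the vocabulary contribute nothing.
UNKNOWN = [0.0, 0.0, 0.0]


def compound_vector(substructures, embeddings=EMBEDDINGS):
    """Encode a compound as the sum of its substructure vectors."""
    dim = len(UNKNOWN)
    total = [0.0] * dim
    for name in substructures:
        vec = embeddings.get(name, UNKNOWN)
        for i in range(dim):
            total[i] += vec[i]
    return total


# A compound containing sub_a once and sub_b twice.
example = compound_vector(["sub_a", "sub_b", "sub_b"])
print(example)
```

The resulting dense vector has a fixed length regardless of compound size, which is what allows it to be passed to a supervised model as a feature vector.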
From Cancer to Pain Target by Automated Selectivity Inversion of a Clinical Candidate
Elimination of inadvertent binding is crucial for inhibitor design
targeting conserved protein classes like kinases. Compounds in clinical
trials provide a rich source for initiating drug design efforts by
exploiting such secondary binding events. Considering both aspects,
we shifted the selectivity of tozasertib, originally developed against
AurA as cancer target, toward the pain target TrkA. First, selectivity-determining
features in binding pockets were identified by fusing interaction
grids of several key and off-target conformations. A focused library
was subsequently created and prioritized using a multiobjective selection
scheme that filters for selective and highly active compounds based
on orthogonal methods grounded in computational chemistry and machine
learning. Eighteen high-ranking compounds were synthesized and experimentally
tested. The top-ranked compound has 10000-fold improved selectivity
versus AurA, nanomolar cellular activity, and is highly selective
in a kinase panel. This was achieved in a single round of automated
in silico optimization, highlighting the power of recent advances
in computer-aided drug design to automate design and selection processes.
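The abstract does not specify how the multiobjective selection scheme works. One common way to prioritize on two objectives at once is a Pareto filter, sketched below as a stand-in. The compound names and the predicted activity and selectivity scores are hypothetical, and higher is assumed to be better for both.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on (activity, selectivity).

    A candidate is dominated if another candidate is at least as good on
    both objectives and strictly better on at least one.
    """
    front = []
    for name, act, sel in candidates:
        dominated = any(
            (a >= act and s >= sel) and (a > act or s > sel)
            for _, a, s in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical library: (name, predicted activity, predicted selectivity).
library = [
    ("cpd_1", 8.0, 1.0),
    ("cpd_2", 7.0, 3.0),
    ("cpd_3", 6.5, 2.5),  # worse than cpd_2 on both objectives
    ("cpd_4", 5.0, 4.0),
]
print(pareto_front(library))
```

Here cpd_3 is dropped because cpd_2 beats it on both scores, while the other three each represent a different trade-off between activity and selectivity.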
Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization
Matched molecular pair (MMP) analyses are widely used in compound
optimization projects to gain insights into structure–activity
relationships (SAR). The analysis is traditionally done via statistical
methods but can also be employed together with machine learning (ML)
approaches to extrapolate to novel compounds. The MMP/ML method
introduced here combines a fragment-based MMP implementation with different
machine learning methods to obtain automated SAR decomposition and
prediction. To test the prediction capabilities and model transferability,
two different compound optimization scenarios were designed: (1) “new
fragments”, which occurs when exploring new fragments for a
defined compound series, and (2) “new static core and transformations”,
which resembles, for instance, the identification of a new compound
series. Very good results were achieved by all employed machine learning
methods especially for the new fragments case, but overall deep neural
network models performed best, allowing reliable predictions also
for the new static core and transformations scenario, where comprehensive
SAR knowledge of the compound series is missing. Furthermore, we show
that models trained on all available data have a higher generalizability
compared to models trained on focused series and can extend beyond
chemical space covered in the training data. Thus, coupling MMP with
deep neural networks provides a promising approach to make high quality
predictions on various data sets and in different compound optimization
scenarios.
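The matched-pair bookkeeping that underlies such an analysis can be illustrated with a small sketch. It covers only the traditional statistical step (average activity change per fragment exchange on a shared core), not the machine learning models of the paper. The core labels, fragment names, and activity values are invented, and each transformation is reported in alphabetical fragment order as a simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations


def transformation_effects(records):
    """Average activity change per fragment exchange on a shared core.

    records: iterable of (core, fragment, activity) tuples.
    Returns {(fragment_from, fragment_to): mean activity difference}.
    """
    by_core = defaultdict(list)
    for core, fragment, activity in records:
        by_core[core].append((fragment, activity))

    deltas = defaultdict(list)
    for members in by_core.values():
        # Sorting gives every transformation a consistent direction.
        for (f1, a1), (f2, a2) in combinations(sorted(members), 2):
            if f1 != f2:
                deltas[(f1, f2)].append(a2 - a1)
    return {t: sum(v) / len(v) for t, v in deltas.items()}


# Hypothetical data: two cores decorated with exchangeable fragments.
records = [
    ("core_A", "H", 5.0), ("core_A", "F", 6.0),
    ("core_B", "H", 6.5), ("core_B", "F", 7.0),
    ("core_B", "OMe", 6.0),
]
effects = transformation_effects(records)
print(effects)
```

In this toy data the F to H exchange is observed on both cores, so its effect is averaged over two pairs, while the exchanges involving OMe are each supported by a single pair.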
Identification and Visualization of Kinase-Specific Subpockets
The identification and design of selective compounds is important
for the reduction of unwanted side effects as well as for the development
of tool compounds for target validation studies. This is, in particular,
true for therapeutically important protein families that possess conserved
folds and have numerous members such as kinases. To support the design
of selective kinase inhibitors, we developed a novel approach that
allows identification of specificity determining subpockets between
closely related kinases solely based on their three-dimensional structures.
To account for the intrinsic flexibility of the proteins, multiple
X-ray structures of the target protein of interest as well as of unwanted
off-target(s) are taken into account. The binding pockets of these
protein structures are calculated and fused to a combined target and
off-target pocket, respectively. Subsequently, shape differences between
these two combined pockets are identified via fusion rules. The approach
provides a user-friendly visualization of target-specific areas in
a binding pocket which should be explored when designing selective
compounds. Furthermore, the approach can be easily combined with in
silico alanine mutation studies to identify selectivity determining
residues. The potential impact of the approach is demonstrated in
four retrospective experiments on closely related kinases, i.e., p38α
vs Erk2, PAK1 vs PAK4, ITK vs AurA, and BRAF vs VEGFR2. Overall, the
presented approach does not require any profiling data for training
purposes, provides an intuitive visualization of a large number of
protein structures at once, and could also be applied to other target
classes.
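The fuse-then-compare idea can be sketched with pockets represented as sets of grid points. The abstract does not state the actual fusion rules, so the rule used here (keep a grid point if it occurs in at least a given fraction of the structures) is an assumption, as are the coordinates and the `min_fraction` parameter.

```python
def fuse_pockets(pockets, min_fraction=0.5):
    """Fuse several pocket grids into one combined pocket.

    pockets: list of sets of (x, y, z) grid points, one set per structure.
    Assumed rule: keep a point seen in at least min_fraction of structures.
    """
    counts = {}
    for pocket in pockets:
        for point in pocket:
            counts[point] = counts.get(point, 0) + 1
    needed = min_fraction * len(pockets)
    return {p for p, c in counts.items() if c >= needed}


def target_specific_region(target_pockets, off_target_pockets):
    """Grid points in the fused target pocket but not the off-target one."""
    return fuse_pockets(target_pockets) - fuse_pockets(off_target_pockets)


# Hypothetical grids from two target and two off-target structures.
target = [{(0, 0, 0), (1, 0, 0), (2, 0, 0)}, {(0, 0, 0), (1, 0, 0)}]
off_target = [{(0, 0, 0)}, {(0, 0, 0), (1, 1, 0)}]
specific = target_specific_region(target, off_target)
print(sorted(specific))
```

The points that survive the difference mark space available in the target but absent from the off-target, which is the kind of region the abstract suggests exploring when designing selective compounds.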
Additional file 1: of KinMap: a web-based tool for interactive navigation through human kinome data
(KinMap_Examples.zip) contains the input CSV files used to generate the annotated kinome trees in Fig. 1 (Example_1_Erlotinib_NSCLC.csv), Fig. 2a (Example_2_Sunitinib_Sorafenib_Cancer.csv), and Fig. 2b (Example_3_Kinase_Stats.csv). (ZIP 5 kb)
Profiling Prediction of Kinase Inhibitors: Toward the Virtual Assay
Kinome-wide screening would have the advantage of providing structure–activity
relationships against hundreds of targets simultaneously. Here, we
report the generation of ligand-based activity prediction models for
over 280 kinases by employing Machine Learning methods on an extensive
data set of proprietary bioactivity data combined with open data.
High quality (AUC > 0.7) was achieved for ∼200 kinases by (1)
combining open with proprietary data, (2) choosing Random Forest over
alternative tested Machine Learning methods, and (3) balancing the
training data sets. Tests on left-out and external data indicate a
high value for virtual screening projects. Importantly, the derived
models are evenly distributed across the kinome tree, allowing reliable
profiling prediction for all kinase branches. The prediction quality
was further improved by employing experimental bioactivity fingerprints
of a small kinase subset. Overall, the generated models can support
various hit identification tasks, including virtual screening, compound
repurposing, and the detection of potential off-targets.
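The AUC > 0.7 quality criterion can be made concrete with a small sketch that computes the area under the ROC curve as the fraction of active/inactive pairs ranked correctly, with ties counting half. The labels and scores below are invented, and the function is a generic textbook formulation rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that an active outscores an inactive.

    labels: 1 for active, 0 for inactive; scores: model outputs.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for one kinase model.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3), "high quality" if auc > 0.7 else "below threshold")
```

In this toy case eight of the nine active/inactive pairs are ordered correctly, so the model would pass the 0.7 threshold named in the abstract.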
Function of the d‑Alanine:d‑Alanine Ligase Lid Loop: A Molecular Modeling and Bioactivity Study
d-Alanine:d-alanine ligase (Ddl) is an essential
ATP-dependent bacterial enzyme involved in peptidoglycan biosynthesis.
Discovery of Ddl inhibitors not competitive with ATP has proven to
be difficult because the Ddl bimolecular d-alanine binding
pocket is very restricted, as is accessibility to the active site
for larger molecules in the catalytically active closed conformation
of Ddl. A molecular dynamics study of the opening and closing of the
Ddl lid loop informs future structure-based design efforts that allow
for the flexibility of Ddl. A virtual screen on generated enzyme conformations
yielded some hit inhibitors whose bioactivity was determined.
Selective Inhibitors of Aldo-Keto Reductases AKR1C1 and AKR1C3 Discovered by Virtual Screening of a Fragment Library
Human aldo-keto reductases 1C1–1C4 (AKR1C1–AKR1C4)
function in vivo as 3-keto-, 17-keto-, and 20-ketosteroid reductases
and regulate the activity of androgens, estrogens, and progesterone
and the occupancy and transactivation of their corresponding receptors.
Aberrant expression and action of AKR1C enzymes can lead to different
pathophysiological conditions. AKR1C enzymes thus represent important
targets for development of new drugs. We performed a virtual high-throughput
screen of a fragment library that was followed by biochemical evaluation
on AKR1C1–AKR1C4 enzymes. Twenty-four structurally diverse
compounds were discovered with low μM Ki values for AKR1C1, AKR1C3, or both. Two structural series
included the salicylates and the N-phenylanthranilic
acids, and additionally a series of inhibitors with completely novel
scaffolds was discovered. Two of the best selective AKR1C3 inhibitors
had Ki values of 0.1 and 2.7 μM,
exceeding expected activity for fragments. The compounds identified
represent an excellent starting point for further hit-to-lead development.