Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation
Small organic molecules are often flexible, i.e., they can adopt
a variety of low-energy conformations in solution that exist in equilibrium
with each other. Two main search strategies are used to generate representative
conformational ensembles for molecules: systematic and stochastic.
In the systematic approach, each rotatable bond is sampled in discrete
intervals, which limits its use to molecules with a small number of
rotatable bonds. Stochastic methods, on the other hand, sample the
conformational space of a molecule randomly and can thus be applied to
more flexible molecules. Methods also differ in how much experimental
data they use for conformer generation. So-called knowledge-based
methods use predefined libraries of torsional angles and ring conformations.
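The limitation of systematic search noted above follows from simple counting: with a fixed angular increment, the number of torsion combinations grows exponentially with the number of rotatable bonds. The sketch below assumes a uniform 30° increment per bond (an illustrative value, not taken from the text) and ignores ring flexibility and symmetry.

```python
def systematic_grid_size(n_rotatable: int, step_deg: float = 30.0) -> int:
    """Number of torsion combinations on a uniform grid.

    Assumes every rotatable bond is scanned over 360 degrees in steps of
    `step_deg` (an illustrative value), independently of the other bonds.
    """
    if 360.0 % step_deg != 0:
        raise ValueError("step_deg must divide 360 evenly")
    states_per_bond = int(360.0 // step_deg)
    return states_per_bond ** n_rotatable


for n in (2, 5, 10):
    print(n, systematic_grid_size(n))
```

At 12 states per bond, ten rotatable bonds already give 12^10 ≈ 6.2 × 10^10 combinations, whereas a stochastic method draws a number of samples chosen by the user rather than dictated by a grid.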
In the distance geometry approach, on the other hand, less empirical
information is used, i.e., ideal bond lengths, ideal bond angles, and a
few ideal torsional angles. Distance geometry is a computationally fast
method to generate conformers, but purely distance-based constraints tend
to produce distorted aromatic rings and sp<sup>2</sup> centers. To correct
this, the resulting conformations are often minimized with a force field,
which adds computational complexity and run time. Here we present an
alternative strategy that combines the distance geometry approach with
experimental torsion-angle preferences obtained from small-molecule
crystallographic data. The torsional angles are described by a previously
developed set of hierarchically structured SMARTS patterns. The new approach
is implemented in the open-source cheminformatics library RDKit, and its
performance is assessed by comparing the diversity of the generated
ensembles and their ability to reproduce conformations observed in crystal
structures of small molecules and protein–ligand complexes.
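The core step of distance geometry turns a matrix of interatomic distances into Cartesian coordinates. The NumPy sketch below shows the textbook metric-matrix embedding (the same construction as classical multidimensional scaling): squared distances are converted into a Gram matrix about the centroid, and its three largest eigenpairs give 3D coordinates. It assumes a complete, exact distance matrix; practical conformer generators instead sample distances between lower and upper bounds, handle chirality separately, and refine the result, and this is not RDKit's code.

```python
import numpy as np


def embed_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Embed points in `dim` dimensions from a symmetric distance matrix.

    Metric-matrix embedding: G = -1/2 * J * D2 * J, where D2 holds the
    elementwise squared distances and J is the centering matrix.
    """
    n = dist.shape[0]
    d2 = dist ** 2
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    gram = -0.5 * j @ d2 @ j                     # metric (Gram) matrix about the centroid
    evals, evecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:dim]        # indices of the `dim` largest eigenvalues
    top_vals = np.clip(evals[order], 0.0, None)  # negative values signal inconsistent distances
    return evecs[:, order] * np.sqrt(top_vals)


# Usage: recover a 4-atom geometry (up to rotation, translation, reflection).
coords = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [2.0, 1.4, 0.0],
                   [3.5, 1.4, 0.6]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
embedded = embed_from_distances(dist)
recovered = np.linalg.norm(embedded[:, None, :] - embedded[None, :, :], axis=-1)
print(np.allclose(dist, recovered))
```

Because only distances enter the calculation, the embedding cannot distinguish a structure from its mirror image, which is one reason real implementations add chirality constraints.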
Combining IC<sub>50</sub> or <i>K</i><sub><i>i</i></sub> Values from Different Sources Is a Source of Significant Noise
As part of the ongoing quest to find or construct large data sets
for use in validating new machine learning (ML) approaches for bioactivity
prediction, it has become distressingly common for researchers to
combine literature IC50 data generated using different
assays into a single data set. It is well known that this is scientifically
risky in many situations, even when the assays are against exactly the
same target, and the risk of assays being incompatible is even higher when
pulling data from large collections of literature data such as ChEMBL.
Here, we estimate the amount of noise present in combined data sets using
cases where measurements for the same compound are reported in multiple
assays against the same target. This approach shows that IC50 assays
selected using minimal curation settings have poor agreement with
each other: almost 65% of the points differ by more than 0.3 log units,
27% differ by more than one log unit, and the correlation between
the assays, as measured by Kendall’s τ, is only 0.51.
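The agreement statistics quoted above can be computed from paired measurements of the same compounds in two assays. The sketch below assumes the values are already on a log scale (for example pIC50 = −log10 of the IC50 in molar units) and implements the tie-corrected Kendall's τ-b directly; the function names and toy data are illustrative, not the authors' code.

```python
import math
from itertools import combinations


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in M)."""
    return -math.log10(ic50_nm * 1e-9)


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b (tie-corrected) via an O(n^2) pairwise count."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both tie terms
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


def agreement(x: list[float], y: list[float]) -> dict[str, float]:
    """Fraction of pairs differing by > 0.3 and > 1 log unit, plus Kendall's tau-b."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    n = len(diffs)
    return {
        "frac_gt_0.3": sum(d > 0.3 for d in diffs) / n,
        "frac_gt_1.0": sum(d > 1.0 for d in diffs) / n,
        "kendall_tau": kendall_tau_b(x, y),
    }
```

For the toy pairs [5.0, 6.0, 7.0, 8.0] and [5.2, 7.2, 6.9, 9.5], half of the points differ by more than one log unit and τ-b = 4/6 ≈ 0.67. A difference of 0.3 log units corresponds to roughly a factor of two in IC50, and one log unit to a factor of ten.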
Requiring that most of the assay metadata in ChEMBL match (“maximal
curation”) before two assays are combined improves the situation
(48% of the points differ by more than 0.3 log units, 13% by more
than one log unit, and Kendall’s τ is 0.71) at the expense
of smaller data sets. Surprisingly, our analysis shows similar
amounts of noise when combining data from different literature Ki assays.
We suggest that good scientific practice requires careful curation when
combining data sets from different assays, and we hope that our maximal
curation strategy will help to improve the quality of the data used to
build and validate ML models for bioactivity prediction. To help achieve
this, the code and ChEMBL queries that we used for the maximal curation
approach are available as open-source software in our GitHub repository: https://github.com/rinikerlab/overlapping_assays
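The pairing step behind these estimates (finding compounds measured in two different assays against the same target, optionally requiring the assays' metadata to agree) can be sketched as follows. The record layout and the metadata fields `assay_type` and `organism` are hypothetical placeholders, not the ChEMBL schema or the curation criteria used in the linked repository.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Measurement:
    compound: str
    target: str
    assay: str
    value: float      # log-scale activity, e.g., pIC50
    assay_type: str   # hypothetical metadata field
    organism: str     # hypothetical metadata field


def overlapping_pairs(records, match_metadata: bool = False):
    """Return (value_a, value_b) for each compound measured in two assays on one target.

    With match_metadata=True, a pair is kept only if the hypothetical metadata
    fields agree, mimicking a stricter ("maximal") curation filter.
    """
    by_key = {}
    for r in records:
        by_key.setdefault((r.compound, r.target), []).append(r)
    pairs = []
    for group in by_key.values():
        for a, b in combinations(group, 2):
            if a.assay == b.assay:
                continue  # same assay: not an inter-assay comparison
            if match_metadata and (a.assay_type, a.organism) != (b.assay_type, b.organism):
                continue  # metadata disagree: excluded under strict curation
            pairs.append((a.value, b.value))
    return pairs
```

As in the analysis described above, the stricter filter keeps fewer pairs, trading data-set size for consistency between the combined assays.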
Correction to “Machine Learning of Partial Charges Derived From High-Quality Quantum-Mechanical Calculations”
Machine Learning of Partial Charges Derived from High-Quality Quantum-Mechanical Calculations
Parametrization of small organic molecules for classical molecular
dynamics simulations is not trivial. The vastness of the chemical
space makes approaches using building blocks challenging. The most
common approach is therefore to parametrize each compound individually
by deriving partial charges from semiempirical or ab initio
calculations and inheriting the bonded and van der Waals (Lennard-Jones)
parameters from a (bio)molecular force field. The quality of the partial
charges generated in this fashion depends on the level of the quantum-chemical
calculation as well as on the extraction procedure used. Here, we
present a machine learning (ML) based approach for predicting partial
charges extracted from density functional theory (DFT) electron densities.
The training set was chosen to provide broad coverage
of the known chemical space of druglike molecules. In addition to
being fast, the approach yields partial charges that do not depend on the
three-dimensional conformation, in contrast to those obtained by fitting
to the electrostatic potential (ESP).
To assess the quality and compatibility with standard force fields,
we performed benchmark calculations for the free energy of hydration
and liquid properties such as density and heat of vaporization.
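For comparison, ESP-derived charges come from a least-squares fit of atom-centered point charges to the electrostatic potential evaluated on grid points around the molecule, usually constrained so that the charges sum to the total molecular charge. The NumPy sketch below solves that constrained fit with a Lagrange multiplier, in atomic units and without the restraints of schemes such as RESP. Because the fit matrix depends on the atomic positions, the fitted charges change with the conformation, which is the dependence the ML-predicted charges avoid. The synthetic potential is illustrative, not quantum-chemical data.

```python
import numpy as np


def fit_esp_charges(atoms: np.ndarray, grid: np.ndarray, esp: np.ndarray,
                    total_charge: float = 0.0) -> np.ndarray:
    """Least-squares point charges reproducing `esp` at `grid` (atomic units).

    Minimizes sum_k (esp_k - sum_j q_j / r_kj)^2 subject to sum_j q_j = total_charge.
    """
    r = np.linalg.norm(grid[:, None, :] - atoms[None, :, :], axis=-1)  # (n_grid, n_atoms)
    a = 1.0 / r                          # potential of a unit charge on each atom
    n = atoms.shape[0]
    # Normal equations augmented with a Lagrange multiplier for the charge constraint.
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = a.T @ a
    lhs[:n, n] = 1.0
    lhs[n, :n] = 1.0
    rhs = np.append(a.T @ esp, total_charge)
    return np.linalg.solve(lhs, rhs)[:n]


# Usage: recover known charges from a synthetic potential.
rng = np.random.default_rng(0)
atoms = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
true_q = np.array([-0.6, 0.3, 0.3])
grid = rng.normal(scale=6.0, size=(200, 3))
# Keep only grid points at least 2.5 bohr from every atom, as ESP grids exclude the molecular interior.
grid = grid[np.min(np.linalg.norm(grid[:, None] - atoms[None], axis=-1), axis=1) > 2.5]
esp = (1.0 / np.linalg.norm(grid[:, None] - atoms[None], axis=-1)) @ true_q
print(np.round(fit_esp_charges(atoms, grid, esp), 3))
```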