Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation

Author: Gregory A. Landrum (1450801)
Sereina Riniker (1450804)

Small organic molecules are often flexible, i.e., they can adopt a variety of low-energy conformations in solution that exist in equilibrium with each other. Two main search strategies are used to generate representative conformational ensembles for molecules: systematic and stochastic. In the first approach, each rotatable bond is sampled systematically in discrete intervals, limiting its use to molecules with a small number of rotatable bonds. Stochastic methods, on the other hand, sample the conformational space of a molecule randomly and can thus be applied to more flexible molecules. Different methods employ different degrees of experimental data for conformer generation. So-called knowledge-based methods use predefined libraries of torsional angles and ring conformations. In the distance geometry approach, on the other hand, a smaller amount of empirical information is used, i.e., ideal bond lengths, ideal bond angles, and a few ideal torsional angles. Distance geometry is a computationally fast method to generate conformers, but it has the downside that purely distance-based constraints tend to lead to distorted aromatic rings and sp2 centers. To correct this, the resulting conformations are often minimized with a force field, adding computational complexity and run time. Here we present an alternative strategy that combines the distance geometry approach with experimental torsion-angle preferences obtained from small-molecule crystallographic data. The torsional angles are described by a previously developed set of hierarchically structured SMARTS patterns. The new approach is implemented in the open-source cheminformatics library RDKit, and its performance is assessed by comparing the diversity of the generated ensemble and the ability to reproduce crystal conformations taken from the crystal structures of small molecules and protein–ligand complexes.
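The abstract's remark that systematic sampling is limited to molecules with few rotatable bonds follows from simple counting: if every rotatable bond is sampled at a fixed angular step, the number of torsion combinations grows exponentially with the number of bonds. The sketch below illustrates that arithmetic only. The function name `systematic_grid_size` and the 30-degree step are assumptions chosen for illustration; they are not taken from the paper or from RDKit.

```python
def systematic_grid_size(n_rotatable_bonds: int, step_degrees: int) -> int:
    """Count torsion combinations for an exhaustive grid search.

    Hypothetical helper: assumes every rotatable bond is sampled
    independently at the same integer step over a full 360-degree turn.
    """
    if n_rotatable_bonds < 0:
        raise ValueError("n_rotatable_bonds must be non-negative")
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must be a positive divisor of 360")
    states_per_bond = 360 // step_degrees
    return states_per_bond ** n_rotatable_bonds


if __name__ == "__main__":
    # Illustrative step of 30 degrees gives 12 states per bond.
    for n_bonds in (2, 5, 10):
        print(n_bonds, systematic_grid_size(n_bonds, 30))
```

With a 30-degree step, two bonds give 144 combinations, while ten bonds already give more than 6 × 10^10, which is why stochastic sampling becomes attractive for flexible molecules.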

FigShare

Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise

Author: Gregory A. Landrum (1450801)
Sereina Riniker (1450804)
Publication date: 23/02/2024

As part of the ongoing quest to find or construct large data sets for use in validating new machine learning (ML) approaches for bioactivity prediction, it has become distressingly common for researchers to combine literature IC50 data generated using different assays into a single data set. It is well-known that there are many situations where this is a scientifically risky thing to do, even when the assays are against exactly the same target, but the risks of assays being incompatible are even higher when pulling data from large collections of literature data like ChEMBL. Here, we estimate the amount of noise present in combined data sets using cases where measurements for the same compound are reported in multiple assays against the same target. This approach shows that IC50 assays selected using minimal curation settings have poor agreement with each other: almost 65% of the points differ by more than 0.3 log units, 27% differ by more than one log unit, and the correlation between the assays, as measured by Kendall’s τ, is only 0.51. Requiring that most of the assay metadata in ChEMBL matches (“maximal curation”) in order to combine two assays improves the situation (48% of the points differ by more than 0.3 log units, 13% by more than one log unit, and Kendall’s τ is 0.71) at the expense of having smaller data sets. Surprisingly, our analysis shows similar amounts of noise when combining data from different literature Ki assays. We suggest that good scientific practice requires careful curation when combining data sets from different assays and hope that our maximal curation strategy will help to improve the quality of the data that are being used to build and validate ML models for bioactivity prediction. To help achieve this, the code and ChEMBL queries that we used for the maximal curation approach are available as open-source software in our GitHub repository, https://github.com/rinikerlab/overlapping_assays.
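The agreement statistics quoted in the abstract (the share of compounds whose values differ by more than a threshold in log units, and Kendall's τ between two assays) can be computed from paired measurements roughly as follows. This is a minimal standard-library sketch, not the authors' code from the linked repository. The helper names are hypothetical, the τ-b variant is an assumption (the abstract does not say which variant was used), and the five pIC50 pairs are invented purely to exercise the functions.

```python
from itertools import combinations
from math import sqrt


def fraction_exceeding(a, b, threshold):
    """Fraction of paired values whose absolute difference exceeds threshold."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(d > threshold for d in diffs) / len(diffs)


def kendall_tau_b(a, b):
    """Kendall's tau-b via an O(n^2) scan over all pairs of compounds."""
    concordant = discordant = tied_a_only = tied_b_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(a, b), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both assays: contributes to neither term
        if dx == 0:
            tied_a_only += 1
        elif dy == 0:
            tied_b_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_a_only)
                 * (concordant + discordant + tied_b_only))
    return (concordant - discordant) / denom


# Invented pIC50 values for the same five compounds in two assays
# (illustrative only; not data from ChEMBL or from the paper).
assay_a = [5.0, 6.0, 7.0, 8.0, 9.0]
assay_b = [5.2, 6.5, 6.9, 9.2, 8.8]

if __name__ == "__main__":
    print(fraction_exceeding(assay_a, assay_b, 0.3))  # share differing by > 0.3 log units
    print(fraction_exceeding(assay_a, assay_b, 1.0))  # share differing by > 1 log unit
    print(kendall_tau_b(assay_a, assay_b))
```

For larger data sets, a library routine such as `scipy.stats.kendalltau` would be the more practical choice; the explicit loop is used here only to keep the example dependency-free.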

FigShare

Correction to “Machine Learning of Partial Charges Derived From High-Quality Quantum-Mechanical Calculations”

Author: Kay Schaller (4918108)
Patrick Bleiziffer (1959418)
Sereina Riniker (1450804)
Publication venue: 'American Chemical Society (ACS)'
Publication date: 20/03/2023

Correction to “Machine Learning of Partial Charges Derived From High-Quality Quantum-Mechanical Calculations”

FigShare

Machine Learning of Partial Charges Derived from High-Quality Quantum-Mechanical Calculations

Author: Kay Schaller (4918108)
Patrick Bleiziffer (1959418)
Sereina Riniker (1450804)

Parametrization of small organic molecules for classical molecular dynamics simulations is not trivial. The vastness of the chemical space makes approaches using building blocks challenging. The most common approach is therefore an individual parametrization of each compound by deriving partial charges from semiempirical or ab initio calculations and inheriting the bonded and van der Waals (Lennard-Jones) parameters from a (bio)molecular force field. The quality of the partial charges generated in this fashion depends on the level of the quantum-chemical calculation as well as on the extraction procedure used. Here, we present a machine learning (ML) based approach for predicting partial charges extracted from density functional theory (DFT) electron densities. The training set was chosen with the goal to provide a broad coverage of the known chemical space of druglike molecules. In addition to the speed of the approach, the partial charges predicted by ML are not dependent on the three-dimensional conformation in contrast to the ones obtained by fitting to the electrostatic potential (ESP). To assess the quality and compatibility with standard force fields, we performed benchmark calculations for the free energy of hydration and liquid properties such as density and heat of vaporization.

FigShare