DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment
We present a robust and computationally efficient approach for assigning partial charges to atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN) that was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated into existing parametrization pipelines, as shown for the Open Force Field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.
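The abstract describes charges being looked up in a tree built from GNN attention values. As a rough illustration only, the sketch below shows how a lookup in such a substructure hierarchy could work: descend level by level along an atom's environment and fall back to the deepest known node. The `Node` layout, the string-encoded feature path, and the toy charges are hypothetical assumptions; the sketch does not reproduce how DASH derives the hierarchy from attention values, nor its actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class Node:
    """One node of a hypothetical substructure hierarchy.

    `charge` is the average partial charge of all training atoms whose
    environment matches the path from the root to this node.
    """
    charge: float
    children: Dict[str, "Node"] = field(default_factory=dict)


def assign_charge(root: Node, path: Sequence[str]) -> float:
    """Walk down the tree along an atom's feature path.

    Each entry of `path` describes the atom's environment one level more
    specifically than the previous one. If a level was never seen during
    training, stop and return the charge of the deepest matching node,
    i.e. fall back to the most specific known substructure.
    """
    node = root
    for feature in path:
        child = node.children.get(feature)
        if child is None:
            break
        node = child
    return node.charge


# Toy tree with made-up charges (not values from DASH): level 1 is the
# element, level 2 the sorted neighbour elements.
toy_tree = Node(0.0, {
    "C": Node(0.05, {"C,H,H,O": Node(0.12)}),
    "O": Node(-0.45, {"C,H": Node(-0.60)}),
})
```

For example, `assign_charge(toy_tree, ["O", "C,H"])` returns the fully specific value, while an unseen environment such as `["C", "N,N,N"]` falls back to the coarser element-level charge. Because the lookup is a plain dictionary walk, it needs no GNN at assignment time, which is the property the abstract highlights.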
Implicit solvent approach based on generalized Born and transferable graph neural networks for molecular dynamics simulations
Molecular dynamics simulations enable the study of the motion of small and large (bio)molecules and the estimation of their conformational ensembles. The description of the environment (solvent) therefore has a large impact. Implicit solvent representations are efficient but, in many cases, not accurate enough (especially for polar solvents such as water). The explicit treatment of the solvent molecules is more accurate but also computationally more expensive. Recently, machine learning has been proposed to bridge this gap by simulating explicit solvation effects in an implicit manner. However, current approaches rely on prior knowledge of the entire conformational space, which limits their practical application. Here, we introduce a graph neural network-based implicit solvent that is capable of describing explicit solvent effects for peptides with compositions different from those contained in the training set.
ISSN: 0021-9606; ISSN: 1089-769
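The abstract builds on the generalized Born (GB) model. For orientation, the sketch below evaluates the classic GB polarization energy with the widely used Still-type interpolation function f_GB. Born radii are taken as given inputs, the dielectric constants (1 inside, 78.5 for water outside) are assumed defaults, and the graph neural network part of the method is not represented.

```python
import math
from typing import Sequence, Tuple

# Coulomb constant in kcal*A/(mol*e^2), so energies come out in kcal/mol.
COULOMB_KCAL = 332.0636


def gb_energy(
    charges: Sequence[float],
    coords: Sequence[Tuple[float, float, float]],
    born_radii: Sequence[float],
    eps_in: float = 1.0,
    eps_out: float = 78.5,
) -> float:
    """Generalized Born polarization energy with the Still-type f_GB.

    dG = -1/2 * k * (1/eps_in - 1/eps_out) * sum_i sum_j q_i q_j / f_GB(r_ij)
    f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j)))
    The i == j terms reduce to the Born self-energy because f_GB(0) = R_i.
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    for qi, xi, ri in zip(charges, coords, born_radii):
        for qj, xj, rj in zip(charges, coords, born_radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            rr = ri * rj
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += qi * qj / f_gb
    return prefactor * total
```

The double loop is O(N^2) in the number of atoms, which is part of why implicit models are cheap relative to simulating thousands of explicit water molecules; the accuracy gap the abstract mentions comes from the simplified physics, not from this bookkeeping.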
Perplexity-based molecule ranking and bias estimation of chemical language models
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. A perplexity-based comparison of “greedy” (beam search) and “explorative” (multinomial sampling) methods for SMILES generation reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired model biases introduced during model training, and it enables the development of a new ranking system to remove those undesired biases.
Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. A perplexity-based comparison of "greedy" (beam search) and "explorative" (multinomial sampling) methods for SMILES generation reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired model biases introduced during model training, and it enables the development of a new ranking system to remove those undesired biases.
ISSN: 1549-9596; ISSN: 0095-2338; ISSN: 1520-514
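Perplexity is the exponentiated mean negative log-likelihood of a sequence under the model, so a lower value means the CLM finds the string more probable. The sketch below assumes the per-token probabilities have already been obtained from a CLM; the character-level tokenization and all probabilities are invented for illustration, and the bias-corrected ranking developed in the study is not reproduced.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of one generated string from its per-token probabilities.

    PPL = exp(-(1/N) * sum(log p(t_i | t_<i))); lower means the model
    considers the string more likely.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def rank_by_perplexity(scored: Dict[str, Sequence[float]]) -> List[str]:
    """Return SMILES strings ordered from lowest to highest perplexity."""
    return sorted(scored, key=lambda smiles: perplexity(scored[smiles]))


# Made-up per-character probabilities, one per SMILES token.
candidates = {
    "CC(=O)N": [0.25] * 7,
    "CCO": [0.9, 0.8, 0.9],
    "c1ccccc1": [0.5] * 8,
}
```

Averaging over N before exponentiating normalizes for length, so short and long SMILES strings can be ranked on the same scale; a string whose every token has probability 0.5 has perplexity 2 regardless of its length.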
Influence of the fluorophore mobility on distance measurement by gas-phase FRET
Gas-phase Förster resonance energy transfer (FRET) combines mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecular ions. In FRET, fluorophore pairs are typically covalently attached to a biomolecule using short linkers, which affect the mobility of the dye and the relative orientation of the transition dipole moments of the donor and acceptor. Intramolecular interactions may further influence the range of motion. Yet, little is known about this factor, despite the importance of intramolecular interactions in the absence of a solvent. In this study, we applied transition metal ion FRET (tmFRET) to probe the mobility of a single chromophore pair (Rhodamine 110 and Cu2+) as a function of linker lengths to assess the relevance of intramolecular interactions. Increasing FRET efficiencies were observed with increasing linker length, ranging from 5% (2 atoms) to 28% (13 atoms). To rationalize this trend, we profiled the conformational landscape of each model system using molecular dynamics (MD) simulations. We captured intramolecular interactions that promote a population shift toward smaller donor–acceptor separation for longer linker lengths and induce a significant increase in the acceptor’s transition dipole moment. The presented methodology is a first step toward the explicit consideration of a fluorophore’s range of motion in the interpretation of gas-phase FRET experiments.
ISSN: 1089-5639; ISSN: 1520-521
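To see what the reported efficiencies would mean as distances, the sketch below uses the standard Förster relation E = 1/(1 + (r/R0)^6) and its inverse. It assumes a single static donor–acceptor distance and a fixed Förster radius R0, which the abstract does not give, so distances are expressed in units of R0. These are exactly the simplifications that dye mobility and intramolecular interactions call into question, so the numbers are illustrative only.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster efficiency E = 1 / (1 + (r/R0)^6) for donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e: float, r0: float = 1.0) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


# Rounded efficiencies reported in the abstract for the 2-atom and 13-atom
# linkers, converted to distances in units of R0 (R0 itself is not given).
for linker_atoms, efficiency in [(2, 0.05), (13, 0.28)]:
    print(linker_atoms, round(distance_from_efficiency(efficiency), 2))
```

Under these assumptions the two efficiencies correspond to roughly 1.63 R0 and 1.17 R0. The sixth-power dependence makes E most sensitive near r = R0, where E = 0.5.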
Influence of the fluorophore mobility on distance measurements by gas-phase FRET
Gas-phase Förster resonance energy transfer (FRET) combines the advantages of mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecules. While this implementation of FRET in the gas phase promises detailed insights for fundamental and applied studies, the gas-phase environment also poses great challenges. For FRET, fluorophore pairs are typically covalently attached to strategic binding sites in the backbone of a biomolecule using short linkers. The linker further increases the mobility of the dye, contributing to rotational averaging of the relative orientation of the transition dipole moments of donor and acceptor. However, little is known about the fluorophore’s degrees of freedom in the gas phase and how they may be influenced by intramolecular interactions. In this study, we test the influence of a fluorophore’s linker length on the measured FRET efficiencies in the gas phase to probe the mobility of the fluorophore. The FRET efficiency increased with increasing linker length, ranging from 5.3% for a linker of 2 atoms to 27.7% for a linker of 13 atoms. To rationalize this trend, we profiled the conformational landscape of each model system with MD simulations. Employing state-of-the-art enhanced sampling techniques, we captured intramolecular interactions that promote a population shift towards smaller donor-acceptor separations for longer linkers and induce a significant increase in the acceptor dipole. The presented methodology is a first step towards the explicit consideration of a fluorophore’s range of motion in the interpretation of gas-phase FRET experiments.
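The rotational averaging mentioned above enters FRET through the orientation factor κ² = (d·a − 3(d·r)(a·r))², where d and a are the donor and acceptor transition dipole directions and r is the separation direction. The sketch below computes κ² and checks by Monte Carlo that freely and independently rotating dipoles average to 2/3, the value commonly assumed for mobile dyes. Whether that assumption holds in the gas phase is exactly what the abstract questions; the sampling set-up (fixed separation axis, isotropic Gaussian directions) is an illustrative choice.

```python
import math
import random
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def kappa_squared(donor: Sequence[float], acceptor: Sequence[float],
                  separation: Sequence[float]) -> float:
    """Orientation factor (d.a - 3 (d.r)(a.r))^2 for normalised d, a, r."""
    d, a, r = _unit(donor), _unit(acceptor), _unit(separation)
    return (_dot(d, a) - 3.0 * _dot(d, r) * _dot(a, r)) ** 2


def isotropic_average(samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo average of kappa^2 for randomly oriented donor and acceptor."""
    rng = random.Random(seed)
    separation = (0.0, 0.0, 1.0)
    total = 0.0
    for _ in range(samples):
        # Normalised 3D Gaussian vectors are uniformly distributed on the sphere.
        d = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        a = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        total += kappa_squared(d, a, separation)
    return total / samples
```

κ² ranges from 0 (perpendicular dipoles) to 4 (collinear dipoles along the separation axis), so a dye whose motion is restricted by intramolecular contacts can shift the effective R0 substantially compared with the 2/3 estimate.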
Probing the Stability of a β‑Hairpin Scaffold after Desolvation
Probing the structural characteristics of biomolecular ions in the gas phase following native mass spectrometry (nMS) is of great interest, because noncovalent interactions, and thus native fold features, are believed to be largely retained upon desolvation. However, the conformation usually depends heavily on the charge state of the species investigated. In this study, we combine transition metal ion Förster resonance energy transfer (tmFRET) and ion mobility-mass spectrometry (IM-MS) with molecular dynamics (MD) simulations to interrogate the β-hairpin structure of GB1p in vacuo. Fluorescence lifetime values and collisional cross sections suggest an unfolding of the β-hairpin motif for higher charge states. MD simulations are consistent with the experimental constraints, yet intriguingly provide an alternative structural interpretation: preservation of the β-hairpin is predicted not only for the 2+ but also for the 4+ charged species, which is unexpected given the substantial Coulomb repulsion for small secondary structure scaffolds.
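The abstract reads structural information from fluorescence lifetimes. A standard way to turn donor lifetimes into a transfer efficiency is E = 1 − τ_DA/τ_D, where τ_D is the donor lifetime without the acceptor and τ_DA the lifetime with it. The sketch below assumes single-exponential donor decays; the lifetimes used in the checks are hypothetical, not data from the study.

```python
def efficiency_from_lifetimes(tau_da: float, tau_d: float) -> float:
    """FRET efficiency from donor lifetimes: E = 1 - tau_DA / tau_D.

    tau_d:  donor lifetime in the absence of the acceptor.
    tau_da: donor lifetime in the presence of the acceptor (quenched).
    A result near 0 means little energy transfer; a negative value would
    indicate no measurable quenching (e.g. within experimental noise).
    """
    if tau_d <= 0 or tau_da <= 0:
        raise ValueError("lifetimes must be positive")
    return 1.0 - tau_da / tau_d
```

With τ_D = 4.0 ns and τ_DA = 3.0 ns, for instance, the efficiency is 0.25. A donor lifetime closer to τ_D means less energy transfer and, in the Förster picture, a larger donor–acceptor separation, which is how lifetimes can report on folded versus unfolded states.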
Computational Predictions of Nonclinical Pharmacokinetics at the Drug Design Stage
Although computational predictions of pharmacokinetics (PK) are desirable at the drug design stage, existing approaches are often limited in prediction accuracy and human interpretability. Using a discovery data set of mouse and rat PK studies at Roche (9,685 unique compounds), we performed a proof-of-concept study to predict key PK properties from chemical structure alone, including plasma clearance (CLp), volume of distribution at steady state (Vss), and oral bioavailability (F). Ten machine learning (ML) models were evaluated, including single-task, multitask, and transfer learning approaches (i.e., pretraining with in vitro data). In addition to prediction accuracy, we emphasized human interpretability of outcomes, especially the quantification of uncertainty, applicability domains, and explanations of predictions in terms of molecular features. The results show that intravenous (IV) PK properties (CLp and Vss) can be predicted with good precision (average absolute fold error, AAFE, of 1.96–2.84 depending on the data split) and low bias (average fold error, AFE, of 0.98–1.36), with AutoGluon, Gaussian Process Regressor (GP), and ChemProp displaying the best performance. Owing to the higher complexity of oral PK studies, predictions of F were more challenging, with the best AAFE values of 2.35–2.60 and a higher overprediction bias (AFE of 1.45–1.62). Multitask approaches and pretraining of ChemProp neural networks with in vitro data showed similar precision to single-task models but helped reduce the bias and increase correlations between observations and predictions. A combination of GP-computed prediction variance, molecular clustering, and dimensionality reduction provided valuable quantitative insights into prediction uncertainty and applicability domains. SHapley Additive exPlanations (SHAP) highlighted molecular features contributing to prediction outcomes of Vss, providing explanations that could aid drug design. Taken together, the results show that computational predictions of PK are feasible at the drug design stage, with several ML technologies converging to successfully leverage historical PK data sets. Further studies are needed to unlock the full potential of this approach, especially with respect to data set sizes and quality, transfer learning between in vitro and in vivo data sets, model-independent quantification of uncertainty, and explainability of predictions.
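The abstract reports precision as AAFE and bias as AFE without spelling out the formulas. The sketch below assumes the definitions common in PK modelling: AFE = 10^mean(log10(pred/obs)) and AAFE = 10^mean(|log10(pred/obs)|). Both equal 1 for perfect predictions, and AFE above 1 indicates overprediction. Whether the study used exactly these definitions is an assumption.

```python
import math
from typing import List, Sequence


def _log10_ratios(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return [math.log10(p / o) for p, o in zip(predicted, observed)]


def afe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average fold error: 10 ** mean(log10(pred/obs)); > 1 means overprediction."""
    logs = _log10_ratios(predicted, observed)
    return 10 ** (sum(logs) / len(logs))


def aafe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute fold error: 10 ** mean(|log10(pred/obs)|); 1 is perfect."""
    logs = _log10_ratios(predicted, observed)
    return 10 ** (sum(abs(x) for x in logs) / len(logs))
```

The two metrics are complementary: predicting one compound twofold too high and another twofold too low gives an AFE of 1 (no bias) but an AAFE of 2, which is why the abstract reports low bias and moderate precision as separate findings.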