9 research outputs found

    DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment

    We present a robust and computationally efficient approach for assigning partial charges of atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN), which was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated in existing parametrization pipelines, as shown for the Open Force Field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.

    Implicit solvent approach based on generalized Born and transferable graph neural networks for molecular dynamics simulations

    Molecular dynamics simulations enable the study of the motion of small and large (bio)molecules and the estimation of their conformational ensembles. The description of the environment (solvent) has, therefore, a large impact. Implicit solvent representations are efficient but, in many cases, not accurate enough (especially for polar solvents, such as water). More accurate but also computationally more expensive is the explicit treatment of the solvent molecules. Recently, machine learning has been proposed to bridge the gap and simulate, in an implicit manner, explicit solvation effects. However, the current approaches rely on prior knowledge of the entire conformational space, limiting their application in practice. Here, we introduce a graph neural network based implicit solvent that is capable of describing explicit solvent effects for peptides with different compositions than those contained in the training set. ISSN: 0021-9606; ISSN: 1089-769

    Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

    Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare "greedy" (beam search) with "explorative" (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases. ISSN: 1549-9596; ISSN: 0095-2338; ISSN: 1520-514
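The perplexity score mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition of perplexity as the exponential of the mean negative log-probability per token; the abstract does not state the exact formula used in the study. The function names and the token probabilities below are hypothetical and invented purely for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity of one sequence from its per-token natural-log probabilities.

    Assumed definition: exp of the mean negative log-probability per token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def rank_by_perplexity(candidates):
    """Sort (smiles, token_log_probs) pairs from lowest to highest perplexity."""
    scored = [(smiles, perplexity(log_probs)) for smiles, log_probs in candidates]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical per-token probabilities, not taken from any real model.
candidates = [
    ("c1ccccc1", [math.log(0.25)] * 8),
    ("CCO", [math.log(0.5)] * 3),
]
ranked = rank_by_perplexity(candidates)
for smiles, score in ranked:
    print(f"{smiles}: perplexity {score:.2f}")
```

Under this definition a lower perplexity means the model assigns the string a higher average per-token probability, so the ranking places the most "expected" designs first.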

    Perplexity-based molecule ranking and bias estimation of chemical language models

    Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.

    Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

    Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare "greedy" (beam search) with "explorative" (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.

    Influence of the fluorophore mobility on distance measurement by gas-phase FRET

    Gas-phase Förster resonance energy transfer (FRET) combines mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecular ions. In FRET, fluorophore pairs are typically covalently attached to a biomolecule using short linkers, which affect the mobility of the dye and the relative orientation of the transition dipole moments of the donor and acceptor. Intramolecular interactions may further influence the range of motion. Yet, little is known about this factor, despite the importance of intramolecular interactions in the absence of a solvent. In this study, we applied transition metal ion FRET (tmFRET) to probe the mobility of a single chromophore pair (Rhodamine 110 and Cu2+) as a function of linker lengths to assess the relevance of intramolecular interactions. Increasing FRET efficiencies were observed with increasing linker length, ranging from 5% (2 atoms) to 28% (13 atoms). To rationalize this trend, we profiled the conformational landscape of each model system using molecular dynamics (MD) simulations. We captured intramolecular interactions that promote a population shift toward smaller donor–acceptor separation for longer linker lengths and induce a significant increase in the acceptor’s transition dipole moment. The presented methodology is a first step toward the explicit consideration of a fluorophore’s range of motion in the interpretation of gas-phase FRET experiments. ISSN: 1089-5639; ISSN: 1520-521
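The link between donor-acceptor separation and FRET efficiency described in this abstract can be sketched with the standard Förster relation, E = 1 / (1 + (r / R0)^6). This is a textbook expression and is an assumption here: the abstract does not say which model the authors used to interpret their measurements. The function name and the distances in the example are hypothetical.

```python
def fret_efficiency(distance, forster_radius):
    """Transfer efficiency from the standard Förster relation.

    distance and forster_radius must share the same length unit.
    forster_radius (R0) is the separation at which the efficiency is 50%;
    it depends on the relative orientation of the transition dipoles.
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


# Hypothetical values in arbitrary length units, chosen only to show the trend.
r0 = 1.0
for r in (0.5, 1.0, 2.0):
    print(f"r = {r:.1f} R0 -> E = {fret_efficiency(r, r0):.3f}")
```

The sixth-power dependence means that a modest shift of the conformational population toward shorter separations can raise the measured efficiency noticeably.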

    Influence of the fluorophore mobility on distance measurements by gas phase FRET.

    Gas-phase Förster resonance energy transfer (FRET) combines the advantages of mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecules. While this implementation of FRET in the gas phase promises detailed insights for fundamental and applied studies, the gas-phase environment also poses great challenges. For FRET, fluorophore pairs are typically covalently attached to strategic binding sites in the backbone of a biomolecule, using short linkers. The linker further increases the mobility of the dye, contributing to rotational averaging of the relative orientation of the transition dipole moments of donor and acceptor. However, little is known about the fluorophore’s degrees of freedom in the gas phase and how they may be influenced by intramolecular interactions. In this study, we test the influence of a fluorophore’s linker length on the measured FRET efficiencies in the gas phase to probe the mobility of the fluorophore. An increased FRET efficiency was observed with increasing linker length, ranging from 5.3 % for a linker consisting of 2 atoms to 27.7 % for a linker length of 13 atoms. To rationalize this trend, we profiled the conformational landscape of each model system with MD simulations. Employing state-of-the-art enhanced sampling techniques, we captured intramolecular interactions that promote a population shift towards smaller donor-acceptor separation for longer linker lengths and induce a significant increase in their acceptor dipole. The presented methodology is a first step towards the explicit consideration of a fluorophore’s range of motion in the interpretation of gas-phase FRET experiments.

    DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment

    We present a robust and computationally efficient approach for assigning partial charges of atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN), which was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated in existing parametrization pipelines, as shown for the Open Force Field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.

    Computational Predictions of Nonclinical Pharmacokinetics at the Drug Design Stage

    Although computational predictions of pharmacokinetics (PK) are desirable at the drug design stage, existing approaches are often limited by prediction accuracy and human interpretability. Using a discovery data set of mouse and rat PK studies at Roche (9,685 unique compounds), we performed a proof-of-concept study to predict key PK properties from chemical structure alone, including plasma clearance (CLp), volume of distribution at steady-state (Vss), and oral bioavailability (F). Ten machine learning (ML) models were evaluated, including Single-Task, Multitask, and transfer learning approaches (i.e., pretraining with in vitro data). In addition to prediction accuracy, we emphasized human interpretability of outcomes, especially the quantification of uncertainty, applicability domains, and explanations of predictions in terms of molecular features. Results show that intravenous (IV) PK properties (CLp and Vss) can be predicted with good precision (average absolute fold error, AAFE of 1.96–2.84 depending on data split) and low bias (average fold error, AFE of 0.98–1.36), with AutoGluon, Gaussian Process Regressor (GP), and ChemProp displaying the best performance. Driven by higher complexity of oral PK studies, predictions of F were more challenging, with the best AAFE values of 2.35–2.60 and higher overprediction bias (AFE of 1.45–1.62). Multi-Task approaches and pretraining of ChemProp neural networks with in vitro data showed similar precision to Single-Task models but helped reduce the bias and increase correlations between observations and predictions. A combination of GP-computed prediction variance, molecular clustering, and dimensionality-reduction provided valuable quantitative insights into prediction uncertainty and applicability domains. SHAPley Additive exPlanations (SHAPs) highlighted molecular features contributing to prediction outcomes of Vss, providing explanations that could aid drug design. 
    Combined results show that computational predictions of PK are feasible at the drug design stage, with several ML technologies converging to successfully leverage historical PK data sets. Further studies are needed to unlock the full potential of this approach, especially with respect to data set sizes and quality, transfer learning between in vitro and in vivo data sets, model-independent quantification of uncertainty, and explainability of predictions.
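The two error metrics quoted in this abstract, average fold error (AFE, a bias measure) and average absolute fold error (AAFE, a precision measure), can be sketched as follows. The sketch assumes the definitions commonly used in pharmacokinetics, namely 10 raised to the mean of log10(predicted/observed) for AFE and 10 raised to the mean of its absolute value for AAFE; the abstract itself does not spell out the formulas. The function name and the example values are hypothetical.

```python
import math


def fold_errors(predicted, observed):
    """Return (AFE, AAFE) for paired positive predictions and observations.

    Assumed definitions:
      AFE  = 10 ** mean(log10(predicted / observed))    (bias; > 1 is overprediction)
      AAFE = 10 ** mean(|log10(predicted / observed)|)  (precision; 1 is perfect)
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    log_ratios = [math.log10(p / o) for p, o in zip(predicted, observed)]
    n = len(log_ratios)
    afe = 10 ** (sum(log_ratios) / n)
    aafe = 10 ** (sum(abs(x) for x in log_ratios) / n)
    return afe, aafe


# Hypothetical values: one twofold overprediction, one twofold underprediction.
afe, aafe = fold_errors([2.0, 0.5], [1.0, 1.0])
print(f"AFE = {afe:.2f}, AAFE = {aafe:.2f}")
```

In this toy case the over- and underprediction cancel in the AFE, which stays at 1, while the AAFE of 2 still reports the typical twofold deviation. This is why the abstract quotes both numbers.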