7 research outputs found

    Physicochemical property prediction for small molecules using integral equation-based solvation models

    This thesis investigates the accurate prediction of physicochemical properties of small, pharmaceutically relevant compounds. To predict condensed-phase properties such as hydration free energies, acid dissociation constants (pKa), and distribution and partition coefficients (log D and log P, respectively), it is necessary to accurately describe the solute, the solute-solvent interactions, and the solvent response to the solute’s presence. When this is achieved, the Gibbs energies of the molecules in solution can be used to calculate macroscopic physicochemical properties. The embedded cluster reference interaction site model (EC-RISM) makes it possible to combine a quantum chemical description of the solute with an accurate solvent response via the three-dimensional reference interaction site model (3D RISM). This is ideal for calculating physicochemical properties of small molecules, because EC-RISM yields both the electronic energy of the solvent-polarized wave function and the excess chemical potential of the molecule in solution, the sum of which can be defined as the Gibbs energy of the molecule in solution. The development of solvent susceptibilities for the non-aqueous solvents cyclohexane and n-octanol is reported, as well as the challenges and implications of including water saturation for organic solvents. The solvent susceptibilities are used to train partial molar volume corrections, which correct for the error inherent in the calculation of the 3D RISM excess chemical potential, using reference data from the Minnesota solvation database (MNSOL). Additionally, a method to calculate accurate pKa values is presented and the formal equivalence of a microstate transition and a partition function approach is briefly summarized. The performance of the models is benchmarked by participation in the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges.
The first was the SAMPL5 challenge, in which cyclohexane-water distribution coefficients (log D7.4) had to be calculated. In subsequent challenges the task was split into determining aqueous pKa values in the SAMPL6 challenge and octanol-water partition coefficients (log P) of a subset of these compounds in SAMPL6 part II. Over the course of these challenges a number of key improvements were made to the EC-RISM model, often as a direct result of inconsistencies or performance issues encountered during one of the SAMPL challenges. Finally, an extension of the partial molar volume correction to extreme conditions such as high pressure is reported.
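
The relationship between Gibbs energies in solution and a partition coefficient can be illustrated with a minimal sketch. It assumes only the standard thermodynamic relation log P = (G_water - G_organic) / (RT ln 10) and the definition given in the abstract (Gibbs energy in solution as the sum of electronic energy and excess chemical potential). The function names and the energy unit (kcal/mol) are hypothetical choices for illustration; the partial molar volume corrections and the rest of the thesis workflow are not reproduced here.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption of this sketch.
R_KCAL = 1.98720425864083e-3


def gibbs_in_solution(electronic_energy, excess_chemical_potential):
    """Gibbs energy in solution as the sum of the solvent-polarized
    electronic energy and the excess chemical potential (same units)."""
    return electronic_energy + excess_chemical_potential


def log_p(g_water, g_organic, temperature=298.15):
    """Partition coefficient from Gibbs energies (kcal/mol) of the same
    species in water and in the organic phase.

    A lower Gibbs energy in the organic phase gives a positive log P.
    """
    return (g_water - g_organic) / (R_KCAL * temperature * math.log(10))
```

With these definitions, a solute that is about 1.36 kcal/mol more stable in the organic phase than in water at 298.15 K has a log P of about 1.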

    Quantum–mechanical property prediction of solvated drug molecules: what have we learned from a decade of SAMPL blind prediction challenges?

    Joint academic–industrial projects supporting drug discovery are frequently pursued to deploy and benchmark cutting-edge methodological developments from academia in a real-world industrial environment at different scales. The tasks range from small-molecule physicochemical property assessment through protein–ligand interactions up to statistical analyses of biological data. In this way, method development and usability both benefit from insights gained at both ends once the predictiveness and readiness of novel approaches are confirmed, while pharmaceutical drug makers get early access to novel tools that serve the quality of drug products and the benefit of patients. Quantum–mechanical and simulation methods in particular fall into this group, as they require skills and expense in their development as well as significant resources in their application, and are therefore comparatively slow to trickle into industrial use. Nevertheless, these physics-based methods are becoming more and more useful. Starting with a general overview of these methods, and in particular of quantum–mechanical methods for drug discovery, we review a decade-long and ongoing collaboration between Sanofi and the Kast group focused on the application of the embedded cluster reference interaction site model (EC-RISM), a solvation model for quantum chemistry, to study small-molecule chemistry in the context of joint participation in several SAMPL (Statistical Assessment of Modeling of Proteins and Ligands) blind prediction challenges. Starting with an early application to tautomer equilibria in water (SAMPL2), the methodology was further developed over the years to allow for challenge contributions on the prediction of distribution coefficients (SAMPL5) and acidity constants (SAMPL6).
Particular emphasis is put on a frequently overlooked aspect of measuring the quality of models, namely the retrospective analysis of earlier datasets and predictions in light of more recent and advanced developments. We therefore demonstrate the performance of the current methodological state of the art, as developed and optimized for the SAMPL6 pKa and octanol–water log P challenges, when re-applied to the earlier SAMPL5 cyclohexane–water log D and SAMPL2 tautomer equilibria datasets. Systematic improvement is not found consistently, despite the similarity of the problem class, i.e. protonation reactions and phase distribution. Hence, it is possible to learn about hidden bias in model assessment, as results derived from more elaborate methods do not necessarily improve quantitative agreement. On the one hand, this indicates the role of chance or coincidence in model development, which allows for the identification of systematic error and opportunities for improvement; on the other, it reveals possible sources of experimental uncertainty. These insights are particularly useful for further academia–industry collaborations, as both partners are then able to optimize the computational and experimental settings for data generation.

    Evaluation of log P, pKa, and log D predictions from the SAMPL7 blind challenge

    The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with the prediction of octanol-water partition coefficients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. Seventeen research groups participated in the log P challenge, submitting 33 blind submissions in total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than that of the octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remained unchanged in SAMPL7. Interestingly, although macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating that far more work needs to be done on pKa prediction methods.

    A combined computational and NMR-spectroscopic approach for tautomer elucidation under extreme conditions towards investigating the robustness of genetic codes

    The goal of this work was to establish a combined computational and experimental workflow for the prediction of tautomeric ratios of small molecules in solution under various environmental conditions. The computational part of this workflow consists of quantum chemical (QC) calculations using the embedded cluster reference interaction site model (EC-RISM), which takes into account the solvent structure and the mutual polarization of solute and solvent and can incorporate environmental effects via appropriate correction terms; NMR experiments form the experimental part. EC-RISM was benchmarked for the prediction of tautomeric ratios using the SAMPL2 dataset and histamine, for which the workflow was extensively tested at ambient conditions and used to identify the nuclei most sensitive to tautomerism. This system was also used to develop an EC-RISM-based force field (FF) reparametrization workflow. A temperature-dependent correction term for EC-RISM was developed, benchmarked, and used in conjunction with a pressure-dependent correction term to calculate NMR chemical shifts. Various computational NMR referencing methods were developed using reference shielding constants of trimethylsilylpropanesulfonate (DSS) and ammonia, and their performance was tested on N-methylacetamide (NMA) and trimethylamine-N-oxide (TMAO). The tautomeric ratios of nucleobases were calculated at different pressures and temperatures for the natural species and the hachimoji expanded genetic alphabet. Initial steps were also taken towards the prediction of the tautomeric ratios of larger nucleic acid building blocks such as nucleotides.
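
How a tautomeric ratio follows from computed Gibbs energies can be sketched with Boltzmann weights. This is a minimal illustration that assumes the textbook relation p_i ∝ exp(-G_i / RT); the function name, the kcal/mol unit, and the example energies are hypothetical, and the temperature- and pressure-dependent correction terms described in the abstract are not modeled.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption of this sketch.
R_KCAL = 1.98720425864083e-3


def tautomer_populations(gibbs_energies, temperature=298.15):
    """Return equilibrium population fractions for a list of tautomer
    Gibbs energies (kcal/mol) using Boltzmann weighting."""
    rt = R_KCAL * temperature
    g_min = min(gibbs_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in gibbs_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: the second tautomer lies 1.0 kcal/mol above the first.
populations = tautomer_populations([0.0, 1.0])
ratio = populations[0] / populations[1]  # equals exp(1.0 / RT)
```

Changing the `temperature` argument shows how the same energy gap yields a ratio closer to 1 at higher temperature.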

    Applications of integral equation theory to biological systems

    The three-dimensional reference interaction site model (3D RISM) makes it possible to compute the solvent distribution, and therefore the associated thermodynamic properties, around a given solute. This can be a small, drug-like molecule or a protein with several thousand atoms. Combined with other tools such as molecular dynamics (MD) simulations and force fields, it is possible to study the differences in free energy between conformations, molecules, and complexes in biologically relevant systems. By combining 3D RISM with MD simulations, the free energy difference between two structural conformers of an antibody is calculated, and the results are verified by tests with different water models and error corrections. However, due to strong structural fluctuations during the simulations, the statistical errors are often high, which limits the field of application of such studies. To alleviate this problem and to be able to do without explicit simulations, so-called localized free energies (LFE) are employed. With them it is possible to break down free energies to an atom-wise level, where said fluctuations can be assumed to have less influence on the results. Since such a partitioning is purely virtual, there is no experimental way to validate the LFEs. For this reason, their plausibility is checked by using them as input for machine learning (ML) models and analyzing the drop in predictive power upon increasing levels of perturbation in the LFE input.
With the plausibility of the method established, the LFEs are applied to an exemplary series of thrombin inhibitors to illustrate their potential in a drug discovery context. Here they are used to identify the most relevant interactions between host and guest. Furthermore, the influence of experimental uncertainties in crystal structures and the limitations of the approach are explored. Starting from the same formal basis as used for the LFEs, it is possible to calculate so-called free energy derivatives (FED) very efficiently. They describe how the free energy changes with respect to the non-bonded force field parameters on an atomistic level. The FEDs are also applied to a thrombin complex, exploring the capabilities of the approach and investigating the predictive applicability of the FEDs by performing an in-silico experiment on literature data.

    RISM-based pressure-dependent computational spectroscopy

    Spectroscopic measurements are an indispensable tool in chemical analysis; even under extreme conditions such as high hydrostatic pressures, they can provide valuable insights. Theoretical methods that can reliably reproduce observables in solution can be used to validate the obtained results. A common theoretical model is the reference interaction site model (RISM), which was used in this work. In the first part, a previously developed method for calculating IR frequencies with the embedded cluster RISM (EC-RISM) under equilibrium conditions was extended to non-equilibrium thermodynamics for IR spectroscopy. The pressure-dependent IR frequency shifts of TMAO and the cyanide anion were investigated as model systems. Furthermore, EC-RISM was used here for the first time to calculate EPR observables at ambient conditions. Initial tests with the geometry-optimized structure showed that EC-RISM gives significantly better results than a standard continuum calculation, despite a large deviation from experiment. A significant improvement towards the experimental values was achieved by using a large number of snapshots from an ab initio molecular dynamics (AIMD) simulation instead of a single geometry. In the context of the theoretical description of high-pressure effects on proteins, the critical question arises whether force fields parameterized for ambient conditions are appropriate for high-pressure conditions. To answer this question, the pressure dependence of the peptide backbone was investigated in the third part, with the small molecules N-methylacetamide (NMA) and Ac-Gly/Ala-NHMe used as model systems. This work shows that EC-RISM is a suitable method for the calculation of spectroscopic observables in solution. EC-RISM shows its strength especially when non-ambient conditions are to be examined, since it is relatively easy to extend, e.g., to high-pressure environments.

    Integral equation-based calculations of the electronic structure of small molecules under high pressure

    This thesis aims to model high hydrostatic pressure environments in such a way that a physically based response of the electronic structure of small molecules can be obtained via quantum chemistry (QC) calculations. The embedded cluster reference interaction site model (EC-RISM) was extended to the high-pressure regime. To this end, one-dimensional reference interaction site model (1D RISM) calculations were performed in order to obtain high-pressure solvent susceptibilities for the three-dimensional reference interaction site model (3D RISM) within EC-RISM. This process resulted in two different types of solvent susceptibilities: on the one hand, the hypernetted-chain (HNC) approximation was utilized; on the other hand, collaborative work was performed in order to obtain solvent susceptibilities from molecular dynamics simulations. This first step enables the EC-RISM method to gain insight into the electronic structure of small molecules under high pressure. In a highly collaborative project within the DFG research unit FOR1979, EC-RISM data were used to parametrize the atomic charges of trimethylamine-N-oxide (TMAO) in order to obtain accurate observables under high-pressure conditions in comparison with results from ab initio molecular dynamics (aiMD) simulations. In order to compare EC-RISM results with experimental data, a methodology for calculating high-pressure band shifts in infrared spectra was developed and applied to TMAO. As an additional idea for future high-pressure force field adaptations, the pressure dependence of the dipole moment of urea as a function of Lennard-Jones parameters was calculated. This approach can be extended to different observables. Furthermore, force field parameters for hydronium and hydroxide were optimized using a differential evolution approach in order to calculate accurate thermodynamic data on the autoprotolysis equilibrium of water under high-pressure conditions.
This equilibrium is experimentally well characterized and therefore an ideal benchmark case for demonstrating the accuracy of EC-RISM for high-pressure thermodynamics. As a first test of a novel semiempirical Hamiltonian in EC-RISM calculations, the polarization effect of pressure-dependent aqueous solvation of various small molecules was tested in comparison with ab initio quantum chemistry.