    Quantum–mechanical property prediction of solvated drug molecules: what have we learned from a decade of SAMPL blind prediction challenges?

    Joint academic–industrial projects supporting drug discovery are frequently pursued to deploy and benchmark cutting-edge methodological developments from academia in a real-world industrial environment at different scales. The tasks range from small-molecule physicochemical property assessment through protein–ligand interactions to statistical analyses of biological data. In this way, method development and usability both benefit from insights gained at both ends: the predictiveness and readiness of novel approaches are confirmed, while pharmaceutical companies gain early access to novel tools that benefit the quality of drug products and, ultimately, patients. Quantum–mechanical and simulation methods fall squarely into this group, as they require skill and expense in their development as well as significant resources in their application, and are therefore comparatively slow to reach industrial use. Nevertheless, these physics-based methods are becoming more and more useful. Starting with a general overview of these methods, and of quantum–mechanical methods for drug discovery in particular, we review a decade-long and ongoing collaboration between Sanofi and the Kast group focused on applying the embedded cluster reference interaction site model (EC-RISM), a solvation model for quantum chemistry, to small-molecule chemistry in the context of joint participation in several SAMPL (Statistical Assessment of Modeling of Proteins and Ligands) blind prediction challenges. Beginning with an early application to tautomer equilibria in water (SAMPL2), the methodology was developed further over the years to allow challenge contributions on the prediction of distribution coefficients (SAMPL5) and acidity constants (SAMPL6).
Particular emphasis is put on a frequently overlooked aspect of measuring model quality, namely the retrospective analysis of earlier datasets and predictions in light of more recent and advanced developments. We therefore demonstrate the performance of the current methodological state of the art, as developed and optimized for the SAMPL6 pKa and octanol–water log P challenges, when re-applied to the earlier SAMPL5 cyclohexane–water log D and SAMPL2 tautomer equilibria datasets. Systematic improvement is not found consistently, despite the similarity of the problem classes, i.e. protonation reactions and phase distribution. Hence, it is possible to learn about hidden bias in model assessment, as results derived from more elaborate methods do not necessarily improve quantitative agreement. On the one hand, this indicates the role of chance or coincidence in model development, which allows systematic errors and opportunities for improvement to be identified; on the other, it reveals possible sources of experimental uncertainty. These insights are particularly useful for further academia–industry collaborations, as they enable both partners to optimize the computational and experimental settings for data generation.

    Physicochemical property prediction for small molecules using integral equation-based solvation models

    In this thesis the accurate prediction of physicochemical properties of small, pharmaceutically relevant compounds is investigated. To predict condensed-phase properties such as hydration free energies, acid dissociation constants (pKa), and distribution and partition coefficients (log D and log P, respectively), it is necessary to accurately describe the solute, the solute-solvent interactions, and the solvent response to the solute’s presence. When this is achieved, the Gibbs energies of the molecules in solution can be used to calculate macroscopic physicochemical properties. The embedded cluster reference interaction site model (EC-RISM) makes it possible to combine a quantum chemical description of the solute with an accurate solvent response via the three-dimensional reference interaction site model (3D RISM). This is ideal for calculating physicochemical properties of small molecules, because EC-RISM yields both the electronic energy of the solvent-polarized wave function and the excess chemical potential of the molecule in solution, the sum of which can be defined as the Gibbs energy of the molecule in solution. The development of solvent susceptibilities for the non-aqueous solvents cyclohexane and n-octanol is reported, as well as the challenges and implications of including water saturation for organic solvents. The solvent susceptibilities are used to train partial molar volume corrections, which compensate for the error inherent in the calculation of the 3D RISM excess chemical potential, using reference data from the Minnesota solvation database (MNSOL). Additionally, a method to calculate accurate pKa values is presented and the formal equivalence of a microstate transition and a partition function approach is briefly summarized. The performance of the models is benchmarked by participation in the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges.
The first was the SAMPL5 challenge, in which cyclohexane-water distribution coefficients (log D7.4) had to be calculated. In subsequent challenges the task was split into determining aqueous pKa values during the SAMPL6 challenge and octanol-water partition coefficients (log P) of a subset of these compounds for SAMPL6 part II. Over the course of these challenges a number of key improvements were made to the EC-RISM model, often as a direct result of inconsistencies or performance issues encountered during one of the SAMPL challenges. Finally, an extension of the partial molar volume correction to extreme conditions such as high pressure is reported.
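The step from solution-phase Gibbs energies to partition and distribution coefficients can be illustrated with a minimal Python sketch. It assumes the standard textbook relations log P = -(G_org - G_aq) / (RT ln 10) and, for a monoprotic acid whose neutral form alone enters the organic phase, log D = log P - log10(1 + 10^(pH - pKa)). The function names and any numeric inputs are hypothetical and are not taken from the thesis.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol K)


def log_p(g_organic, g_water, temperature=298.15):
    """log P from the Gibbs energies (kcal/mol) of one neutral species in both phases."""
    delta_g_transfer = g_organic - g_water  # water -> organic transfer free energy
    return -delta_g_transfer / (R_KCAL * temperature * math.log(10))


def log_d_monoprotic_acid(logp, pka, ph=7.4):
    """log D of a monoprotic acid, assuming only the neutral form partitions."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))
```

Under these assumptions a transfer free energy of about -1.36 kcal/mol at 298.15 K corresponds to one log P unit, and log D falls below log P as the pH rises above the pKa.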

    Evaluation of log P, pKa, and log D predictions from the SAMPL7 blind challenge

    The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with the prediction of octanol-water partition coefficients and pKa values for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. In the log P challenge, 17 research groups participated, submitting 33 blind submissions in total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remained unchanged in SAMPL7. Interestingly, although macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating that far more work needs to be done on pKa prediction methods.
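The distinction between macroscopic and microscopic pKa values can be illustrated for the simplest case. The following minimal sketch assumes a single protonated microstate that can lose its proton through several parallel microscopic transitions (leading to different tautomers of the deprotonated form), in which case the macroscopic Ka is the sum of the microscopic Ka values. The function name and the example values are hypothetical and are not taken from the challenge data.

```python
import math


def macro_pka_from_micro(micro_pkas):
    """Macroscopic pKa for one protonated microstate with parallel deprotonation routes.

    Assumes Ka(macro) = sum of Ka(micro) over the alternative microscopic transitions.
    """
    ka_macro = sum(10.0 ** (-pka) for pka in micro_pkas)
    return -math.log10(ka_macro)
```

For example, two equivalent microscopic transitions with pKa 5.0 each give a macroscopic pKa of about 4.70, so very different sets of microscopic values can lead to similar macroscopic results.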

    A combined computational and NMR-spectroscopic approach for tautomer elucidation under extreme conditions towards investigating the robustness of genetic codes

    The goal of this work was to establish a combined computational and experimental workflow for the prediction of tautomeric ratios of small molecules in solution under various environmental conditions. The computational part of this workflow consists of quantum chemical (QC) calculations using the embedded cluster reference interaction site model (EC-RISM), which takes into account the solvent structure and the mutual polarization of solute and solvent and is able to incorporate environmental effects via appropriate correction terms; NMR experiments form the experimental part. Benchmarking of EC-RISM for the prediction of tautomeric ratios was performed using the SAMPL2 dataset and histamine, for which the workflow was extensively tested at ambient conditions and used to identify the nuclei most sensitive to tautomerism. This system was also used to develop an EC-RISM based force field (FF) reparametrization workflow. A temperature-dependent correction term for EC-RISM was developed, benchmarked, and used in conjunction with a pressure-dependent correction term to calculate NMR chemical shifts. Various computational NMR referencing methods were developed using reference shielding constants of trimethylsilylpropanesulfonate (DSS) and ammonia, and their performance was tested on N-methyl-acetamide (NMA) and trimethylamine-N-oxide (TMAO). The tautomeric ratios of nucleobases were calculated at different pressures and temperatures for the natural species and the hachimoji expanded genetic alphabet. Initial steps were also taken towards the prediction of the tautomeric ratios of larger nucleic acid building blocks such as nucleotides.
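How a tautomeric ratio follows from computed Gibbs energies can be shown with a short sketch. It assumes plain Boltzmann weighting of the Gibbs energies of the individual tautomers in solution at a given temperature; the function name and the input values are hypothetical, and none of the correction terms mentioned above are included.

```python
import math

R_KJ = 8.31446261815324e-3  # gas constant in kJ/(mol K)


def tautomer_populations(gibbs_energies, temperature=298.15):
    """Boltzmann populations of tautomers from their Gibbs energies (kJ/mol)."""
    g_min = min(gibbs_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / (R_KJ * temperature)) for g in gibbs_energies]
    total = sum(weights)
    return [w / total for w in weights]
```

The ratio of two tautomers is then the quotient of their populations, which depends on temperature through the factor RT.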

    Computational modelling of solvent effects

    This thesis is concerned with developing theoretical benchmarks and computational procedures that would facilitate robust descriptions of solvent effects on molecular properties and chemical reactions. This advancement will enable chemists to design more effective chemical reagents, drug molecules and materials, thereby reducing the need for extensive experimental trial-and-error. Towards this end, this thesis has developed theoretical benchmarks to evaluate the performance of lower-cost and approximate methods in predicting solute-solvent interaction energies. This includes the generation of high-level calculations of solute-solvent interactions and proton transfer reaction energies in very large water clusters (up to 160 water molecules) at a variety of solute-solvent configurations. This differs from previous studies, which mostly focused on small solvated clusters (1-6 solvent molecules) at equilibrium geometries. These theoretical benchmarks were then used to assess the performance of a range of contemporary density functional theory methods and hybrid quantum mechanics/molecular mechanics (QM/MM) approximations of these methods. A surprising finding was that a significantly larger QM region than expected (solute plus 40 or more water molecules) was needed before the QM/MM models converged to within 5.7 kJ mol⁻¹ of the direct QM result. To address this limitation, an important contribution of this thesis is the development of efficient strategies based on charge-shift analysis and electrostatically embedded fragment methods to accelerate the convergence of the QM/MM models with respect to QM region size. Of particular note, QM region selection based on atomic charges significantly reduced the errors in QM/MM models even when a low-level embedding potential was used.
Finally, these findings culminated in the development of a dual-Hamiltonian approach that may be used to systematically improve the accuracy of explicit-solvent force field simulations of organic reaction barriers. It is envisaged that these developments will contribute directly to a systematic framework for improving computational simulations of solution-phase processes.

    Applications of integral equation theory to biological systems

    The three-dimensional reference interaction site model (3D RISM) makes it possible to compute the solvent distribution, and therefore the associated thermodynamic properties, around a given solute. This can be a small, drug-like molecule or a protein with several thousand atoms. Combined with other tools such as molecular dynamics (MD) simulations and force fields, it is possible to study the differences in free energy between conformations, molecules, and complexes in biologically relevant systems. By combining 3D RISM with MD simulations, the free energy difference between two structural conformers of an antibody is calculated, and the results are verified by tests with different water models and error corrections. However, due to strong structural fluctuations during the simulations, the statistical errors are often high, which limits the field of application of such studies. To alleviate this problem and to be able to do without explicit simulations, so-called localized free energies (LFE) are employed. With them it is possible to break down free energies to an atom-wise level, where said fluctuations can be assumed to have less influence on the results. Since such a partitioning is purely virtual, there is no experimental way to validate the LFEs. For this reason, their plausibility is checked by using them as input for machine learning (ML) models and analyzing the drop in predictive power upon increasing levels of perturbation in the LFE input.
With the plausibility of the method established, the LFEs are applied to an exemplary series of thrombin inhibitors to illustrate their potential in a drug discovery context. Here they are used to identify the most relevant interactions between host and guest. Furthermore, the influence of experimental uncertainties in crystal structures and the limitations of the approach are explored. From the same formal basis as used for the LFEs, so-called free energy derivatives (FED) can be calculated very efficiently. They describe how the free energy changes with respect to the non-bonded force field parameters on an atomistic level. The FEDs are also applied to a thrombin complex, exploring the capabilities of the approach and investigating their predictive applicability by performing an in silico experiment on literature data.

    New molecular simulation methods for quantitative modelling of protein-ligand interactions

    The main theme of this work is the design and development of new molecular simulation protocols to achieve more accurate and reliable estimates of free energy changes for processes relevant to structure-based drug design. The work starts with an insight into the reproducibility problem for alchemical free energy calculations. Even if simulations are run with similar input files, the use of different simulation engines can give different free energy results. As part of a collaborative effort, the implementation details of the AMBER, GROMACS, SOMD and CHARMM simulation codes were studied, and free energy protocols for each software package were validated to converge towards a reproducibility limit of about 0.20 kcal mol⁻¹ for hydration free energies of small organic molecules. Subsequently, new simulation methods for the estimation of lipophilicity coefficients (log P and log D) of drug-like molecules were developed and validated. log P values were computed for a dataset of 5 molecules with increasing fluorination level. Predictions were in line with the experimental measurements, and the simulations also provided new insights into the water-solute interactions that drive the partitioning process. Then, as part of the SAMPL5 challenge, log D values for 53 drug-like molecules were computed. In this context two different simulation models were derived in order to take into account the presence of protonated species. The results were encouraging but also highlighted limits in alchemical free energy modelling. As an additional task of the SAMPL5 contest, three different protocols were validated for predicting absolute binding affinities for 22 host-guest systems. The first model yielded a free energy of binding based on free energy changes in the solvated and complexed phases; the second added a long-range dispersion correction to the previous model; the third used a standard state correction term.
All three protocols were among the top-ranked submissions in SAMPL5, with a correlation coefficient R² of about 0.7 against experimental data. Finally, the origins and magnitude of finite size artefacts in alchemical free energy calculations were investigated. Finite size artefacts are especially pronounced in calculations that involve changes in the net charge of a solute. A new correction scheme was devised for the Barker-Watts reaction field approach and compared with the literature. Hydration free energy calculations on simple ionic species were carried out to validate the consistency of the scheme, and the approach was further extended to host-guest binding affinity predictions.

    Nanomedicine Formulations Based on PLGA Nanoparticles for Diagnosis, Monitoring and Treatment of Disease: From Bench to Bedside

    Nanomedicine is among the most promising emerging fields that can provide innovative and radical solutions to unmet needs in pharmaceutical formulation development. Encapsulation of active pharmaceutical ingredients within nano-size carriers offers several benefits, namely, protection of the therapeutic agents from degradation, their increased solubility and bioavailability, improved pharmacokinetics, reduced toxicity, enhanced therapeutic efficacy, decreased drug immunogenicity, targeted delivery, and simultaneous imaging and treatment options with a single system. Poly(lactide-co-glycolide) (PLGA) is one of the most commonly used polymers in nanomedicine formulations due to its excellent biocompatibility, tunable degradation characteristics, and high versatility. Furthermore, PLGA is approved by the European Medicines Agency (EMA) and the Food and Drug Administration (FDA) for use in pharmaceutical products. Nanomedicines based on PLGA nanoparticles can offer tremendous opportunities in the diagnosis, monitoring, and treatment of various diseases. This Special Issue aims to focus on the bench-to-bedside development of PLGA nanoparticles including (but not limited to) design, development, physicochemical characterization, scale-up production, efficacy and safety assessment, and biodistribution studies of these nanomedicine formulations.