22 research outputs found

    Development and application of novel computational tools for structure based drug design

    Computational structure-based methods represent valuable tools in the drug design pipeline, as evidenced by their widespread use. This thesis describes five research projects that represent important computational advances in the methodologies and protocols of lead discovery and optimization. The first project demonstrates that the current paradigm of utilizing a fixed 1.4 Å solvent probe radius when generating the molecular surface results in unrealistic hydrophobic cavities and pockets on the surface. A novel method is developed which allows the solvent probe to change size according to its contacting atoms on the surface, thereby producing a more accurate representation of protein clefts, pockets and cavities, as well as giving a better tuned description of shape complementarity and electrostatic energies. The second project demonstrates that utilizing hydration parameters for calculation of electrostatic binding free energies results in a large mean error between predicted and experimental binding affinities. Consequently, a novel solvated interaction energy (SIE) scoring function is developed that parameterizes the electrostatic model using experimental binding data. The SIE scoring function is able to reproduce the affinities of 99 varied ligand-protein complexes with a mean error of 1.29 kcal/mol. In the third project, the SIE scoring function is incorporated into a novel virtual screening (VS) pipeline. The discriminative power of the SIE function is improved by including terms that account for entropy and hydrogen bonding. The resulting VS pipeline outperforms the majority of other VS methodologies in the literature. In the fourth project, a modified version of the VS pipeline is employed to predict the binding mode of a glycopeptide antibiotic to the bacterial sulfotransferase StaL. The predicted binding mode is in good agreement with experimental findings. 
In the fifth project, a novel method to compute electrostatic optimal charge selectivity, termed …
    Résumé (translated from the French): Modeling methods based on the three-dimensional structure of proteins are highly useful in drug development. This thesis summarizes five research projects proposing advances in the methodology and protocols for discovering and optimizing small active molecules. The first project demonstrates that the paradigm in which the solvent-exposed molecular surface is described by the contact of a spherical probe of constant 1.4 Å radius with the protein produces unrealistic hydrophobic pockets and cavities on the surface. A novel method is presented in which the probe radius changes on contact with different atom types, producing a more realistic representation of the irregularities of the surface and thereby a better definition of molecular shape complementarity as well as a better estimate of the electrostatic energy. The second project demonstrates that using a simple hydration coefficient to correct the binding free energy calculation leads to a large mean error between calculated and experimentally measured affinities. Consequently, a new energy function that accounts for solvation (SIE) was developed, with its parameterization based on experimental data. The SIE function is able to reproduce the affinities of 99 different ligand-protein complexes with a mean error of 1.29 kcal/mol. In the third project, the SIE function is introduced into a new virtual screening (VS) procedure. The capability of the SIE function is increased by adding an entropy term and a term describing hydrogen bonds. The resulting screening procedure outperforms the majority of other published methods.
In the fourth project, a modified version of the screening procedure is used to predict the modes …
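
To put the 1.29 kcal/mol mean error quoted in this abstract in perspective, the standard relation ΔG = RT ln K converts a free-energy error into a multiplicative error in the binding constant. The short sketch below does that arithmetic; the temperature of 298.15 K and the function name are assumptions made for illustration and do not come from the thesis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_fold_error(ddg_kcal_per_mol, temperature_k=298.15):
    """Fold error in the binding constant implied by a free-energy error.

    Uses ddG = RT ln(K_pred / K_exp), so the ratio is exp(|ddG| / RT).
    The default temperature is an assumption, not a value from the source.
    """
    return math.exp(abs(ddg_kcal_per_mol) / (R_KCAL * temperature_k))


fold = affinity_fold_error(1.29)
print(f"1.29 kcal/mol corresponds to a {fold:.1f}-fold error in affinity")
```

Under these assumptions, an error of 1.29 kcal/mol amounts to roughly a ninefold uncertainty in the predicted binding constant.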

    Molecular surface generation using a variable-radius solvent probe

    Protein-ligand binding occurs through interactions at the molecular surface. Hence, a proper description of this surface is essential to our understanding of the process of molecular recognition. Recent studies have noted the inadequacy of using a fixed 1.4 Å solvent probe radius to generate the molecular surface. This assumes that water molecules approach all surface atoms at an equal distance irrespective of polarity, which is not the case. To adequately model the protein-water boundary requires that the solvent probe radius change according to the polarity of its contacting atoms, smaller near polar atoms and larger near apolar atoms. To our knowledge, no method currently exists to generate the molecular surface of a protein in this manner. Using a modification of the marching tetrahedra algorithm, we present a method to generate molecular surfaces using a variable radius solvent probe. The resulting surface lacks many of the unrealistic small crevices in nonpolar regions that are found when utilizing an invariant 1.4 Å solvent probe, while maintaining the fine detail of the surface at polar regions. On application of the method on a test set of 20 protein structures taken from the Protein Data Bank (PDB), we also find far fewer empty unsolvated cavities that are present when using only a 1.4 Å solvent probe, while the majority of solvated and polar cavities is retained. This suggests that the majority of empty cavities previously observed in protein structures might simply be artifacts of the surfacing method. We also find that the variable probe surface can have significant effects on electrostatic calculations by generating a better tuned description of the protein-water boundary. We also examined the binding interfaces of a diverse set of 55 protein-protein complexes. We find that using a variable probe results in an increase in perceived shape complementarity at these sites compared to using a 1.4 Å solvent probe. 
The molecular volume and surface area are geometric values that determine various important properties for macromolecules, and the altered description afforded by a variable solvent probe molecular surface can have significant implications in protein recognition, energetics, folding, and stability calculations. Proteins 2006. © 2005 Wiley-Liss, Inc. NRC publication: Ye

    Coupled atomic charge selectivity for optimal ligand-charge distributions at protein binding sites

    Charge optimization as a tool for both analyzing and enhancing binding electrostatics has become an attractive approach over the past few years. An interesting feature of this method for molecular design is that it provides not only the optimal charge magnitudes, but also the selectivity of a particular atomic center for its optimal charge. The current approach to compute the charge selectivity at a given atomic center of a ligand in a particular binding process is based on the binding-energy cost incurred upon the perturbation of the optimal charge distribution by a unit charge at the given atomic center, while keeping the other atomic partial charges at their optimal values. A limitation of this method is that it does not take into account the possible concerted changes in the other atomic charges that may incur a lower energetic cost than perturbing a single charge. Here, we describe a novel approach for characterizing charge selectivity in a concerted manner, taking into account the coupling between the ligand charge centers in the binding process. We apply this novel charge selectivity measure to the celecoxib molecule, a nonsteroidal anti-inflammatory agent binding to cyclooxygenase 2 (COX2), which has been recently shown to also exhibit cross-reactivity toward carbonic anhydrase II (CAII), to which it binds with nanomolar affinity. The uncoupled and coupled charge selectivity profiles over the atomic centers of the celecoxib ligand, binding independently to COX2 and CAII, are analyzed comparatively and rationalized with respect to available experimental data. Very different charge selectivity profiles are obtained for the uncoupled versus coupled selectivity calculations. © 2006 Wiley Periodicals, Inc. J Comput Chem, 2006. NRC publication: Ye
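
The difference between the two selectivity measures described in this abstract can be illustrated with a small numerical sketch. It assumes, as is common in charge-optimization work but not stated in the abstract, that the binding energy is quadratic around the optimum, ΔG(q) = ΔG_opt + (q − q_opt)ᵀ L (q − q_opt), with L symmetric and positive definite. Under that assumption the uncoupled cost of a unit charge change at atom i is L_ii, while letting the other charges relax gives the Schur complement L_ii − L_io L_oo⁻¹ L_oi. The matrix values and function names are hypothetical, and this may not match the authors' exact definition (for example, it ignores any total-charge constraint).

```python
# Illustrative sketch only: toy matrix, assumed quadratic energy model.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def uncoupled_cost(L, i):
    """Cost of a unit charge change at atom i with all other charges frozen."""
    return L[i][i]


def coupled_cost(L, i):
    """Cost of a unit charge change at atom i when the other charges relax.

    Minimizing d^T L d subject to d[i] = 1 yields the Schur complement
    L_ii - L_io L_oo^{-1} L_oi.
    """
    others = [j for j in range(len(L)) if j != i]
    if not others:
        return L[i][i]
    L_oo = [[L[r][c] for c in others] for r in others]
    L_oi = [L[r][i] for r in others]
    y = solve(L_oo, L_oi)
    return L[i][i] - sum(a * b for a, b in zip(L_oi, y))


# Made-up 3-atom interaction matrix (symmetric, positive definite).
L_demo = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 2.0],
]

for atom in range(3):
    print(atom, uncoupled_cost(L_demo, atom), round(coupled_cost(L_demo, atom), 4))
```

For this toy matrix the uncoupled cost is 2.0 at every atom, whereas the coupled cost drops to 4/3 at the two end atoms and to 1.0 at the central atom, so the two profiles rank the atoms differently. The coupled cost can never exceed the uncoupled one, because freezing the other charges is one of the options the minimization considers.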

    FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols using Active Learning

    Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ~1 kcal mol⁻¹ have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Due to the (a) relatively large parameter space to be explored, (b) significant compute requirements, and (c) limited understanding of how combinations of parameters can affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, resulting in potential overfitting. These manual FEP protocol development timelines do not coincide with tight drug discovery project timelines, essentially preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses active learning to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention. 
We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB is able to generate a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active learning process, we are able to gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology
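
The idea of iteratively searching a protocol parameter space with active learning can be sketched generically as follows. This is not the FEP-PB algorithm, whose model and parameters the abstract does not specify. The parameter grid, the `run_protocol` stand-in for an expensive FEP benchmark, the inverse-distance surrogate, and the exploration weight `kappa` are all hypothetical choices made for illustration.

```python
import itertools
import math

# Hypothetical protocol parameters (not the actual FEP-PB search space).
GRID = list(itertools.product(
    [1, 5, 10, 20],    # assumed sampling time per window (ns)
    [12, 16, 24],      # assumed number of lambda windows
    [0.0, 0.5, 1.0],   # assumed generic scaling knob
))
SCALE = (20.0, 24.0, 1.0)  # per-dimension scale used in the distance


def run_protocol(p):
    """Toy stand-in for an expensive FEP benchmark; returns an error (kcal/mol)."""
    t, w, s = p
    return 0.8 + 0.05 * abs(t - 10) + 0.03 * abs(w - 16) + 1.0 * abs(s - 0.5)


def distance(a, b):
    """Scaled Euclidean distance between two parameter settings."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, SCALE)))


def predict(p, observed):
    """Surrogate model: inverse-distance-weighted mean of observed errors."""
    num = den = 0.0
    for q, err in observed.items():
        w = 1.0 / (distance(p, q) ** 2 + 1e-9)
        num += w * err
        den += w
    return num / den


def acquisition(p, observed, kappa):
    """Lower is better: predicted error minus a bonus for unexplored regions."""
    nearest = min(distance(p, q) for q in observed)
    return predict(p, observed) - kappa * nearest


def active_learning(grid, evaluate, seeds, rounds=5, batch=3, kappa=0.5):
    """Evaluate seeds, then repeatedly pick and evaluate the most promising batch."""
    observed = {p: evaluate(p) for p in seeds}
    for _ in range(rounds):
        pool = [p for p in grid if p not in observed]
        if not pool:
            break
        pool.sort(key=lambda p: acquisition(p, observed, kappa))
        for p in pool[:batch]:
            observed[p] = evaluate(p)
    best = min(observed, key=observed.get)
    return best, observed[best], len(observed)


seeds = [(1, 12, 0.0), (20, 24, 1.0)]
best, best_err, n_evaluated = active_learning(GRID, run_protocol, seeds)
print(best, round(best_err, 3), n_evaluated)
```

In this sketch only 17 of the 36 candidate protocols are ever evaluated, which mirrors the motivation given above: each evaluation is costly, so the surrogate decides where to spend the compute budget. A real workflow would also hold out a test set of ligands to guard against the overfitting the abstract mentions.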

    Crystal Structure of StaL, a Glycopeptide Antibiotic Sulfotransferase from Streptomyces toyocaensis

    Over the past decade, antimicrobial resistance has emerged as a major public health crisis. Glycopeptide antibiotics such as vancomycin and teicoplanin are clinically important for the treatment of Gram-positive bacterial infections. StaL is a 3'-phosphoadenosine 5'-phosphosulfate-dependent sulfotransferase capable of sulfating the cross-linked heptapeptide substrate both in vivo and in vitro, yielding the product A47934, a unique teicoplanin-class glycopeptide antibiotic. The sulfonation reaction catalyzed by StaL constitutes the final step in A47934 biosynthesis. Here we report the crystal structure of StaL and its complex with the cofactor product 3'-phosphoadenosine 5'-phosphate. This is only the second prokaryotic sulfotransferase to be structurally characterized. StaL belongs to the large sulfotransferase family and shows higher similarity to cytosolic sulfotransferases (ST) than to the bacterial ST (Stf0). StaL has a novel dimerization motif, different from any other STs that have been structurally characterized. We have also applied molecular modeling to investigate the binding mode of the unique substrate, desulfo-A47934. Based on the structural analysis and modeling results, a series of residues was mutated and kinetically characterized. In addition to the conserved residues (Lys12, His67, and Ser98), molecular modeling, fluorescence quenching experiments, and mutagenesis studies identified several other residues essential for substrate binding and/or activity, including Trp34, His43, Phe77, Trp132, and Glu205. NRC publication: Ye

    Combining Cloud-Based Free Energy Calculations, Synthetically Aware Enumerations and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization

    The hit identification process usually involves the profiling of millions to, more recently, billions of compounds, either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based FEP profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large-scale enumeration alone, while simultaneously staying within the bounds of a predefined drug-like property space. We are able to achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted-sum, QSAR-based multi-parameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can: (1) provide a 6.4-fold enrichment improvement in identifying … IC50 < 100 nM. The reported data suggest that combining reaction-based enumeration and generative machine learning for ideation results in a higher enrichment of potent compounds over previously described approaches, and can rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.
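
A weighted-sum multi-parameter optimization function of the kind mentioned in this abstract can be sketched as a weighted average of per-property desirabilities. The abstract does not give the actual terms, so the properties, ranges, and weights below are hypothetical placeholders chosen only to show the mechanics.

```python
def desirability(value, worst, best):
    """Linear desirability clipped to [0, 1]: 0 at `worst`, 1 at `best`.

    Works for both directions, e.g. worst=600, best=300 rewards lower values.
    """
    x = (value - worst) / (best - worst)
    return max(0.0, min(1.0, x))


# (property name, weight, worst value, best value); all values are assumptions.
TERMS = [
    ("pIC50", 0.6, 5.0, 9.0),           # predicted potency from a QSAR model
    ("mol_weight", 0.2, 600.0, 300.0),  # lighter is better
    ("logP", 0.2, 6.0, 2.0),            # less lipophilic is better
]


def mpo_score(props, terms=TERMS):
    """Weighted sum of desirabilities, normalized by the total weight."""
    total_weight = sum(weight for _, weight, _, _ in terms)
    weighted = sum(
        weight * desirability(props[name], worst, best)
        for name, weight, worst, best in terms
    )
    return weighted / total_weight


compound_a = {"pIC50": 8.0, "mol_weight": 450.0, "logP": 3.0}
compound_b = {"pIC50": 9.5, "mol_weight": 650.0, "logP": 6.5}

score_a = mpo_score(compound_a)
score_b = mpo_score(compound_b)
print(round(score_a, 3), round(score_b, 3))
```

With these placeholder terms, compound A scores 0.70 and compound B scores 0.60: B is predicted to be more potent but falls outside the assumed property ranges, so the balanced compound ranks higher. This is the sense in which such a function keeps generated ideas within a predefined property space while still rewarding potency.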

    Reaction-based Enumeration, Active Learning, and Free Energy Calculations to Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin Dependent Kinase 2 Inhibitors

    We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.