28 research outputs found
EnzyHTP Computational Directed Evolution with Adaptive Resource Allocation
Directed evolution facilitates enzyme engineering via iterative rounds of mutagenesis. Despite the wide applications of high-throughput screening, building "smart libraries" to effectively identify beneficial variants remains a major challenge in the community. Here, we developed a new computational directed evolution protocol based on EnzyHTP, software we have previously reported to automate enzyme modeling. To enhance the throughput efficiency, we implemented an adaptive resource allocation strategy that dynamically allocates different types of computing resources (e.g., GPU/CPU) based on the specific need of an enzyme modeling sub-task in the workflow. We implemented the strategy as a Python library and tested the library using fluoroacetate dehalogenase as a model enzyme. The results show that, compared to fixed resource allocation where both CPU and GPU are on-call for use during the entire workflow, applying adaptive resource allocation can save 87% CPU hours and 14% GPU hours. Furthermore, we constructed a computational directed evolution protocol under the framework of adaptive resource allocation. The workflow was tested against two rounds of mutational screening in the directed evolution experiments of Kemp eliminase with a total of 184 mutants. Using folding stability and electrostatic stabilization energy as computational readout, we reproduced three out of the four experimentally-observed target variants. Enabled by the workflow, the entire computation task (i.e., 18.4 μs MD and 18,400 QM single point calculations) completes in three days of wall clock time using ~30 GPUs and ~1000 CPUs
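To put the quoted totals in perspective, the short calculation below divides them evenly over the 184 mutants. The even split is an assumption made here for illustration; the abstract only reports the totals, and the variable names are hypothetical.

```python
# Totals quoted in the abstract above.
n_mutants = 184
total_md_ns = 18_400        # 18.4 microseconds of MD, expressed in nanoseconds
total_qm_jobs = 18_400      # QM single point calculations

# Assumption: the workload is spread evenly over all mutants.
md_ns_per_mutant = total_md_ns / n_mutants
qm_jobs_per_mutant = total_qm_jobs / n_mutants

print(f"Average MD per mutant: {md_ns_per_mutant:.0f} ns")
print(f"Average QM single points per mutant: {qm_jobs_per_mutant:.0f}")
```

Under that assumption, each mutant corresponds to roughly 100 ns of MD sampling and 100 QM single point calculations.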
EnzyHTP: A High-Throughput Computational Platform for Enzyme Modeling
Molecular simulations, including quantum mechanics (QM), molecular mechanics (MM), and multiscale QM/MM modeling, have been extensively applied to understand the mechanism of enzyme catalysis and to design new enzymes. However, molecular simulations typically require specialized, manual operation ranging from model construction to post-analysis to complete the entire life-cycle of enzyme modeling. The dependence on manual operation makes it challenging to simulate enzymes and enzyme variants in a high-throughput fashion. In this work, we developed a Python software, EnzyHTP, to automate molecular model construction, QM, MM, and QM/MM computation, and analyses of modeling data for enzyme simulations. To test EnzyHTP, we used fluoroacetate dehalogenase (FAcD) as a model system and simulated the enzyme interior electrostatics for 100 FAcD mutants with a random single amino acid substitution. For each enzyme mutant, the workflow involves structural model construction, 1 ns molecular dynamics simulations, and quantum mechanical calculations on 100 MD-sampled snapshots. The entire simulation workflow for 100 mutants was completed in 7 hours with 10 GPUs and 160 CPUs. EnzyHTP is expected to improve the efficiency and reproducibility of computational enzyme modeling, facilitate the fundamental understanding of catalytic origins across enzyme families, and accelerate the optimization of biocatalysts for non-native substrate transformation
Recommended from our members
Convergence in determining enzyme functional descriptors across Kemp eliminase variants.
Molecular simulations have been extensively employed to accelerate biocatalytic discoveries. Enzyme functional descriptors derived from molecular simulations have been leveraged to guide the search for beneficial enzyme mutants. However, the ideal active-site region size for computing the descriptors over multiple enzyme variants remains untested. Here, we conducted convergence tests for dynamics-derived and electrostatic descriptors on 18 Kemp eliminase variants across six active-site regions with various boundary distances to the substrate. The tested descriptors include the root-mean-square deviation of the active-site region, the solvent accessible surface area ratio between the substrate and active site, and the projection of the electric field (EF) on the breaking C-H bond. All descriptors were evaluated using molecular mechanics methods. To understand the effects of electronic structure, the EF was also evaluated using quantum mechanics/molecular mechanics methods. The descriptor values were computed for 18 Kemp eliminase variants. Spearman correlation matrices were used to determine the region size condition under which further expansion of the region boundary does not substantially change the ranking of descriptor values. We observed that protein dynamics-derived descriptors, including RMSD_active_site and SASA_ratio, converge at a distance cutoff of 5 Å from the substrate. The electrostatic descriptor, EF_C-H, converges at 6 Å using molecular mechanics methods with truncated enzyme models and 4 Å using quantum mechanics/molecular mechanics methods with whole enzyme model. This study serves as a future reference to determine descriptors for predictive modeling of enzyme engineering
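The rank-based convergence test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the descriptor values are synthetic, the 0.9 threshold is an assumed stand-in for "does not substantially change the ranking", and the names `spearman` and `converged_cutoff` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def converged_cutoff(descriptor_by_cutoff, threshold=0.9):
    """Smallest cutoff whose variant ranking agrees with every larger cutoff."""
    cutoffs = sorted(descriptor_by_cutoff)
    for index, cutoff in enumerate(cutoffs[:-1]):
        larger = cutoffs[index + 1:]
        if all(
            spearman(descriptor_by_cutoff[cutoff], descriptor_by_cutoff[c]) >= threshold
            for c in larger
        ):
            return cutoff
    return cutoffs[-1]


# Synthetic descriptor values for five hypothetical variants at four
# region cutoffs (in angstroms); these are NOT data from the study.
synthetic = {
    3: [1.0, 5.0, 2.0, 4.0, 3.0],
    4: [1.0, 2.0, 5.0, 3.0, 4.0],
    5: [1.0, 2.0, 3.0, 4.0, 5.0],
    6: [1.1, 2.2, 3.1, 4.3, 5.2],
}
print(converged_cutoff(synthetic))
```

With these synthetic values the ranking stops changing once the region reaches 5 Å, so that cutoff is returned. In practice a library routine such as SciPy's `spearmanr` could replace the hand-written rank correlation.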
EnzyKR: a chirality-aware deep learning model for predicting the outcomes of the hydrolase-catalyzed kinetic resolution.
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Different from existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal/mol in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, accounting for 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions
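Two of the figures quoted above follow from simple formulas, sketched here. The 18-of-28 count comes from the abstract; the free energy values used for the mean absolute error are synthetic placeholders chosen for illustration, not EnzyKR predictions, and the function name is hypothetical.

```python
def mean_absolute_error(predicted, reference):
    """Average of the absolute differences between paired values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Synthetic activation free energies in kcal/mol (illustrative only).
predicted = [15.0, 17.5, 20.0, 22.0]
reference = [16.0, 17.0, 21.5, 20.0]
mae = mean_absolute_error(predicted, reference)

# Share of reactions where the favored enantiomer is predicted correctly,
# using the counts reported in the abstract.
correct, total = 18, 28
success_percent = 100 * correct / total

print(f"MAE on synthetic data: {mae:.2f} kcal/mol")
print(f"Success rate: {success_percent:.0f}%")
```

The second calculation reproduces the reported 64%, since 18/28 is approximately 0.643.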
Convergence in Determining Enzyme Functional Descriptors across Kemp Eliminase Variants
Molecular simulations have been extensively employed to accelerate biocatalytic discoveries. Enzyme functional descriptors derived from molecular simulations have been leveraged to guide the search for beneficial enzyme mutants. However, the ideal active-site region size for computing the descriptors over multiple enzyme variants remains untested. Here, we conducted convergence tests for dynamics-derived and electrostatic descriptors on eighteen Kemp eliminase variants across six active-site regions with various boundary distances to the substrate. The tested descriptors include the root-mean-square deviation of the active-site region, the solvent accessible surface area ratio between the substrate and active site, and the projection of the electric field on the breaking C-H bond. All descriptors were evaluated using molecular mechanics methods. To understand the effects of electronic structure, the electric field was also evaluated using quantum mechanics/molecular mechanics methods. The descriptor values were computed for eighteen Kemp eliminase variants. Spearman correlation matrices were used to determine the region size condition under which further expansion of the region boundary does not substantially change the ranking of descriptor values. We observed that protein dynamics-derived descriptors, including RMSD_active_site and SASA_ratio, converge at a distance cutoff of 5 Å from the substrate. The electrostatic descriptor, EF_C-H, converges at 6 Å using molecular mechanics methods with truncated enzyme models and 4 Å using quantum mechanics/molecular mechanics methods with whole enzyme model. This study serves as a future reference to determine descriptors for predictive modeling of enzyme engineering
EnzyKR: A Chirality-Aware Deep Learning Model for Predicting the Outcomes of the Hydrolase-Catalyzed Kinetic Resolution
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of an enantiomer-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Different from existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 enantiomer-hydrolase complexes, which were constructed by docking based on the enzyme and substrate structures curated from IntEnzyDB. EnzyKR was tested using a held-out dataset of 20 complexes on the task of activation free energy prediction. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal/mol in its activation free energy prediction task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, accounting for 64% of the test cases. These results demonstrate EnzyKR as a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions
Why •CF2H is nucleophilic but •CF3 is electrophilic in reactions with heterocycles.
Radical substitution is a useful method to functionalize heterocycles, as in the venerable Minisci reaction. Empirically observed regiochemistries indicate that the CF2H radical has a nucleophilic character similar to alkyl radicals, but the CF3 radical is electrophilic. While the difference between •CH3 and •CF3 is well understood, the reason that one and two Fs make little difference but the third has a large effect is puzzling. DFT calculations with M06-2X both reproduce experimental selectivities and also lead to an explanation of this difference. Theoretical methods reveal how the F inductive withdrawal and conjugative donation alter radical properties, but only CF3 becomes decidedly electrophilic toward heterocycles. Here, we show a simple model to explain the radical orbital energy trends and resulting nucleophilicity or electrophilicity of fluorinated radicals
Bioinspired Synthesis of (−)-PF-1018
The combination of electrocyclizations and cycloadditions accounts for the formation of a range of fascinating natural products. Cascades consisting of 8π electrocyclizations followed by a 6π electrocyclization and a cycloaddition are relatively common. We now report the synthesis of the tetramic acid PF-1018 through an 8π electrocyclization, the product of which is immediately intercepted by a Diels-Alder cycloaddition. The success of this pericyclic cascade was critically dependent on the substitution pattern of the starting polyene and could be rationalized through DFT calculations. The completion of the synthesis required the instalment of a trisubstituted double bond by radical deoxygenation. An unexpected side product formed through 4-exo-trig radical cyclization could be recycled through an unprecedented triflation/fragmentation
Investigating the Non-Electrostatic Component of Substrate Positioning Dynamics
Substrate positioning dynamics (SPD) orients the substrate to reactive conformations in the active site, accelerating enzymatic reactions. However, it remains unknown whether SPD effects originate primarily from electrostatic perturbation inside the enzyme or can independently mediate catalysis with a significant non-electrostatic component. Here we investigated how the non-electrostatic component of SPD affects transition state stabilization. Using high-throughput enzyme modeling, we selected Kemp eliminase variants with similar electrostatics inside the enzyme but significantly different SPD. The kinetic parameters of these selected mutants were experimentally characterized. We observed a valley-shaped, two-segment linear correlation between the TS stabilization free energy (converted from kinetic parameters) and an index used to quantify SPD. Favorable SPD was observed for a distal mutant R154W, leading to the lowest activation free energy among the mutants tested. R154W involves an increased proportion of reactive conformations. These results indicate the contribution of the non-electrostatic component of SPD to mediating enzyme catalytic efficiency
Bioinspired Synthesis of (−)-PF-1018
The combination of electrocyclizations and cycloadditions accounts for the formation of a range of fascinating natural products. Cascades consisting of 8π electrocyclizations followed by a 6π electrocyclization and a cycloaddition are relatively common. We now report the synthesis of the tetramic acid PF-1018 through an 8π electrocyclization, the product of which is immediately intercepted by a Diels-Alder cycloaddition. The success of this pericyclic cascade was critically dependent on the substitution pattern of the starting polyene and could be rationalized through DFT calculations. The completion of the synthesis required the instalment of a trisubstituted double bond via radical deoxygenation. An unexpected byproduct formed through 4-exo-trig radical cyclization could be recycled through an unprecedented triflation/fragmentation