18 research outputs found
Recommended from our members
Modeling the vibrational couplings of nucleobases.
Vibrational spectroscopy, in particular infrared spectroscopy, has been widely used to probe the three-dimensional structures and conformational dynamics of nucleic acids. As commonly used chromophores, the C=O and C=C stretch modes in the nucleobases exhibit distinct spectral features for different base pairing and stacking configurations. To elucidate the origin of their structural sensitivity, in this work, we develop transition charge coupling (TCC) models that allow one to efficiently calculate the interactions or couplings between the C=O and C=C chromophores based on the geometric arrangements of the nucleobases. To evaluate their performance, we apply the TCC models to DNA and RNA oligonucleotides with a variety of secondary and tertiary structures and demonstrate that the predicted couplings are in quantitative agreement with the reference values. We further elucidate how the interactions between the paired and stacked bases give rise to characteristic IR absorption peaks and show that the TCC models provide more reliable predictions of the coupling constants than the transition dipole coupling scheme. The TCC models, together with our recently developed through-bond coupling constants and vibrational frequency maps, provide an effective theoretical strategy to model the vibrational Hamiltonian, and hence the vibrational spectra, of nucleic acids in the base carbonyl stretch region directly from atomistic molecular simulations.
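For orientation, the two coupling schemes compared in this abstract are usually written as follows. This is the standard textbook form, not necessarily the exact parameterization used by the authors; the symbols (transition dipoles, transition charges, normal coordinates) are introduced here for illustration only.

```latex
% Transition dipole coupling (TDC) between local modes i and j:
% \vec{\mu}_i is the transition dipole of mode i, \vec{r}_{ij} the vector joining the two dipoles.
\beta_{ij}^{\mathrm{TDC}}
  = \frac{1}{4\pi\varepsilon_0}
    \left[
      \frac{\vec{\mu}_i \cdot \vec{\mu}_j}{r_{ij}^{3}}
      - \frac{3\,(\vec{\mu}_i \cdot \vec{r}_{ij})(\vec{\mu}_j \cdot \vec{r}_{ij})}{r_{ij}^{5}}
    \right]

% Transition charge coupling (TCC): the point dipoles are replaced by atomic charges
% q_a, q_b that depend on the normal coordinates Q_i, Q_j of the two chromophores.
\beta_{ij}^{\mathrm{TCC}}
  = \frac{1}{4\pi\varepsilon_0}
    \left.
      \frac{\partial^{2}}{\partial Q_i\,\partial Q_j}
      \sum_{a \in i} \sum_{b \in j}
      \frac{q_a(Q_i)\, q_b(Q_j)}{\lvert \vec{r}_a(Q_i) - \vec{r}_b(Q_j) \rvert}
    \right|_{Q_i = Q_j = 0}
```

Because TCC sums over atom pairs instead of collapsing each chromophore to a point dipole, it is generally expected to behave better at the short distances typical of paired and stacked bases.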
Data-driven enzyme engineering to identify function-enhancing enzymes.
Identifying function-enhancing enzyme variants is a "holy grail" challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have greatly advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing new-to-nature reactions. Here, we reviewed the recent progress of data-driven models that were applied in identifying efficiency-enhancing mutants for catalytic reactions. We also discussed existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform readers about the state of the art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
EnzyHTP: A High-Throughput Computational Platform for Enzyme Modeling
Molecular simulations, including quantum mechanics (QM), molecular mechanics (MM), and multiscale QM/MM modeling, have been extensively applied to understand the mechanism of enzyme catalysis and to design new enzymes. However, molecular simulations typically require specialized, manual operation ranging from model construction to post-analysis to complete the entire life-cycle of enzyme modeling. The dependence on manual operation makes it challenging to simulate enzymes and enzyme variants in a high-throughput fashion. In this work, we developed a Python software package, EnzyHTP, to automate molecular model construction, QM, MM, and QM/MM computation, and analyses of modeling data for enzyme simulations. To test EnzyHTP, we used fluoroacetate dehalogenase (FAcD) as a model system and simulated the enzyme interior electrostatics for 100 FAcD mutants with a random single amino acid substitution. For each enzyme mutant, the workflow involves structural model construction, 1 ns molecular dynamics simulations, and quantum mechanical calculations in 100 MD-sampled snapshots. The entire simulation workflow for 100 mutants was completed in 7 hours with 10 GPUs and 160 CPUs. EnzyHTP is expected to improve the efficiency and reproducibility of computational enzyme modeling, facilitate the fundamental understanding of catalytic origins across enzyme families, and accelerate the optimization of biocatalysts for non-native substrate transformation.
Convergence in determining enzyme functional descriptors across Kemp eliminase variants.
Molecular simulations have been extensively employed to accelerate biocatalytic discoveries. Enzyme functional descriptors derived from molecular simulations have been leveraged to guide the search for beneficial enzyme mutants. However, the ideal active-site region size for computing the descriptors over multiple enzyme variants remains untested. Here, we conducted convergence tests for dynamics-derived and electrostatic descriptors on 18 Kemp eliminase variants across six active-site regions with various boundary distances to the substrate. The tested descriptors include the root-mean-square deviation of the active-site region, the solvent accessible surface area ratio between the substrate and active site, and the projection of the electric field (EF) on the breaking C-H bond. All descriptors were evaluated using molecular mechanics methods. To understand the effects of electronic structure, the EF was also evaluated using quantum mechanics/molecular mechanics methods. The descriptor values were computed for 18 Kemp eliminase variants. Spearman correlation matrices were used to determine the region size condition under which further expansion of the region boundary does not substantially change the ranking of descriptor values. We observed that protein dynamics-derived descriptors, including RMSDactive_site and SASAratio, converge at a distance cutoff of 5 Å from the substrate. The electrostatic descriptor, EFC-H, converges at 6 Å using molecular mechanics methods with truncated enzyme models and 4 Å using quantum mechanics/molecular mechanics methods with a whole-enzyme model. This study serves as a future reference to determine descriptors for predictive modeling of enzyme engineering.
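The convergence criterion described in this abstract (rankings that stop changing as the region grows) can be sketched with a Spearman correlation matrix over region cutoffs. The snippet below is a minimal illustration, not the authors' code: the function names, the toy descriptor values, and the 0.9 threshold are all assumptions made for the example.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def converged_cutoff(descriptor_by_cutoff, threshold=0.9):
    """Smallest cutoff whose variant ranking agrees (rho >= threshold) with
    every larger cutoff. The threshold is a hypothetical choice. The largest
    cutoff passes trivially, so it is returned if nothing smaller qualifies."""
    cutoffs = sorted(descriptor_by_cutoff)
    for i, c in enumerate(cutoffs):
        larger = cutoffs[i + 1:]
        if all(spearman(descriptor_by_cutoff[c], descriptor_by_cutoff[d]) >= threshold
               for d in larger):
            return c
    return None


# Toy data: one descriptor value per variant (5 variants) at each cutoff in angstroms.
toy = {
    3.0: [5.0, 1.0, 4.0, 2.0, 3.0],
    4.0: [1.0, 3.0, 2.0, 5.0, 4.0],
    5.0: [1.0, 2.0, 3.0, 4.0, 5.0],
    6.0: [1.1, 2.2, 3.1, 4.5, 5.0],
}
print(converged_cutoff(toy))
```

In this toy set the ranking at 5 Å already matches the ranking at 6 Å, while the 3 Å and 4 Å rankings still differ from the larger regions.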
EnzyKR: a chirality-aware deep learning model for predicting the outcomes of the hydrolase-catalyzed kinetic resolution.
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Unlike existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal/mol in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, that is, 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
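To see how a pair of activation free energies maps onto an enantiomeric outcome, the sketch below uses plain transition state theory with equal prefactors for both enantiomers. This is a generic textbook relation and an assumption of this example, not a description of how EnzyKR itself converts its predictions; the function names and the 298.15 K temperature are hypothetical choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def selectivity(dg_r, dg_s, temperature=298.15):
    """Rate ratio k_R / k_S from activation free energies in kcal/mol,
    assuming both enantiomers share the same pre-exponential factor."""
    return math.exp(-(dg_r - dg_s) / (R_KCAL * temperature))


def favored_enantiomer(dg_r, dg_s):
    """The enantiomer with the lower barrier reacts faster."""
    if dg_r == dg_s:
        return "none"
    return "R" if dg_r < dg_s else "S"


def initial_product_ee(dg_r, dg_s, temperature=298.15):
    """Product enantiomeric excess in the low-conversion limit,
    (k_R - k_S) / (k_R + k_S); positive values mean excess of R."""
    ratio = selectivity(dg_r, dg_s, temperature)
    return (ratio - 1.0) / (ratio + 1.0)


# A barrier gap equal to the reported MAE (1.54 kcal/mol) already corresponds
# to a large predicted excess, which shows how sensitive the outcome is.
print(favored_enantiomer(15.0, 16.54), initial_product_ee(15.0, 16.54))

# The reported success rate: 18 of 28 reactions.
print(round(100 * 18 / 28))
```

The exponential dependence is the practical point: an error of about 1.5 kcal/mol in either barrier can flip or strongly distort the predicted excess, so the sign of the barrier difference is a more robust target than its magnitude.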
Convergence in Determining Enzyme Functional Descriptors across Kemp Eliminase Variants
Molecular simulations have been extensively employed to accelerate biocatalytic discoveries. Enzyme functional descriptors derived from molecular simulations have been leveraged to guide the search for beneficial enzyme mutants. However, the ideal active-site region size for computing the descriptors over multiple enzyme variants remains untested. Here, we conducted convergence tests for dynamics-derived and electrostatic descriptors on eighteen Kemp eliminase variants across six active-site regions with various boundary distances to the substrate. The tested descriptors include the root-mean-square deviation of the active-site region, the solvent accessible surface area ratio between the substrate and active site, and the projection of the electric field on the breaking C–H bond. All descriptors were evaluated using molecular mechanics methods. To understand the effects of electronic structure, the electric field was also evaluated using quantum mechanics/molecular mechanics methods. The descriptor values were computed for eighteen Kemp eliminase variants. Spearman correlation matrices were used to determine the region size condition under which further expansion of the region boundary does not substantially change the ranking of descriptor values. We observed that protein dynamics-derived descriptors, including RMSDactive_site and SASAratio, converge at a distance cutoff of 5 Å from the substrate. The electrostatic descriptor, EFC–H, converges at 6 Å using molecular mechanics methods with truncated enzyme models and 4 Å using quantum mechanics/molecular mechanics methods with whole enzyme model. This study serves as a future reference to determine descriptors for predictive modeling of enzyme engineering
EnzyKR: A Chirality-Aware Deep Learning Model for Predicting the Outcomes of the Hydrolase-Catalyzed Kinetic Resolution
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of an enantiomer-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Unlike existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 enantiomer-hydrolase complexes, which were constructed by docking based on the enzyme and substrate structures curated from IntEnzyDB. EnzyKR was tested using a held-out dataset of 20 complexes on the task of activation free energy prediction. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal/mol in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, that is, 64% of the test cases. These results demonstrate EnzyKR as a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
Investigating the Non-Electrostatic Component of Substrate Positioning Dynamics
Substrate positioning dynamics (SPD) orients the substrate to reactive conformations in the active site, accelerating enzymatic reactions. However, it remains unknown whether SPD effects originate primarily from electrostatic perturbation inside the enzyme or can independently mediate catalysis with a significant non-electrostatic component. Here we investigated how the non-electrostatic component of SPD affects transition state stabilization. Using high-throughput enzyme modeling, we selected Kemp eliminase variants with similar electrostatics inside the enzyme but significantly different SPD. The kinetic parameters of these selected mutants were experimentally characterized. We observed a valley-shaped, two-segment linear correlation between the TS stabilization free energy (converted from kinetic parameters) and an index used to quantify SPD. Favorable SPD was observed for a distal mutant R154W, leading to the lowest activation free energy among the mutants tested. R154W involves an increased proportion of reactive conformations. These results indicate the contribution of the non-electrostatic component of SPD to mediating enzyme catalytic efficiency
LassoHTP: a High-throughput Computational Tool for Lasso Peptide Structure Construction and Modeling
Lasso peptides are a sub-class of ribosomally synthesized and post-translationally modified peptides with a slipknot conformation. Often with superior thermal stability, protease resistance, and antimicrobial activity, lasso peptides are promising candidates for bioengineering and pharmaceutical applications. To enable high-throughput computational prediction and design of lasso peptides, we developed software, LassoHTP, for automatic lasso peptide structure construction and modeling. LassoHTP consists of three modules: a scaffold constructor, a mutant generator, and a molecular dynamics (MD) simulator. Based on a user-provided sequence and conformational annotation, LassoHTP can either generate the structure and conformational ensemble as is or conduct random mutagenesis. We used LassoHTP to construct eight known lasso peptide structures de novo and to simulate their conformational ensembles from 100 ns MD simulations. For benchmarking, we calculated the root mean square deviation (RMSD) of these ensembles with reference to their experimental crystal or NMR PDB structures; we also compared these RMSD values against those of the MD ensembles that are initiated from the PDB structures. The results show that the RMSD values of the LassoHTP-initiated ensembles are highly similar to those of the PDB-initiated ensembles, with the ∆RMSD ranging from 0.0 to 1.2 Å and averaging 0.5 Å. LassoHTP offers a computational platform to develop strategies for lasso peptide prediction and design.
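The benchmarking comparison in this abstract can be illustrated with a small RMSD helper. This is a sketch under simplifying assumptions, not LassoHTP code: frames are taken to be already superimposed on the reference (no alignment step is performed), ∆RMSD is taken as the absolute difference of ensemble-averaged RMSDs, and all function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two equally sized lists of (x, y, z)
    tuples. Assumes the structures are already aligned."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def ensemble_rmsd(frames, reference):
    """Average RMSD of every frame in an ensemble against one reference."""
    return sum(rmsd(frame, reference) for frame in frames) / len(frames)


def delta_rmsd(frames_built, frames_pdb, reference):
    """Absolute difference between the average RMSDs of two ensembles,
    e.g. one started from a constructed model and one from the PDB structure."""
    return abs(ensemble_rmsd(frames_built, reference) - ensemble_rmsd(frames_pdb, reference))


# Toy two-atom example with coordinates in angstroms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # rigid 1 angstrom shift along z
print(rmsd(reference, shifted))
print(delta_rmsd([shifted], [reference], reference))
```

The rigid shift in the toy example yields a non-zero RMSD even though the shape is unchanged, which is why real trajectories are superimposed on the reference before this calculation.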
Rate-enhancing Single Amino Acid Mutation for Hydrolases: A Statistical Profiling
We reported the statistical profiling for rate-enhancing mutant hydrolases with single amino acid substitution. We constructed an integrated structure-kinetics database, IntEnzyDB, which contains 3,907 experimentally characterized hydrolase kinetics and 2,715 hydrolase Protein Data Bank IDs. The hydrolase kinetics data involve 9% rate-enhancing mutations. Mutation to nonpolar residues with a hydrocarbon chain shows a stronger preference for rate acceleration than to polar or charged residues. To elucidate the structure-kinetics relationship for rate-enhancing mutations, we categorized each mutation into one of the three spatial shells of hydrolases. We defined the spatial shells by reference to either the active site or the center-of-mass of the enzyme. In either case, mutations in the first shell (i.e., closest to the reference point) appear on average more rate-deleterious than those in the other two shells (i.e., ~1.0 kcal/mol in ∆∆G‡). Under the active-site reference, mutations in the third shell (i.e., most distal to the active site) exhibit the highest likelihood of rate enhancement. This propensity is significant for larger-sized hydrolases. In contrast, under the center-of-mass reference, mutations in the second shell (i.e., 33.3th to 66.7th percentile rank of spatial proximity to the center-of-mass of the enzyme) show the highest likelihood of rate enhancement. This trend is significant for smaller-sized hydrolases. The studies reveal the statistical features for identifying rate-enhancing mutations in hydrolases, which will potentially guide hydrolase discovery in biocatalysis.
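Two quantities in this abstract lend themselves to a short worked example: the change in activation free energy implied by a rate change, and the assignment of residues to three spatial shells by percentile rank. The sketch below assumes transition state theory with an unchanged prefactor and a mid-rank percentile definition with cut points at 33.3 and 66.7; both are assumptions of this example rather than the paper's stated procedure, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_activation(k_mutant, k_wild_type, temperature=298.15):
    """Change in activation free energy (kcal/mol) implied by a rate change.
    Negative values indicate a rate-enhancing mutation."""
    return -R_KCAL * temperature * math.log(k_mutant / k_wild_type)


def assign_shells(distances):
    """Assign each residue to shell 1, 2, or 3 by the percentile rank of its
    distance to the reference point (active site or center of mass).
    Uses a mid-rank percentile, which is one of several possible conventions."""
    n = len(distances)
    order = sorted(range(n), key=lambda i: distances[i])
    shells = [0] * n
    for position, index in enumerate(order):
        percentile = 100.0 * (position + 0.5) / n
        if percentile < 33.3:
            shells[index] = 1
        elif percentile < 66.7:
            shells[index] = 2
        else:
            shells[index] = 3
    return shells


# A tenfold rate increase at 298.15 K lowers the barrier by about 1.36 kcal/mol.
print(ddg_activation(10.0, 1.0))

# Toy distances (angstroms) from six residues to the reference point.
print(assign_shells([2.0, 10.0, 20.0, 5.0, 15.0, 30.0]))
```

For scale, the ~1.0 kcal/mol difference quoted in the abstract corresponds to roughly a fivefold change in rate at room temperature under the same assumptions.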