GalaxyGPCRloop: Template-Based and <i>Ab Initio</i> Structure Sampling of the Extracellular Loops of G‑Protein-Coupled Receptors
The second extracellular loops (ECL2s) of G-protein-coupled receptors
(GPCRs) are often involved in GPCR functions, and their structures
have important implications in drug discovery. However, structure
prediction of ECL2 is difficult because of its long length and the
structural diversity among different GPCRs. In this study, a new ECL2
conformational sampling method involving both template-based and <i>ab initio</i> sampling was developed. Inspired by the observation
of similar ECL2 structures of closely related GPCRs, a template-based
sampling method employing loop structure templates selected from the
structure database was developed. A new metric for evaluating similarity
of the target loop to templates was introduced for template selection.
An <i>ab initio</i> loop sampling method was also developed
to treat cases without highly similar templates. The <i>ab initio</i> method is based on the previously developed fragment assembly and
loop closure method. A new sampling component that takes advantage
of secondary structure prediction was added. In addition, a conserved
disulfide bridge restraining ECL2 conformation was predicted and analytically
incorporated into sampling, reducing the effective dimension of the
conformational search space. The sampling method was combined with
an existing energy function for comparison with previously reported
loop structure prediction methods, and the benchmark test demonstrated
outstanding performance.
Protein Loop Modeling Using a New Hybrid Energy Function and Its Application to Modeling in Inaccurate Structural Environments
<div><p>Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and <i>de novo</i> protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. 
The web server for this method is available at <a href="http://galaxy.seoklab.org/loop" target="_blank">http://galaxy.seoklab.org/loop</a> with the PS2 option for the scoring function.</p></div>
Examples of loops modeled in inaccurate environmental structures.
<p>In all panels, the crystal structures are colored in green and the models in magenta. Framework structures are shown transparent for clarity. (A) Two examples of tolerating errors in surrounding side chains, 1oyc (left; RMSD = 0.4 Å) and 1c5e (right; RMSD = 0.5 Å). The loop-framework salt bridges in the crystal structures are indicated with black dotted lines. High-accuracy modeling is possible even though the salt bridges cannot be recovered owing to the perturbed arginine orientations in the framework. (B) An example of unsuccessful modeling in a framework with perturbed side chains, 1oth (RMSD = 2.3 Å), showing the necessity of additional sampling. The perturbed Arg66 and Tyr345 side chains (magenta) would clash with the two leucine residues in the loop if the crystal loop structure were placed in this framework. (C) Two examples of tolerating additional backbone errors, 1my7 (left; RMSD = 1.0 Å) and 1cb0 (right; RMSD = 0.9 Å). The overall backbone trace and key side-chain interactions are well reproduced.</p>
A successful example of loop modeling in the framework of a template-based model.
<p>The crystal structure is colored in green and the model in magenta (1avk, RMSD = 1.5 Å). Framework structures are shown transparent for clarity. Loops of three templates (used for template-based modeling) are shown with yellow transparent ribbons for comparison.</p>
Distributions of environmental errors for the three types of test sets employed in the study.
<p>Distributions are shown for (A) the test set of crystal structures with perturbed side chains, (B) the crystal structures with both backbone and side chains perturbed, and (C) the template-based models. The gray curve behind each histogram represents an interpolation. The average E-RMSD values are 0.9 Å, 2.1 Å, and 2.8 Å for the side chain-perturbed set (A), the backbone-perturbed set (B), and the template-based model set (C), respectively. E-RMSD is the all-atom RMSD of the environment residues, defined as residues with any atom within 10 Å of any loop C<sub>β</sub> atom.</p>
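The E-RMSD definition in the caption above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the model and reference structures are already expressed in the same coordinate frame, that environment residues are selected on the reference structure, that loop residues are excluded from the input by the caller, and that atoms are listed in the same order in both structures. The function names and the dictionary layout are hypothetical.

```python
import math

def environment_residues(residue_atoms, loop_cb_coords, cutoff=10.0):
    """Return ids of residues with any atom within `cutoff` (Å) of any loop C-beta.

    residue_atoms: dict mapping a residue id to a list of (x, y, z) atom coordinates.
    Assumption: the caller passes only non-loop residues.
    """
    selected = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(atom, cb) <= cutoff for atom in atoms for cb in loop_cb_coords):
            selected.append(res_id)
    return selected

def all_atom_rmsd(model_atoms, reference_atoms):
    """Root-mean-square deviation over paired atoms; no superposition is performed."""
    if not model_atoms or len(model_atoms) != len(reference_atoms):
        raise ValueError("atom lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model_atoms, reference_atoms))
    return math.sqrt(total / len(model_atoms))

def e_rmsd(model, reference, loop_cb_coords, cutoff=10.0):
    """All-atom RMSD restricted to environment residues (selected on the reference)."""
    env = environment_residues(reference, loop_cb_coords, cutoff)
    model_atoms = [atom for res_id in env for atom in model[res_id]]
    reference_atoms = [atom for res_id in env for atom in reference[res_id]]
    return all_atom_rmsd(model_atoms, reference_atoms)
```

Whether the distance test should be inclusive at exactly 10 Å, and how the structures are superimposed beforehand, are not stated in the caption, so both choices here are assumptions.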
Sampling results of GalaxyLoop-PS2 on the three test sets.
1)<p>Number of loop targets for which at least one structure among the 30 loop conformations (or 50 conformations for 12-residue loops) in the final CSA bank is within a given RMSD value.</p>
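The sampling statistic described in the footnote above amounts to counting targets whose best bank member falls under an RMSD cutoff. A minimal sketch follows, assuming the per-target RMSD values of the final bank are already available as plain lists; the function name and the input layout are hypothetical, and the inclusive comparison is an assumption.

```python
def count_targets_sampled_within(bank_rmsds, threshold):
    """Count targets with at least one bank conformation within `threshold` Å.

    bank_rmsds: dict mapping a target id to the list of RMSD values (Å) of the
    conformations in its final bank (e.g. 30 values, or 50 for 12-residue loops).
    """
    return sum(
        1
        for rmsds in bank_rmsds.values()
        if rmsds and min(rmsds) <= threshold
    )
```

Calling the function once per cutoff (for example 1.0, 2.0, and 3.0 Å) gives one row of such a table.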
Comparison of loop modeling results on the test set of template-based models.
<p>The average RMSD and its standard deviation are reported in Å. The loop RMSD is calculated as the root-mean-square deviation of the main-chain atoms N, C<sub>α</sub>, C, and O.</p>
1)<p>Loop conformations generated by MODELLER <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Sali1" target="_blank">[30]</a>.</p>
2)<p>Loop conformations generated by loop refinement using ModLoop of MODELLER <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Fiser1" target="_blank">[1]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Fiser2" target="_blank">[27]</a>.</p>
3)<p>Results of the best-score models sampled by Next-generation KIC (NGK) using the protocol provided by Stein <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Stein1" target="_blank">[18]</a>.</p><p>500 models were generated for each target as in Stein <i>et al.</i> The Rosetta program v3.5 was used.</p>
4)<p>Loop set constructed in this study. See <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s008" target="_blank">Table S7</a></b> for the list of loops.</p>
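The loop RMSD over the main-chain atoms N, Cα, C, and O, as defined in the caption above, can be sketched as follows. This is an illustrative Python sketch and not the authors' implementation. It assumes the model and the crystal structure are already in a common frame (for example, after superimposing the framework), so no fitting is done on the loop itself. The atom naming ("CA" for Cα) and the dictionary layout are assumptions.

```python
import math

MAIN_CHAIN_ATOMS = ("N", "CA", "C", "O")  # "CA" stands for the C-alpha atom

def loop_rmsd(model, reference):
    """Main-chain RMSD (Å) between two loop conformations.

    model, reference: dicts mapping (residue_number, atom_name) to (x, y, z).
    Only main-chain atoms present in the reference are compared; side-chain
    atoms are ignored. No superposition is performed.
    """
    keys = [key for key in reference if key[1] in MAIN_CHAIN_ATOMS]
    if not keys:
        raise ValueError("reference contains no main-chain atoms")
    total = sum(math.dist(model[key], reference[key]) ** 2 for key in keys)
    return math.sqrt(total / len(keys))
```

Averaging the returned values over all targets in a test set, together with their standard deviation, would give the kind of summary reported in the table.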
Protein Ensemble Generation Through Variational Autoencoder Latent Space Sampling
Mapping the ensemble of protein conformations that contribute to
function and can be targeted by small molecule drugs remains an outstanding
challenge. Here, we explore the use of variational autoencoders for
reducing the challenge of dimensionality in the protein structure
ensemble generation problem. We convert high-dimensional protein structural
data into a continuous, low-dimensional representation, carry out
a search in this space guided by a structure quality metric, and then
use RoseTTAFold guided by the sampled structural information to generate
3D structures. We use this approach to generate ensembles for the
cancer-relevant protein K-Ras, train the VAE on a subset of the available
K-Ras crystal structures and MD simulation snapshots, and assess the
extent of sampling close to crystal structures withheld from training.
We find that our latent space sampling procedure rapidly generates
ensembles with high structural quality and is able to sample within
1 Å of held-out crystal structures, with a consistency higher
than that of MD simulation or AlphaFold2 prediction. The sampled structures
sufficiently recapitulate the cryptic pockets in the held-out K-Ras
structures to allow for small molecule docking.
Comparison of loop modeling results by the average RMSD of main chain atoms (N, C<sub>α</sub>, C, and O) of loops in angstroms (Å) on test sets of varying environmental accuracies measured by E-RMSD.
<p>Standard deviations are also reported.</p>
1)<p>Loop sampling methods sample only the loop region, while extended sampling methods sample surrounding side chains in addition to the loop.</p>
2)<p>Taken from Sellers <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Sellers1" target="_blank">[15]</a>.</p>
3)<p>Taken from Mandell <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Mandell1" target="_blank">[16]</a>.</p>
4)<p>Results of the best-score models out of 500 models sampled for each target following the protocol provided by Stein <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Stein1" target="_blank">[18]</a> with Rosetta v3.5.</p><p>The results for the crystal structure set and the side chain-perturbed set are the same for NGK because extended sampling of the loop environment was used for both sets.</p>
5)<p>Loop sets taken from Jacobson <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Jacobson1" target="_blank">[10]</a>. See <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s002" target="_blank">Tables S1</a></b> and <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s003" target="_blank">S2</a></b> for the list of loops.</p>
6)<p>Loop sets from Zhu <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Zhu1" target="_blank">[34]</a>. See <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s002" target="_blank">Tables S1</a></b> and <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s003" target="_blank">S2</a></b> for the list of loops.</p>
7)<p>Loop set from Fiser <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811-Fiser1" target="_blank">[1]</a>. See <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0113811#pone.0113811.s004" target="_blank">Table S3</a></b> for the list of loops.</p>
Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules
Most biomolecular modeling energy functions for structure prediction,
sequence design, and molecular docking have been parametrized using
existing macromolecular structural data; this contrasts with molecular
mechanics force fields, which are largely optimized using small-molecule
data. In this study, we describe an integrated method that enables
optimization of a biomolecular modeling energy function simultaneously
against small-molecule thermodynamic data and high-resolution macromolecular
structural data. We use this approach to develop a next-generation
Rosetta energy function that utilizes a new anisotropic implicit solvation
model, and an improved electrostatics and Lennard-Jones model, illustrating
how energy functions can be considerably improved in their ability
to describe large-scale energy landscapes by incorporating both small-molecule
and macromolecule data. The energy function improves performance in
a wide range of protein structure prediction challenges, including
monomeric structure prediction, protein–protein and protein–ligand
docking, protein sequence design, and prediction of the free energy
changes by mutation, while reasonably recapitulating small-molecule
thermodynamic properties.