Efficient Biased Sampling Methods for Biomacromolecules: Protein Loops and RNA Thermodynamic Prediction
Biomacromolecules are fundamental structural and functional units of the cell, and nucleic acids and proteins are the most common among them. The relationship between the sequences and structures of these biomacromolecules has been one of the most important problems in biology for decades. Computational approaches can provide novel and efficient ways to study this problem and other structure-related problems, e.g. the thermodynamics of nucleic acids. Because nucleic acids and proteins are both biopolymers, sampling structures from a chosen distribution with a chain-growth method is an effective approach to studying the sequence-structure relationship of biomacromolecules. In this thesis, I develop a fast chain-growth method to efficiently predict protein loop conformations, and a coarse-grained chain-growth model to study the thermodynamics of pseudoknotted RNA molecules. With an energy function designed specifically for loops, my method efficiently generates high-quality, low-energy protein loop conformations that are enriched with near-native loop structures. I further applied this method to the problem of modeling multiple loops simultaneously, a challenging problem that very few studies have addressed because the interactions among loops in spatial proximity can be rather complex. The method is more accurate than the other methods it was compared with. It also succeeded in sampling and predicting conformations of the antibody H3 loop while taking less computational time than other methods. For RNA pseudoknots, a coarse-grained chain-growth model is used to study the thermodynamics and folding stability of the mouse mammary tumor virus pseudoknot (VPK). My results show that the melting temperatures of VPK and its two subsequences can be correctly predicted. The melting temperature calculated from the heat capacity is in better agreement with the available experimental data than the values from previous computational studies.
My study also provides detailed information about the unfolding pathways of pseudoknots by analyzing the distribution of base-pairing probability. The results favor the parallel melting pathway hypothesis of VPK folding over a simple sequential unfolding pathway. Overall, these studies address two challenging problems in modeling the three-dimensional structures of proteins and RNAs, and deepen our understanding of the relationship between the sequences and structures of biomacromolecules.
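The abstract mentions a melting temperature calculated from the heat capacity. As a minimal sketch of that idea, and not the thesis's coarse-grained model, the Python below computes the heat capacity from energy fluctuations, C(T) = (⟨E²⟩ − ⟨E⟩²) / (k_B T²), for a hypothetical list of energy levels and takes the temperature at which C(T) peaks as the melting temperature. The function names, the temperature grid, and the `(energy, degeneracy)` input format are assumptions made for illustration.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def heat_capacity(levels, temperature):
    """Heat capacity from energy fluctuations at one temperature.

    levels: list of (energy in kcal/mol, degeneracy) pairs (hypothetical input).
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energy for energy, _ in levels)
    # Shift by the lowest energy so the exponentials stay in a safe range.
    weights = [g * math.exp(-beta * (energy - e_min)) for energy, g in levels]
    z = sum(weights)
    mean_e = sum(w * e for w, (e, _) in zip(weights, levels)) / z
    mean_e2 = sum(w * e * e for w, (e, _) in zip(weights, levels)) / z
    return (mean_e2 - mean_e ** 2) / (K_B * temperature ** 2)


def melting_temperature(levels, t_min=250.0, t_max=450.0, step=0.1):
    """Temperature on a regular grid at which the heat capacity is largest."""
    n_points = int(round((t_max - t_min) / step)) + 1
    temperatures = [t_min + i * step for i in range(n_points)]
    return max(temperatures, key=lambda t: heat_capacity(levels, t))
```

For a toy two-state system whose folded and unfolded populations are equal at 340 K, the heat-capacity peak found this way lies close to, but not exactly at, 340 K, because the 1/T² factor shifts the maximum slightly.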
The time cost of energy calculations for generating one single loop.
<p>(A) Computing time versus protein size for 12-residue loops, showing a large time saving of “Redcell-On” (red solid curve) over “Redcell-Off” (black dashed curve). (B) The same plot for 6-residue loops. (C) Computing time versus protein size, showing that “Redcell-On” (red solid curve) significantly reduces computational time compared with “Ellipsoid-Only” (black dashed curve) and “Cutoff-Only” (green solid curve).</p>
Mean of minimum backbone RMSD values for protein loops.
<p>We generated samples for each loop. The mean of the minimum RMSD of the loops (y-axis) is plotted against the size of trial samples (x-axis) for different parameter choices. As a control, results obtained without sampling torsion angles are also plotted. The backbone RMSD (N, Cα, C and O atoms) in this paper is calculated with the rest of the protein body held fixed.</p>
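The caption above says that the backbone RMSD is computed with the rest of the protein held fixed, which we read as comparing loop coordinates directly in the protein frame, without superimposing the loop onto the native structure. A minimal sketch under that reading follows; the function names and the plain list-of-tuples coordinate format are assumptions for illustration.

```python
import math


def rmsd_no_superposition(coords_a, coords_b):
    """RMSD between two equally ordered coordinate lists, with no fitting step.

    Each argument is a list of (x, y, z) tuples for the backbone atoms
    (N, CA, C, O) of the loop, expressed in the frame of the fixed protein body.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


def minimum_rmsd(native, ensemble):
    """Smallest RMSD to the native loop over all sampled conformations."""
    return min(rmsd_no_superposition(native, sample) for sample in ensemble)
```

Averaging `minimum_rmsd` over a set of loops gives the kind of "mean of minimum RMSD" value plotted in the figure.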
Accuracy of modeled loops by DiSGro using the original Fiser data set of loops with 13 residues.
<p>Reported are the minimum backbone RMSD and the average backbone RMSD of the sampled conformations, along with the backbone and all-heavy-atom RMSD of the lowest-energy conformations in the ensemble.</p>
Comparison of the average minimum backbone RMSD of the loop conformations sampled by DiSGro and six other methods using Test Set 2 used by Ref. [42].
<p>The reported quantity is the average minimum backbone RMSD of the loop ensemble. Results for the other methods, including Random Tweak, CCD, Wriggling, PLOP-build and Direct Tweak, were obtained from Table 2 of Ref. <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003539#pcbi.1003539-Soto1" target="_blank">[42]</a>.</p>
Schematic illustration of atom placement.
<p>The first atom must lie on a circle. Its position is determined by a sampled quantity that is based on a known distance and a conditional distribution. Once that quantity is sampled, the atom can be placed at either of two positions with equal probability. One position is selected; the alternative position is shown as a yellow ball. Similarly, the second atom must lie on its own circle, and its position is determined in a similar fashion.</p>
Schematic illustration of ellipsoid criterion.
<p>(A) Three-dimensional view of a point located on the ellipsoid constructed from the total loop length and its two foci. (B) Two-dimensional view along one axis of the ellipsoid (dark gray); the component along the remaining axis is not shown. The maximum side-chain length and the interaction distance cut-off are also marked. The enlarged ellipsoid, with updated parameters, is also shown (light gray).</p>
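The ellipsoid criterion can be illustrated with a short sketch. Any loop atom lies inside the ellipsoid whose foci are the two loop anchor points and whose sum of focal distances equals the total loop length. By the triangle inequality, an atom within distance r of such a loop atom has a focal-distance sum of at most the loop length plus 2r. The code below uses that conservative bound, with r taken as the maximum side-chain length plus the interaction cut-off. This padding rule is our assumption and may differ from the enlargement actually used in the work; the function and parameter names are also hypothetical.

```python
import math


def may_interact_with_loop(atom, focus_1, focus_2, loop_length,
                           side_chain_max, cutoff):
    """Conservative test: could this atom come within range of any loop atom?

    atom, focus_1, focus_2: (x, y, z) tuples; the foci are the loop anchors.
    loop_length: total contour length of the loop backbone.
    side_chain_max, cutoff: padding terms (assumed enlargement rule).
    """
    padding = side_chain_max + cutoff
    focal_sum = math.dist(atom, focus_1) + math.dist(atom, focus_2)
    return focal_sum <= loop_length + 2.0 * padding


def filter_atoms(atoms, focus_1, focus_2, loop_length, side_chain_max, cutoff):
    """Keep only atoms that pass the enlarged-ellipsoid test."""
    return [
        atom for atom in atoms
        if may_interact_with_loop(atom, focus_1, focus_2, loop_length,
                                  side_chain_max, cutoff)
    ]
```

Atoms rejected by this test can be skipped in energy calculations, which is the kind of saving a pre-filter like this is meant to provide.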
Top five lowest energy loops of length 12 for single-metal-substituted concanavalin A (PDB 1scs, residues 199–210).
<p>The lowest energy loop after side-chain construction is colored in red, and the native structure is in white.</p>
Comparison of the average minimum backbone RMSD, the average ensemble RMSD and the average RMSD of the lowest-energy conformations of the loops sampled by RAPPER, FALCm4 and DiSGro using the same test set.
<p>The three reported quantities are the average minimum backbone RMSD, the average ensemble RMSD and the average RMSD of the lowest-energy conformations of the 1,000-loop ensemble for loops of the same length.</p>
Comparison of accuracy of modeled loops using the original Fiser data set of loops with 10–12 residues.
<p>The accuracy achieved by LOOPER and DiSGro at different loop lengths using the original Fiser data set of loops with 10–12 residues is listed. The mean and median backbone RMSD and the mean and median all-heavy-atom RMSD of the lowest-energy conformations are reported for each loop length.</p>