11 research outputs found

    Modeling Disordered Regions in Proteins Using Rosetta

    Protein structure prediction methods such as Rosetta search for the lowest energy conformation of the polypeptide chain. However, the experimentally observed native state is at a minimum of the free energy, rather than the energy. The neglect of the configurational entropy contribution to the free energy can be partially justified by the assumption that the entropies of alternative folded states, while very much lower than those of unfolded states, are not too different from one another, and hence can, to a first approximation, be neglected when searching for the lowest free energy state. The shortcomings of current structure prediction methods may be due in part to the breakdown of this assumption. Particularly problematic are proteins with significant disordered regions that do not populate single low energy conformations even in the native state. We describe two approaches within the Rosetta structure modeling methodology for treating such regions. The first does not require advance knowledge of the regions likely to be disordered; instead these are identified by minimizing a simple free energy function used previously to model protein folding landscapes and transition states. In this model, residues can be either completely ordered or completely disordered; they are considered disordered if the gain in entropy outweighs the loss of favorable energetic interactions with the rest of the protein chain. The second approach requires identification in advance of the disordered regions, either from sequence alone, using for example the DISOPRED server, or from experimental data such as NMR chemical shifts. During Rosetta structure prediction calculations the disordered regions make only unfavorable repulsive contributions to the total energy. We find that the second approach has greater practical utility and illustrate this with examples from de novo structure prediction, NMR structure calculation, and comparative modeling.
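
    The two-state rule in the first approach (a residue is disordered when its entropy gain outweighs the favorable interactions it gives up) can be sketched as follows. This is a minimal illustration, not the free energy function used in the paper: it assumes each residue has a fixed interaction energy that does not depend on which other residues are ordered, and the names `assign_disorder`, `free_energy`, `entropy_gain`, and `temperature` are hypothetical.

```python
from typing import List


def assign_disorder(interaction_energies: List[float],
                    entropy_gain: float,
                    temperature: float = 1.0) -> List[bool]:
    """Return True for each residue assigned as disordered.

    interaction_energies[i] is the (assumed, fixed) energy of residue i's
    interactions with the rest of the chain; negative means favorable.
    A residue is disordered when the entropic gain T*S exceeds the
    favorable energy (-e) it would lose by becoming disordered.
    """
    threshold = temperature * entropy_gain
    return [(-e) < threshold for e in interaction_energies]


def free_energy(interaction_energies: List[float],
                disordered: List[bool],
                entropy_gain: float,
                temperature: float = 1.0) -> float:
    """Toy free energy: energy of ordered residues minus T*S of disordered ones."""
    energy = sum(e for e, d in zip(interaction_energies, disordered) if not d)
    entropy = entropy_gain * sum(disordered)
    return energy - temperature * entropy


# Hypothetical per-residue energies (arbitrary units), for illustration only.
energies = [-3.0, -0.5, -2.0, 0.2]
states = assign_disorder(energies, entropy_gain=1.0)
print(states, free_energy(energies, states, entropy_gain=1.0))
```

    Because the residues are treated independently here, the per-residue rule gives the assignment that minimizes the toy free energy. In a real calculation the interaction energies are coupled, so the minimization would have to be carried out over the whole assignment.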

    Chromatin landscape of the "dark matter of the genome": centromeres of S. cerevisiae and repeat sequences of D. melanogaster.

    Thesis (Ph.D.)--University of Washington, 2014. The chromatin landscape plays a major role in defining cell phenotypes through transcriptional regulation and specification of the main features of chromosomes, such as centromeres and pericentric heterochromatin. Studies of the chromatin landscape so far have been mostly confined to the protein-coding part of the genome. In this work I present a study of the chromatin landscape of two non-coding regions: centromeres in the budding yeast Saccharomyces cerevisiae and pericentric repeat sequences of the fruit fly Drosophila melanogaster. I have shown that the centromere of budding yeast contains a nucleosome with a special structure, called a hemisome. This finding eliminated other previously proposed models of the centromeric nucleosome and reconciled previous conflicting observations. I also developed a method to quantify enrichment of repeat sequences in ChIP-seq experiments and used it to construct an epigenetic map of heterochromatin in D. melanogaster using the public datasets Drosophila Genetic Reference Panel (DGRP) and modENCODE. This analysis yielded several unexpected, biologically interesting findings, such as preferential association of the HP1a protein with transposable elements and depletion of nucleosomes from short AT-rich repeat sequences.

    Comparison of energy versus RMSD and free energy versus RMSD plots for a case with a disordered internal loop (2k0j).

    <p>A) Rosetta all-atom energy and B) free energy computed using Eq. (1) with predicted disordered regions (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022060#pone-0022060-g003" target="_blank">Fig 3B</a>-2k0j). The energy shown in A is calculated using the Rosetta all-atom energy. In A and B, the x-axis is the RMSD to the folded portion of the native structure. The 10 lowest energy/free energy decoys are shown in black. The dashed orange lines are provided to aid comparison of the two plots. (<b>C</b>) Compensation between the entropic and energetic contributions to the free energy (Eq. (1)).</p>

    Results of disordered internal loop predictions.

    <p>(<b>A</b>) Comparison of prediction accuracy using the free energy function with optimized parameters (<i>β</i> = 1.5 and <i>L<sub>0</sub></i> = 0.3) with that of a null model. The y-axis shows disorder prediction accuracy over the benchmark set using Eq. (2). The x-axis shows the prediction of the null model, which assumes all residues are ordered. (<b>B</b>) Examples of successful prediction of disordered internal loops. Blue line: the actual disordered regions assessed from the residue deviations in the NMR structure. Red line: frequency of disorder assignment by optimization of Eq. (1) over decoy population.</p>

    Test cases for 2<sup>nd</sup> approach.

    a<p>Residues predicted to be disordered are shown in bold font.</p>b<p>Assumed from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022060#pone-0022060-t001" target="_blank">Table 1</a>. The tails of 1enh were constructed from the gene sequence; we assumed these regions were likely to be disordered, which was mostly consistent with the prediction results from DISOPRED2.</p>c<p>http://bioinf.cs.ucl.ac.uk/disopred/ <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022060#pone.0022060-Ward1" target="_blank">[8]</a>.</p>d<p>Disordered regions were predicted using the “Predicted order parameter (S<sup>2</sup>)” calculated from backbone chemical shift data with BMRB accession number 6571 <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022060#pone.0022060-Berjanskii1" target="_blank">[11]</a>.</p>e<p>This is target T0460 in CASP8, downloaded directly from <a href="http://predictioncenter.org/download_area/CASP8/targets/" target="_blank">http://predictioncenter.org/download_area/CASP8/targets/</a>.</p>f<p>The same method as described in <sup>d</sup>, with BMRB accession number 15805.</p>g<p>This is target T0482 in CASP8, downloaded directly from <a href="http://predictioncenter.org/download_area/CASP8/targets/" target="_blank">http://predictioncenter.org/download_area/CASP8/targets/</a>.</p>

    Modeling with REPLONLY residues.

    a<p>GDT-TS was calculated only over the folded portion of the native structure.</p>b<p>σ: Standard deviation.</p>c<p>Disordered regions were treated as REPLONLY.</p>
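
    Restricting GDT-TS to the folded portion, as in footnote a, amounts to masking out the disordered residues before scoring. The sketch below is a simplified illustration under stated assumptions: the model and native coordinates are taken to be already superimposed (a full GDT-TS calculation also searches over superpositions), one coordinate per residue is used, and the function name `gdt_ts` and the `folded_mask` argument are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts(model: Sequence[Coord],
           native: Sequence[Coord],
           folded_mask: Sequence[bool],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT-TS (0-100) over residues where folded_mask is True.

    Assumes model and native are already superimposed and aligned
    residue by residue; no superposition search is performed.
    """
    dists = [math.dist(m, n)
             for m, n, keep in zip(model, native, folded_mask) if keep]
    if not dists:
        raise ValueError("no folded residues selected")
    # Fraction of selected residues within each distance cutoff (in angstroms).
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical coordinates: the last residue is a disordered tail.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 0.0, 5.0), (30.0, 0.0, 0.0)]
print(gdt_ts(model, native, [True, True, True, False]))
print(gdt_ts(model, native, [True, True, True, True]))
```

    Excluding the tail raises the score in this made-up example, because a freely moving tail should not count against a model whose folded core is accurate.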

    Results of disordered termini prediction.

    <p>(<b>A</b>) Optimization of the <i>E<sub>d</sub></i> value using 1ctf from the test set as a representative example. In panels 1 to 3, histograms show the accuracy of prediction results using representative <i>E<sub>d</sub></i> values, where the x-axis shows the difference in length between the predicted and actual tail, and the y-axis shows the frequency of prediction. We show here the prediction results with <i>E<sub>d</sub></i> values of 1.4, 2.0 and 2.6 in panels 1, 2 and 3, respectively. With an <i>E<sub>d</sub></i> value of 2.0, the prediction shows the greatest accuracy: the predicted length difference equals zero (the prediction matches the actual length) with the highest frequency, 0.8 (the maximum is 1). (<b>B</b>) Prediction of disordered terminal regions. Blue and red symbols represent N- and C-terminal tails, respectively. Different symbols correspond to different test cases; the multiple instances of each symbol type represent the different tail lengths considered for a given test case.</p>
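
    The histograms in panel A plot, for each candidate <i>E<sub>d</sub></i> value, how often the predicted tail length differs from the actual length by a given amount. A minimal sketch of that bookkeeping is shown below. The function names and all of the numbers are hypothetical and made up for illustration; they are not data from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def length_difference_frequencies(predicted_lengths: Sequence[int],
                                  actual_length: int) -> Dict[int, float]:
    """Map (predicted - actual) tail length to its normalized frequency."""
    if not predicted_lengths:
        raise ValueError("no predictions supplied")
    counts = Counter(p - actual_length for p in predicted_lengths)
    total = len(predicted_lengths)
    return {diff: n / total for diff, n in sorted(counts.items())}


def best_parameter(predictions_by_value: Dict[float, Sequence[int]],
                   actual_length: int) -> float:
    """Pick the parameter value whose predictions most often match exactly."""
    return max(
        predictions_by_value,
        key=lambda v: length_difference_frequencies(
            predictions_by_value[v], actual_length).get(0, 0.0),
    )


# Made-up predicted tail lengths for three candidate parameter values.
runs = {1.4: [3, 4, 4, 5], 2.0: [5, 5, 5, 4], 2.6: [6, 7, 5, 6]}
print(length_difference_frequencies(runs[2.0], actual_length=5))
print(best_parameter(runs, actual_length=5))
```

    Selecting on the frequency of an exact match mirrors the criterion described in the caption; other criteria, such as the mean absolute length difference, would be equally easy to substitute.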