Quantitative Assessment of Molecular Dynamics Sampling for Flexible Systems
Molecular dynamics (MD) simulation is a natural method for the
study of flexible molecules but at the same time is limited by the
large size of the conformational space of these molecules. We ask
by how much the MD sampling quality for flexible molecules can be
improved by two means: the use of diverse sets of trajectories starting
from different initial conformations to detect deviations between
samples, and sampling with enhanced methods such as accelerated MD
(aMD) or scaled MD (sMD) that distort the energy landscape in controlled
ways. To this end, we test the effects of these approaches on MD simulations
of two flexible biomolecules in aqueous solution, Met-Enkephalin (5
amino acids) and HIV-1 gp120 V3 (a cycle of 35 amino acids). We assess
the convergence of the sampling quantitatively with known, extensive
measures of cluster number <i>N</i><sub>c</sub> and cluster
distribution entropy <i>S</i><sub>c</sub> and with two new
quantities, conformational overlap <i>O</i><sub>conf</sub> and density overlap <i>O</i><sub>dens</sub>, both conveniently
ranging from 0 to 1. These new overlap measures quantify self-consistency
of sampling in multitrajectory MD experiments, a necessary condition
for converged sampling. A comprehensive assessment of sampling quality
of MD experiments identifies the combination of diverse trajectory
sets and aMD as the most efficient approach among those tested. However,
analysis of <i>O</i><sub>dens</sub> between conventional
and aMD trajectories also reveals that we have not completely corrected
aMD sampling for the distorted energy landscape. Moreover, for V3,
the courses of <i>N</i><sub>c</sub> and <i>O</i><sub>dens</sub> indicate that much higher resources than those generally
invested today will probably be needed to achieve convergence. The
comparative analysis also shows that conventional MD simulations with
insufficient sampling can be easily misinterpreted as being converged.
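The abstract names cluster number N_c and cluster distribution entropy S_c but does not give their formulas. The following is a minimal sketch, assuming that each trajectory frame has already been assigned a cluster label and that S_c is the Shannon entropy (natural logarithm) of the cluster populations; the function name `cluster_statistics` and the example labels are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log


def cluster_statistics(labels):
    """Return (N_c, S_c) for a sequence of per-frame cluster labels.

    Assumption: S_c is the Shannon entropy -sum(p_i * ln p_i) of the
    fraction p_i of frames that fall into cluster i.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    n_c = len(counts)  # number of distinct clusters visited
    s_c = -sum((c / total) * log(c / total) for c in counts.values())
    return n_c, s_c


# Hypothetical cluster assignments for eight frames of one trajectory.
labels = [0, 0, 1, 2, 2, 2, 1, 0]
n_c, s_c = cluster_statistics(labels)
print(n_c, round(s_c, 3))
```

Tracking both values as a function of simulation time is one way to see whether new clusters are still being discovered, which is the kind of course the abstract refers to for V3.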
Comparison of statistical indicators of association.
<p>200 random contingency tables with total count <i>N</i> = 100, a typical order of magnitude for analyses of sequence-feature association in practice, are analyzed by Fisher’s exact test, yielding <i>p</i> values for the rejection of independence (horizontal axis, not corrected for multiple testing), and by four different BF models, namely <i>K</i> = 1, <i>K</i> = 100, <i>K</i><sub><i>D</i></sub>, and the uniform model, with corresponding BFs on the vertical axis. Solid horizontal black line at <i>BF</i> = 1 and dashed vertical line at <i>p</i> = 0.05 for orientation.</p>
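For orientation, the frequentist side of this comparison can be sketched for the simplest case. The example below is an illustrative two-sided Fisher's exact test for a 2×2 table using only the Python standard library; the caption does not state the table dimensions or the software used, so the 2×2 restriction, the function name `fisher_exact_2x2`, and the example counts are assumptions. The BF models are not reproduced here because their definitions are not given in this excerpt.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p value sums the probabilities of all tables that
    are at most as probable as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical table: feature present/absent vs. residue present/absent.
print(fisher_exact_2x2(12, 38, 3, 47))
```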
Odds-ratio plot and Tartan plot for visualization of statistical associations.
<p><b>A</b> Odds-ratio plot, based on an alignment of the region of HIV-1 gp120 around the V3 loop (C296-C331). Here, the feature is the predicted co-receptor tropism of HIV-1 [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0146409#pone.0146409.ref017" target="_blank">17</a>] (R5 vs. X4 tropic). Bar heights and colors indicate logarithms of odds ratios and negative logarithms of <i>p</i> values, respectively. A reference sequence and sequence positions can be added in the top and bottom rows for orientation. <b>B</b> Tartan plot for the synopsis of two alignment pair association measures, here: −log <i>p</i> from association test between alignment position pairs (upper right triangle) vs. Direct Information between these pairs (lower left triangle). Association strengths are color coded (color legend on the right). For orientation, axes can be annotated and sequence substructures can be indicated by lines.</p>
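The bar heights in panel A are logarithms of odds ratios. As a rough illustration of that quantity, the sketch below computes a log odds ratio from the four counts of a 2×2 table. The pseudocount of 0.5 added to every cell (the Haldane-Anscombe correction) is an assumption made here to avoid division by zero; the caption does not say how empty cells are handled, and the function name `log_odds_ratio` is hypothetical.

```python
from math import log


def log_odds_ratio(a, b, c, d, pseudocount=0.5):
    """Natural log of the odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Assumption: a pseudocount is added to each cell so that tables with
    empty cells still give a finite value. Set pseudocount=0.0 for the
    uncorrected odds ratio (all cells must then be non-zero).
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return log((a * d) / (b * c))


# Hypothetical counts: residue present/absent in X4 vs. R5 sequences.
print(round(log_odds_ratio(12, 38, 3, 47), 3))
```

Positive values indicate that the residue is enriched among feature-carrying sequences, negative values that it is depleted.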
Comparison of frequentist approach and Bayes factors (BF).
<p>Discovery of association of alignment positions of HBV core proteins with patient HLA types, here: A*01 (top row) and B*44 (bottom row). Sequence numbers in panel titles are feature-carrying fractions of the total of 148 sequences included in the alignment. Association of sequences with feature HLA was analyzed by Fisher’s exact test (panels A, D), BF with <i>K</i> = 1 (panels B, E), and BF with <i>K</i><sub><i>D</i></sub> (panels C, F). Alignment positions with association above certain thresholds (horizontal dashed lines) are marked by red stars and vertical dashed lines, namely <i>p</i> < 0.01 (A, D), or <i>BF</i> > 10 (B, C, E, F). The <i>p</i> values and BFs shown are the best for each alignment position (lowest <i>p</i> values, highest <i>BF</i>s).</p>
Phylogenetic distribution of feature-carrying sequences and phylogenetic bias indicator <i>B</i>.
<p>The distance-based phylogenetic tree in all six panels was computed for the same set of 788 East Asian HIV-1 gag protein sequences obtained from the HIV sequence database at <a href="http://www.hiv.lanl.gov" target="_blank">http://www.hiv.lanl.gov</a>. In each panel, those branches are colored red that correspond to sequences that carry an amino acid substitution apparently associated with a certain HLA type. The numbers to the upper right of each tree are the corresponding values of the bias indicator <i>B</i>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0146409#pone.0146409.e005" target="_blank">Eq (4)</a>.</p>
Broken stick distribution (solid line) and NRADs of <i>IgG</i><sup>+</sup><i>CD</i>27<sup>+</sup> fractions (points).
<p>Inset: section of hierarchical clustering dendrogram where the broken stick distribution appears. This plot adopts the usual presentation of the broken stick distribution in the literature with linear horizontal axis and logarithmic vertical axis. Therefore, the boomerang shapes of the log-log <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.g004" target="_blank">Fig 4</a> appear horizontally stretched.</p>
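The caption does not state which form of the broken stick distribution is plotted. The sketch below assumes the classical broken stick model, in which a unit stick is broken at random into n pieces and the expected relative size of the k-th largest piece is (1/n) Σ_{i=k}^{n} 1/i. The function name `broken_stick` and the choice n = 10 are illustrative only.

```python
def broken_stick(n):
    """Expected relative abundances under the classical broken stick model.

    Returns a list whose entry k-1 is the expected size of the k-th
    largest of n pieces: (1/n) * sum_{i=k}^{n} 1/i. The entries sum to 1.
    """
    return [sum(1.0 / i for i in range(k, n + 1)) / n
            for k in range(1, n + 1)]


# Rank-abundance values for a hypothetical community of 10 ranks.
for rank, abundance in enumerate(broken_stick(10), start=1):
    print(rank, round(abundance, 4))
```

Plotted with a linear rank axis and a logarithmic abundance axis, these values give the nearly straight, slightly bent curve typical of the broken stick presentation described in the caption.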
Averaged NRADs of gut microbiome data in six age groups.
<p>The numbers of NRADs per group from youngest to oldest were 9, 18, 55, 64, 34, and 309, respectively. Solid lines are mean NRADs, shaded areas are 90% confidence intervals for the means.</p>
Robustness of NRADs against varying sampling depth.
<p>(A) original RAD of first sample of [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.ref031" target="_blank">31</a>] (black) and down-sampled RAD (red). (B) the two NRADs obtained by MaxRank normalization to <i>R</i> = 1000 of the RADs in panel A are almost indistinguishable. (C) comparison of NRAD distances of the first 50 samples of the data set of [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.ref031" target="_blank">31</a>]. Left violin plot: density of distances between NRADs computed by MaxRank normalization to <i>R</i> = 1000 of the original RADs; middle violin plot: same for down-sampled RADs; right violin plot: distances between corresponding original and down-sampled NRADs. The biologically meaningful NRAD distance distributions are robust against differences in sample size (left and middle violin). In comparison, the distances related to differences in sample size are negligible (right violin).</p>
Diversity of the <i>V</i><sub><i>H</i></sub> region of BCRs.
<p>(A) The human genome contains sets of <i>V</i><sub><i>H</i></sub>, <i>D</i><sub><i>H</i></sub>, and <i>J</i><sub><i>H</i></sub> gene segments. (B) The “variable” <i>V</i><sub><i>H</i></sub> segments can be grouped into seven <i>V</i><sub><i>H</i></sub> families based on sequence similarity. (C) A genetically diverse pool of B cells is generated by V(D)J recombination. (D) Exposure to antigens induces an adaptation of the BCR repertoire, generating genetic variants and changing the usage pattern of <i>V</i><sub><i>H</i></sub> gene segments.</p>
General process employed in this work.
<p>Flowchart of the procedure from original species/abundances or sequence/reads data (top box) to original RADs, then to NRADs, and analyses based on NRADs.</p>