21 research outputs found

    Quantitative Assessment of Molecular Dynamics Sampling for Flexible Systems

    Molecular dynamics (MD) simulation is a natural method for the study of flexible molecules but at the same time is limited by the large size of the conformational space of these molecules. We ask by how much the MD sampling quality for flexible molecules can be improved by two means: the use of diverse sets of trajectories starting from different initial conformations to detect deviations between samples, and sampling with enhanced methods such as accelerated MD (aMD) or scaled MD (sMD) that distort the energy landscape in controlled ways. To this end, we test the effects of these approaches on MD simulations of two flexible biomolecules in aqueous solution, Met-Enkephalin (5 amino acids) and HIV-1 gp120 V3 (a cycle of 35 amino acids). We assess the convergence of the sampling quantitatively with known, extensive measures of cluster number <i>N</i><sub>c</sub> and cluster distribution entropy <i>S</i><sub>c</sub> and with two new quantities, conformational overlap <i>O</i><sub>conf</sub> and density overlap <i>O</i><sub>dens</sub>, both conveniently ranging from 0 to 1. These new overlap measures quantify self-consistency of sampling in multitrajectory MD experiments, a necessary condition for converged sampling. A comprehensive assessment of sampling quality of MD experiments identifies the combination of diverse trajectory sets and aMD as the most efficient approach among those tested. However, analysis of <i>O</i><sub>dens</sub> between conventional and aMD trajectories also reveals that we have not completely corrected aMD sampling for the distorted energy landscape. Moreover, for V3, the courses of <i>N</i><sub>c</sub> and <i>O</i><sub>dens</sub> indicate that much higher resources than those generally invested today will probably be needed to achieve convergence. The comparative analysis also shows that conventional MD simulations with insufficient sampling can be easily misinterpreted as being converged.
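
The two established measures named in the abstract can be illustrated with a minimal sketch. It assumes that each trajectory frame has already been assigned a cluster label and that the cluster distribution entropy is the Shannon entropy of the cluster populations; the abstract does not give the exact definition, so both the formula and the function name `cluster_stats` are assumptions made for illustration only.

```python
import math
from collections import Counter

def cluster_stats(labels):
    """Return (N_c, S_c) for a sequence of per-frame cluster labels.

    Assumption: S_c is the Shannon entropy (natural log) of the
    fraction of frames that falls into each cluster.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    n_c = len(counts)  # number of distinct clusters visited
    s_c = -sum((c / total) * math.log(c / total) for c in counts.values())
    return n_c, s_c

# Hypothetical labels: four equally populated clusters
labels = [0, 0, 1, 1, 2, 2, 3, 3]
n_c, s_c = cluster_stats(labels)
print(n_c, round(s_c, 4))
```

Under this assumed definition, both quantities grow as a trajectory visits more clusters more evenly, so a plateau in their time course is one rough indicator that sampling has stopped discovering new conformations.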

    Comparison of statistical indicators of association.

    <p>200 random contingency tables with total count <i>N</i> = 100, a typical order of magnitude for analyses of sequence-feature association in practice, are analyzed by Fisher’s exact test, yielding <i>p</i> values for the rejection of independence (horizontal axis, not corrected for multiple testing), and by four different Bayes factor (BF) models, namely <i>K</i> = 1, <i>K</i> = 100, <i>K</i><sub><i>D</i></sub>, and the uniform model, with the corresponding BFs on the vertical axis. The solid horizontal black line at <i>BF</i> = 1 and the dashed vertical line at <i>p</i> = 0.05 are shown for orientation.</p>
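
The frequentist half of this comparison can be sketched as follows, assuming 2x2 tables. The caption does not say how the random tables were generated, so `random_table` (each of the <i>N</i> = 100 counts assigned uniformly to one of four cells) is a hypothetical stand-in, and the Bayes factor models are not reproduced here. The two-sided <i>p</i> value sums the hypergeometric probabilities of all tables that are no more probable than the observed one.

```python
import math
import random

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def random_table(n, rng):
    """Hypothetical generator: drop n counts uniformly into four cells."""
    cells = [0, 0, 0, 0]
    for _ in range(n):
        cells[rng.randrange(4)] += 1
    return cells

rng = random.Random(0)
tables = [random_table(100, rng) for _ in range(200)]
p_values = [fisher_exact_2x2(*t) for t in tables]
print(sum(p < 0.05 for p in p_values), "of", len(p_values), "tables have p < 0.05")
```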

    Odds-ratio plot and Tartan plot for visualization of statistical associations.

    <p><b>A</b> Odds-ratio plot, based on an alignment of the region of HIV-1 gp120 around the V3 loop (C296-C331). Here, the feature is the predicted co-receptor tropism of HIV-1 [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0146409#pone.0146409.ref017" target="_blank">17</a>] (R5 vs. X4 tropic). Bar heights and colors indicate logarithms of odds ratios and negative logarithms of <i>p</i> values, respectively. A reference sequence and sequence positions can be added in the top and bottom rows for orientation. <b>B</b> Tartan plot for the synopsis of two alignment pair association measures, here: −log <i>p</i> from the association test between alignment position pairs (upper right triangle) vs. Direct Information between these pairs (lower left triangle). Association strengths are color coded (color legend on the right). For orientation, axes can be annotated and sequence substructures can be indicated by lines.</p>

    Comparison of frequentist approach and Bayes factors (BF).

    <p>Discovery of association of alignment positions of HBV core proteins with patient HLA types, here: A*01 (top row) and B*44 (bottom row). Sequence numbers in panel titles are feature-carrying fractions of the total of 148 sequences included in the alignment. Association of sequences with the HLA feature was analyzed by Fisher’s exact test (panels A, D), BF with <i>K</i> = 1 (panels B, E), and BF with <i>K</i><sub><i>D</i></sub> (panels C, F). Alignment positions with association above certain thresholds (horizontal dashed lines) are marked by red stars and vertical dashed lines, namely <i>p</i> < 0.01 (A, D), or <i>BF</i> > 10 (B, C, E, F). The <i>p</i> values and BFs shown are the best for each alignment position (lowest <i>p</i> values, highest <i>BF</i>s).</p>

    Phylogenetic distribution of feature-carrying sequences and phylogenetic bias indicator <i>B</i>.

    <p>The distance-based phylogenetic tree in all six panels was computed for the same set of 788 East Asian HIV-1 gag protein sequences obtained from the HIV sequence database at <a href="http://www.hiv.lanl.gov" target="_blank">http://www.hiv.lanl.gov</a>. In each panel, branches are colored red if they correspond to sequences carrying an amino acid substitution apparently associated with a certain HLA type. The numbers to the upper right of each tree are the corresponding values of the bias indicator <i>B</i>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0146409#pone.0146409.e005" target="_blank">Eq (4)</a>.</p>

    Broken stick distribution (solid line) and NRADs of <i>IgG</i><sup>+</sup><i>CD</i>27<sup>+</sup> fractions (points).

    <p>Inset: section of the hierarchical clustering dendrogram where the broken stick distribution appears. This plot adopts the usual presentation of the broken stick distribution in the literature, with a linear horizontal axis and a logarithmic vertical axis. Therefore, the boomerang shapes of the log-log <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.g004" target="_blank">Fig 4</a> appear horizontally stretched.</p>
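
For reference, the broken stick distribution in its standard textbook form (MacArthur's model) can be sketched as below. The caption does not state which parameterization the figure uses, so the formula and the function name `broken_stick` are assumptions: the expected relative abundance at rank <i>i</i> out of <i>n</i> is taken to be (1/<i>n</i>) times the sum of 1/<i>k</i> for <i>k</i> from <i>i</i> to <i>n</i>.

```python
def broken_stick(n):
    """Expected relative abundances under the standard broken stick model.

    Returns a list ordered from rank 1 (most abundant) to rank n.
    """
    abundances = []
    tail = 0.0
    # Accumulate the harmonic tail sum from rank n down to rank 1
    for k in range(n, 0, -1):
        tail += 1.0 / k
        abundances.append(tail / n)
    abundances.reverse()
    return abundances

expected = broken_stick(5)
for rank, value in enumerate(expected, start=1):
    print(rank, round(value, 4))
```

Plotted with a linear rank axis and a logarithmic abundance axis, as the caption describes, these values fall on a curve that drops steeply at the highest ranks.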

    Averaged NRADs of gut microbiome data in six age groups.

    <p>The numbers of NRADs per group, from youngest to oldest, were 9, 18, 55, 64, 34, and 309, respectively. Solid lines are mean NRADs; shaded areas are 90% confidence intervals for the means.</p>

    Robustness of NRADs against varying sampling depth.

    <p>(A) Original RAD of the first sample of [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.ref031" target="_blank">31</a>] (black) and down-sampled RAD (red). (B) The two NRADs obtained by MaxRank normalization to <i>R</i> = 1000 of the RADs in panel A are almost indistinguishable. (C) Comparison of NRAD distances of the first 50 samples of the data set of [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005362#pcbi.1005362.ref031" target="_blank">31</a>]. Left violin plot: density of distances between NRADs computed by MaxRank normalization to <i>R</i> = 1000 of the original RADs; middle violin plot: same for down-sampled RADs; right violin plot: distances between corresponding original and down-sampled NRADs. The biologically meaningful NRAD distance distributions are robust against differences in sample size (left and middle violin). In comparison, the distances related to differences in sample size are negligible (right violin).</p>

    Diversity of the <i>V</i><sub><i>H</i></sub> region of BCRs.

    <p>(A) The human genome contains sets of <i>V</i><sub><i>H</i></sub>, <i>D</i><sub><i>H</i></sub>, and <i>J</i><sub><i>H</i></sub> gene segments. (B) The “variable” <i>V</i><sub><i>H</i></sub> segments can be grouped into seven <i>V</i><sub><i>H</i></sub> families based on sequence similarity. (C) A genetically diverse pool of B cells is generated by V(D)J recombination. (D) Exposure to antigens induces an adaptation of the BCR repertoire, generating genetic variants and changing the usage pattern of <i>V</i><sub><i>H</i></sub> gene segments.</p>

    General process employed in this work.

    <p>Flowchart of the procedure from original species/abundances or sequence/reads data (top box) to original RADs, then to NRADs, and analyses based on NRADs.</p>