    Clusters Beat Trend!? Testing Feature Hierarchy in Statistical Graphics

    <p>Graphics are very effective for communicating numerical information quickly and efficiently, but many of the design choices we make are based on subjective measures, such as personal taste or conventions of the discipline, rather than objective criteria. We briefly introduce perceptual principles such as preattentive features and gestalt heuristics, and then discuss the design and results of a factorial experiment examining the effect of plot aesthetics such as color and trend lines on participants’ assessment of ambiguous data displays. The quantitative and qualitative experimental results strongly suggest that plot aesthetics have a significant impact on the perception of important features in data displays. Supplementary materials for this article are available online.</p>

    Are you Normal? The Problem of Confounded Residual Structures in Hierarchical Linear Models

    <div><p>We encounter hierarchical data structures in a wide range of applications. Regular linear models are extended by random effects to address correlation between observations in the same group. Inference for random effects is sensitive to distributional mis-specifications of the model, making checks for (distributional) assumptions particularly important. The investigation of residual structures is complicated by the presence of different levels and corresponding dependencies. Ignoring these dependencies leads to erroneous conclusions using our familiar tools, such as Q-Q plots or normal tests. We first show the extent of the problem, then we introduce the <i>fraction of confounding</i> as a measure of the level of confounding in a model and finally introduce rotated random effects as a solution to assessing distributional model assumptions. This article has supplementary materials online.</p></div>

    Letter-Value Plots: Boxplots for Large Data

    <p>Boxplots are useful displays that convey rough information about the distribution of a variable. Boxplots were designed to be drawn by hand and work best for small datasets, where detailed estimates of tail behavior beyond the quartiles may not be trustworthy. Larger datasets afford more precise estimates of tail behavior, but boxplots do not take advantage of this precision, instead presenting large numbers of extreme, though not unexpected, observations. Letter-value plots address this problem by including more detailed information about the tails using “letter values,” order statistics defined by Tukey. Boxplots display the first two letter values (the median and quartiles); letter-value plots display further letter values so far as they are reliable estimates of their corresponding quantiles. We illustrate letter-value plots with real data that demonstrate their usefulness for large datasets. All graphics are created using the R package lvplot, and code and data are available in the supplementary materials.</p>
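
To make the idea of letter values concrete, here is a minimal Python sketch of Tukey's depth rule: the median sits at depth (n + 1) / 2, and each further depth is (floor(previous depth) + 1) / 2, counted in from both ends of the sorted data. This is an illustration only, not the lvplot implementation. The function name `letter_values`, the label string, and the choice to continue all the way to the extremes are assumptions made here; the abstract says the plots stop once the letter values are no longer reliable quantile estimates, and that stopping rule is not reproduced.

```python
import math


def letter_values(data):
    """Return (label, depth, lower, upper) tuples, from the median outward.

    Hypothetical helper for illustration; it does not apply the reliability
    based stopping rule mentioned in the abstract.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    def value_at(depth):
        # A half-integer depth averages the two neighbouring order statistics.
        lo, hi = math.floor(depth), math.ceil(depth)
        lower = (xs[lo - 1] + xs[hi - 1]) / 2  # counted from the minimum
        upper = (xs[n - lo] + xs[n - hi]) / 2  # counted from the maximum
        return lower, upper

    labels = "MFEDCBAZYX"  # conventional letters: median, fourths, eighths, ...
    result = []
    depth = (n + 1) / 2
    level = 0
    while True:
        label = labels[level] if level < len(labels) else f"LV{level + 1}"
        lower, upper = value_at(depth)
        result.append((label, depth, lower, upper))
        if depth <= 1:  # reached the extremes
            break
        depth = (math.floor(depth) + 1) / 2
        level += 1
    return result
```

Calling `letter_values(range(1, 10))` returns the median first, then the fourths, and so on outward; a boxplot would use only the first two entries.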

    Biomathematical Description of Synthetic Peptide Libraries

    <div><p>Libraries of randomised peptides displayed on phages or viral particles are essential tools in a wide spectrum of applications. However, there is only limited understanding of a library's fundamental dynamics and the influences of encoding schemes and sizes on their quality. Numeric properties of libraries, such as the expected number of different peptides and the library's coverage, have long been in use as measures of a library's quality. Here, we present a graphical framework of these measures together with a library's relative efficiency to help describe libraries in enough detail for researchers to plan new experiments in a more informed manner. In particular, these values allow us to answer, in a probabilistic fashion, the question of whether a specific library does indeed contain one of the "best" possible peptides. The framework is implemented in a web-interface based on two packages, discreteRV and peptider, for the statistical software environment R. We further provide a user-friendly web-interface called PeLiCa (<i>Pe</i>ptide <i>Li</i>brary <i>Ca</i>lculator, <a href="http://www.pelica.org" target="_blank">http://www.pelica.org</a>), allowing scientists to plan and analyse their peptide libraries.</p></div>

    Overview of relative efficiency for k-peptide libraries (6 to 10) of sizes N from 10<sup>6</sup> to 10<sup>15</sup>.

    <p>Relative efficiency decreases with an increased number of oligonucleotides in the library and longer peptide sequences due to the larger initial loss.</p>

    Overview of the inclusion probabilities for peptide sequences of lengths 6 to 10 (in rows) in libraries of sizes between 10<sup>8</sup> and 10<sup>12</sup> (in columns) for different encoding schemes (as side-by-side boxplots).

    <p>The boxes contain the middle 50 percent of inclusion probabilities for all peptide sequences of length k in each of the schemes. The vertical lines extend to the minimum and maximum of the inclusion probabilities. 20/20-C libraries do not have any variability in the inclusion probabilities, because all sequences are equally likely. NNN-C libraries generally show the largest variability (as seen in the extent of the boxes) in probabilities, followed by NNB-C and NNK/S-C. Simultaneously, median inclusion probabilities increase from NNN-C to 20/20-C libraries for all combinations of peptide lengths and library sizes.</p>
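
As a rough illustration of what an inclusion probability is, the Python sketch below assumes that a library of size N consists of N independent draws from the scheme's sequence distribution, so a sequence drawn with probability p appears at least once with probability 1 - (1 - p)^N. That independence model, the function names, and the default alphabet size of 20 are assumptions made here for illustration; they are not taken from the peptider or discreteRV packages. When every sequence is equally likely, all inclusion probabilities coincide, which matches the observation above that such libraries show no variability.

```python
import math


def inclusion_probability(p, library_size):
    """P(a sequence with per-draw probability p appears at least once).

    Assumes independent draws (an assumption of this sketch).
    Computes 1 - (1 - p)**library_size in a numerically stable way.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if library_size < 0:
        raise ValueError("library_size must be non-negative")
    if p == 1.0:
        return 1.0 if library_size > 0 else 0.0
    return -math.expm1(library_size * math.log1p(-p))


def expected_coverage_uniform(k, library_size, alphabet_size=20):
    """Expected fraction of all length-k sequences present in the library,
    assuming every sequence is equally likely (hypothetical helper)."""
    n_sequences = alphabet_size ** k
    return inclusion_probability(1.0 / n_sequences, library_size)
```

Under the uniform assumption, expected coverage equals the common inclusion probability. For non-uniform schemes the per-sequence probabilities differ, and the spread of `inclusion_probability` across sequences is what the boxplots summarise.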

    Measuring Lineup Difficulty By Matching Distance Metrics With Subject Choices in Crowd-Sourced Data

    <p>Graphics play a crucial role in statistical analysis and data mining. Being able to quantify structure in data that is visible in plots, and how people read the structure from plots, is an ongoing challenge. The lineup protocol provides a formal framework for data plots, making inference possible. The data plot is treated like a test statistic, and the lineup protocol acts like a comparison with the sampling distribution of the nulls. This article describes metrics for describing structure in data plots and evaluates them in relation to the choices that human readers made during several large Amazon Turk studies using lineups. The metrics that were more specific to the plot types tended to match subject choices better than generic metrics. The process that we followed to evaluate metrics will be useful for general development of numerically measuring structure in plots, and also in future experiments on lineups for choosing blocks of pictures. Supplementary materials for this article are available online.</p>

    All NNK/S-C peptide sequences of length two partitioned according to peptide classes.


    Difference in expected coverage between NNB and NNK/S libraries (with cysteines).

    <p>Initially, NNB libraries have a slight advantage in expected coverage over NNK/S libraries. Once a coverage of about 50% is reached, this pattern reverses and NNK/S libraries have a higher expected coverage. For very large libraries, the difference in coverage again approaches zero (when libraries under both schemes have a coverage of almost 100%).</p>