7 research outputs found

    Number of texts with <i>p</i>-value near zero (<i>p</i> < 0.01) in different ranges of <i>L</i> divided by the number of texts in the same ranges, for the fits of distributions <i>f</i><sub>1</sub> and <i>f</i><sub>2</sub>.

    <p>Values of <i>L</i> denote the geometric mean of ranges containing 1000 texts each. The higher value for the fit of <i>f</i><sub>1</sub> (except for <i>L</i> below about 13000 tokens) indicates its worse performance.</p>
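The ratio described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `low_p_fraction_by_length` and its parameters are hypothetical, and "geometric mean of ranges" is read here as the geometric mean of the smallest and largest length in each group, which is an assumption.

```python
import math

def low_p_fraction_by_length(lengths, p_values, group_size=1000, threshold=0.01):
    """Sort texts by length L, split them into consecutive groups of
    `group_size` texts, and return one (L_repr, fraction) pair per group.

    L_repr is the geometric mean of the group's smallest and largest length
    (an assumed reading of the caption); fraction is the share of texts in
    the group whose p-value is below `threshold`.
    """
    pairs = sorted(zip(lengths, p_values))
    result = []
    for start in range(0, len(pairs), group_size):
        group = pairs[start:start + group_size]
        l_min, l_max = group[0][0], group[-1][0]
        l_repr = math.sqrt(l_min * l_max)
        fraction = sum(1 for _, p in group if p < threshold) / len(group)
        result.append((l_repr, fraction))
    return result
```

Calling the function once with the <i>p</i>-values of the <i>f</i><sub>1</sub> fits and once with those of the <i>f</i><sub>2</sub> fits would give the two curves being compared.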

    Estimated probability density of <i>β</i> for fits with <i>p</i> ≥ 0.05, in different length ranges.

    <p>We have divided both groups of accepted texts into 4 percentile ranges according to <i>L</i>. As in the previous figure, the normal kernel smoothing method is applied. (a) For distribution <i>f</i><sub>1</sub>. (b) For distribution <i>f</i><sub>2</sub>.</p>
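For readers unfamiliar with normal kernel smoothing, the sketch below shows the basic estimator: a Gaussian bump is centered on each sample and the bumps are averaged. The function name `gaussian_kde` and the `bandwidth` parameter are hypothetical; the caption does not state which bandwidth or software was used.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density, using a normal (Gaussian)
    kernel of standard deviation `bandwidth` centered on each sample.
    The bandwidth is a free choice here (an assumption, not from the source).
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, normalized so the estimate integrates to 1.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density
```

Applied to the <i>β</i> values of the accepted fits in one length range, `gaussian_kde(betas, h)` would give one smoothed curve per range.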

    Histograms of <i>p</i>-values obtained when the Zipf-like distributions <i>f</i><sub>1</sub>, <i>f</i><sub>2</sub>, and <i>f</i><sub>3</sub> are fitted to the texts of the English Project Gutenberg.

    <p>The histograms just count the number of texts in each bin of width 0.01. Note the poor performance of distribution <i>f</i><sub>3</sub> and the best performance of <i>f</i><sub>2</sub>. Power-law approximations to the histograms for <i>f</i><sub>1</sub> and <i>f</i><sub>2</sub>, with respective exponents 0.74 and 0.78, are shown as a guide to the eye.</p>

    Estimated probability density functions of <i>p</i>-values conditioned to <i>p</i> ≥ 0.01 separating for different ranges of text length <i>L</i>. <i>p</i>-values correspond to the fitting of word frequencies to (a) distribution <i>f</i><sub>1</sub> and (b) distribution <i>f</i><sub>2</sub>.

    <p>We divide the distribution of text length into 15 intervals of 2 000 texts each. For distribution <i>f</i><sub>1</sub> only the first seven groups (up to length 34 400) are displayed (beyond this value we do not have enough statistics to see the distribution of <i>p</i>-values greater than 0.01, as displayed in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0147073#pone.0147073.g006" target="_blank">Fig 6</a>; for distribution 2 this happens only in the last two groups). The intervals <i>L</i><sub><i>i</i></sub> range from <i>L</i><sub>1</sub> = [115, 5291] to <i>L</i><sub>6</sub> = [25739, 34378] and to <i>L</i><sub>13</sub> = [89476, 103767].</p>

    Complementary cumulative distributions (i.e., survival functions) of <i>p</i>-values obtained when our three distributions are fitted to the texts of the English Project Gutenberg.

    <p>This corresponds, except for normalization, to the integral of the previous figure, but we have included a fourth curve for the fraction of texts whose <i>p</i>-values for fits 1 and 2 are both higher than the value marked in the abscissa. Note that the values of <i>p</i> can play the role of the significance level. The value for <i>p</i> = 0 is not shown, in order to have higher resolution.</p>
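The curves in this caption can be illustrated with a short sketch: the survival function at a value x is the fraction of texts whose <i>p</i>-value exceeds x, and the fourth curve requires both fits to exceed x for the same text. The function names are hypothetical, and the two lists are assumed to be aligned so that position i refers to the same text in both.

```python
def survival_fraction(p_values, x):
    """Fraction of texts whose p-value is strictly greater than x."""
    return sum(1 for p in p_values if p > x) / len(p_values)

def joint_survival_fraction(p_fit1, p_fit2, x):
    """Fraction of texts whose p-values for fit 1 and fit 2 both exceed x.

    Assumes p_fit1[i] and p_fit2[i] belong to the same text.
    """
    both = sum(1 for a, b in zip(p_fit1, p_fit2) if a > x and b > x)
    return both / len(p_fit1)
```

Evaluating these functions on a grid of x values between 0 and 1 gives the points of each curve; reading x as a significance level, the result is the share of texts for which the fit is not rejected at that level.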

    Complementary cumulative distribution and probability mass function of text frequencies, for: (a) <i>A Chronicle of London, from 1089 to 1483</i> (anonymous); (b) <i>The Works of Charles and Mary Lamb</i>, Vol. V, edited by E. V. Lucas; (c) <i>A Popular History of France from the Earliest Times</i>, Vol. I, by F. Guizot.

    <p>These texts are the longest ones (<i>L</i> = 83 720, 239 018 and 2 081) among those that fulfill <i>p</i> > 1/2 for fits 1, 2 and 3, respectively. The exponent <i>β</i> takes the values 1.96, 1.89, and 1.82, respectively.</p>