
    Relative frequencies of “Figure” vs “figure” in both versions of the Google Books corpus for both English (all) and English Fiction.

    <p>In the English data sets, the capitalized term rapidly surpasses the uncapitalized term in the 1960s. For the first English Fiction data set, this effect is delayed until the 1970s. As shown later, only the second version of the English Fiction data set demonstrates a filtering of scientific terminology. These trends strongly suggest an increase starting around 1900 in the sampling of scientific texts in both English data sets and the first English Fiction data set.</p>

    (English, all; Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1950s and the 1980s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>
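As a minimal sketch of how such a ranking could be produced, the snippet below computes per-word contributions to the Jensen-Shannon divergence and expresses each as a percentage of the total. It assumes the standard equal-weight JSD with base-2 logarithms; the function names and the toy frequencies are hypothetical and are not taken from the source.

```python
import math


def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the JSD between p and q.

    p and q map words to relative frequencies, each summing to 1.
    Assumes the equal-weight JSD: H(M) - H(P)/2 - H(Q)/2, M = (P + Q)/2.
    """
    contributions = {}
    for word in set(p) | set(q):
        pi, qi = p.get(word, 0.0), q.get(word, 0.0)
        mi = 0.5 * (pi + qi)
        if mi == 0.0:
            contributions[word] = 0.0
            continue
        term = -mi * math.log2(mi)
        if pi > 0.0:
            term += 0.5 * pi * math.log2(pi)
        if qi > 0.0:
            term += 0.5 * qi * math.log2(qi)
        contributions[word] = term
    return contributions


def top_contributions(p, q, n=60):
    """Rank words by share of the total JSD; tag the decade where each is more common."""
    contrib = jsd_contributions(p, q)
    total = sum(contrib.values())
    if total == 0.0:
        return []
    rows = [
        (word, 100.0 * c / total, "later" if q.get(word, 0.0) > p.get(word, 0.0) else "earlier")
        for word, c in contrib.items()
        if c > 0.0
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Hypothetical toy distributions standing in for two decades.
earlier = {"the": 0.5, "radio": 0.3, "figure": 0.2}
later = {"the": 0.5, "computer": 0.3, "figure": 0.2}
ranking = top_contributions(earlier, later)
```

Words with unchanged frequency contribute nothing, so only the words that differ between the two toy decades appear in `ranking`, each tagged with the decade in which it is more common.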

    Time series of technical terms from Version 2: (a) English all, (b) English fiction.

    <p>In the unfiltered data set, these technical terms appear frequently and increase in usage through the 1980s. In fiction, technical terms show up far less frequently and remain relatively stable in usage with the notable exception of “computer,” which has been gradually gaining popularity since the 1960s.</p>

    JSD between 1880 and each displayed year for given data set, corresponding to dashed lines from Fig 4.

    <p>Contributions are counted for all words appearing above a 10<sup>−5</sup> threshold in a given year; for the dashed curves, the threshold is 10<sup>−4</sup>. Typical behavior in each case consists of a relatively large jump between one year and the next with a more gradual rise afterward (in both directions). Exceptions include wartime, particularly the two World Wars, during which the divergence is greater than usual; however, after the conclusion of these periods, the cumulative divergence settles back to the previous trend. Initial spikiness in (D) is likely due to low volume.</p>

    (Left) <i>k</i><sub>max,in</sub> and (Right) <i>k</i><sub>max,out</sub> for Twitter reply networks.

    <p>Each data point represents the observed maximum in- and out-degree, averaged over 100 simulated subsampling experiments. The dashed line extrapolates the predicted number of edges for greater proportions of sampled data.</p>
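The averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: links are assumed to be sampled uniformly at random without replacement, each directed link is assumed to appear once as a (source, target) pair, and all function names and the toy network are hypothetical.

```python
import random
from collections import Counter


def sample_links(edges, fraction, rng):
    """Keep a uniformly random subset of directed links (assumed sampling scheme)."""
    k = round(fraction * len(edges))
    return rng.sample(edges, k)


def max_degrees(edges):
    """Return (k_max_in, k_max_out) for a list of distinct (source, target) links."""
    k_in = Counter(target for _, target in edges)
    k_out = Counter(source for source, _ in edges)
    return max(k_in.values(), default=0), max(k_out.values(), default=0)


def mean_max_degrees(edges, fraction, runs=100, seed=0):
    """Average the maximum in- and out-degree over repeated subsampling runs."""
    rng = random.Random(seed)
    total_in = total_out = 0
    for _ in range(runs):
        k_in, k_out = max_degrees(sample_links(edges, fraction, rng))
        total_in += k_in
        total_out += k_out
    return total_in / runs, total_out / runs


# Hypothetical toy reply network: (replier, recipient) pairs.
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a"), ("b", "c")]
full_in, full_out = max_degrees(toy_edges)
half_in, half_out = mean_max_degrees(toy_edges, fraction=0.6)
```

Calling `mean_max_degrees` for a range of `fraction` values gives one averaged data point per sampling proportion, which is the kind of curve the caption describes.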

    (Left) Predicted <i>k<sub>avg,in</sub></i> and (Right) <i>k<sub>avg,out</sub></i> in Twitter reply networks.


    The logarithms of the total 1-gram counts for the Google Books English data sets (dark gray) and English Fiction data sets (light gray).

    <p>The dashed and solid curves denote the 2009 and 2012 versions of the data sets. In all four examples, an exponential increase in volume is apparent over time with notable exceptions during wartime when the total volume decreases, clearest during the American Civil War and both World Wars. While the total volume for English increases between versions, the volume for English fiction decreases drastically, suggesting a more rigorous filtering process.</p>

    In- and out-degree vs. average edge weight for Twitter reply networks.

    <p>(Top, left) The average incoming edge weight for each node of degree <i>k</i> is depicted in a logarithmically binned heatmap. (Top, right) The same as the top-left panel, but for outgoing edges. (c) The average weight per edge for incoming edges as a function of <i>k</i><sub>in</sub> shows a gradual increase to <i>k</i><sub>in</sub>≈10<sup>2</sup> with a peak of approximately 2.2 interactions per edge. (d) The average weight per edge for outgoing edges as a function of <i>k</i><sub>out</sub> shows a gradual increase to <i>k</i><sub>out</sub>≈10<sup>2</sup> with a peak of between 2.5 and 3 interactions per edge.</p>

    Failed link subnetwork.

    <p>Hidden or missing links are depicted in grey. All nodes remain in the subnetwork and only visible or sampled links remain.</p>

    For the ratio <i>r</i> between the smaller relative probability of an element and the average, <i>C</i>(<i>r</i>) is the proportion of the average contributed to the Jensen-Shannon divergence (see Eqs 6 and 7).

    <p>In particular, if <i>r</i> = 1 (no change), then the contribution is zero; if <i>r</i> = 0, the contribution is half its probability in the distribution in which it occurs with nonzero probability.</p>
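The two limiting cases can be checked numerically. The sketch below assumes the standard equal-weight Jensen-Shannon divergence with base-2 logarithms, under which an element with probabilities <i>p</i> and <i>q</i> and average <i>m</i> = (<i>p</i> + <i>q</i>)/2 contributes <i>m</i>·<i>C</i>(<i>r</i>), with <i>C</i>(<i>r</i>) = ½[<i>r</i> log<sub>2</sub> <i>r</i> + (2 − <i>r</i>) log<sub>2</sub>(2 − <i>r</i>)]. Eqs 6 and 7 are not reproduced in this listing, so this form is a derivation from that assumption and may differ from the authors' notation.

```python
import math


def xlog2x(x):
    """x * log2(x), using the usual convention that 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0.0 else 0.0


def C(r):
    """Proportion of the average probability contributed to the JSD (in bits).

    r is the ratio of the smaller probability to the average, so 0 <= r <= 1.
    Derived form under the equal-weight, base-2 assumption stated above.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return 0.5 * (xlog2x(r) + xlog2x(2.0 - r))


def contribution(p, q):
    """Contribution of one element with probabilities p and q, as m * C(r)."""
    m = 0.5 * (p + q)
    if m == 0.0:
        return 0.0
    return m * C(min(p, q) / m)
```

With this form, `C(1.0)` is 0 (no change) and `C(0.0)` is 1, so an element that appears in only one distribution with probability <i>p</i> contributes <i>m</i> = <i>p</i>/2, matching the statement above.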