103 research outputs found

    Relative frequencies of “Figure” vs “figure” in both versions of the Google Books corpus for both English (all) and English Fiction.

    <p>In the English data sets, the capitalized term rapidly surpasses the uncapitalized term in the 1960s. For the first English Fiction data set, this effect is delayed until the 1970s. As shown later, only the second version of the English Fiction data set demonstrates a filtering of scientific terminology. These trends strongly suggest an increase starting around 1900 in the sampling of scientific texts in both English data sets and the first English Fiction data set.</p>

    (English, all; Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1950s and the 1980s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>
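As a rough illustration of how such a ranking could be produced, the sketch below computes per-word contributions to the Jensen-Shannon divergence (JSD) between two normalized word-frequency distributions and expresses each as a percentage of the total. It assumes the standard JSD definition with base-2 logarithms; the function name `ranked_contributions` and the toy frequencies are hypothetical and are not taken from the listed work.

```python
import math


def ranked_contributions(earlier, later):
    """Rank words by their share of the JSD between two frequency dicts.

    Assumes `earlier` and `later` map words to relative frequencies that
    each sum to 1, and that the JSD uses base-2 logarithms.
    Returns a list of (word, percent_of_total, side) tuples.
    """
    rows = []
    for word in set(earlier) | set(later):
        p = earlier.get(word, 0.0)
        q = later.get(word, 0.0)
        m = 0.5 * (p + q)  # probability in the mixed (average) distribution
        term = -m * math.log2(m) if m > 0 else 0.0
        for x in (p, q):
            if x > 0:
                term += 0.5 * x * math.log2(x)
        # "earlier" corresponds to bars left of center, "later" to bars right.
        side = "earlier" if p > q else "later" if q > p else "neither"
        rows.append((word, term, side))
    total = sum(term for _, term, _ in rows)
    if total == 0:
        return [(word, 0.0, side) for word, _, side in rows]
    ranked = [(word, 100.0 * term / total, side) for word, term, side in rows]
    # Largest share first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda row: (-row[1], row[0]))


# Toy distributions, invented purely for illustration.
earlier = {"the": 0.5, "radio": 0.5}
later = {"the": 0.5, "computer": 0.5}
for word, percent, side in ranked_contributions(earlier, later):
    print(f"{word:10s} {percent:6.2f}%  more common in: {side}")
```

A word whose frequency is unchanged contributes nothing, so only words that shift between the two periods appear with a nonzero share.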

    Time series of technical terms from Version 2: (a) English all, (b) English fiction.

    <p>In the unfiltered data set, these technical terms appear frequently and increase in usage through the 1980s. In fiction, technical terms show up far less frequently and remain relatively stable in usage with the notable exception of “computer,” which has been gradually gaining popularity since the 1960s.</p>

    JSD between 1880 and each displayed year for given data set, corresponding to dashed lines from Fig 4.

    <p>Contributions are counted for all words appearing above a 10<sup>−5</sup> threshold in a given year; for the dashed curves, the threshold is 10<sup>−4</sup>. Typical behavior in each case consists of a relatively large jump between one year and the next with a more gradual rise afterward (in both directions). Exceptions include wartime, particularly the two World Wars, during which the divergence is greater than usual; however, after the conclusion of these periods, the cumulative divergence settles back to the previous trend. Initial spikiness in (D) is likely due to low volume.</p>

    (Left) <i>k</i><sub>max,in</sub> and (Right) <i>k</i><sub>max,out</sub> for Twitter reply networks.

    <p>Each data point represents the observed maximum in- and out-degree, averaged over 100 simulated subsampling experiments. The dashed line extrapolates the predicted number of edges for greater proportions of sampled data.</p>

    (Left) Predicted <i>k<sub>avg,in</sub></i> and (Right) <i>k<sub>avg,out</sub></i> in Twitter reply networks.


    The logarithms of the total 1-gram counts for the Google Books English data sets (dark gray) and English Fiction data sets (light gray).

    <p>The dashed and solid curves denote the 2009 and 2012 versions of the data sets. In all four examples, an exponential increase in volume is apparent over time with notable exceptions during wartime when the total volume decreases, clearest during the American Civil War and both World Wars. While the total volume for English increases between versions, the volume for English fiction decreases drastically, suggesting a more rigorous filtering process.</p>

    Predicted edge weight and degree distributions for Twitter reply networks.

    <p>(Top) The predicted edge weight distribution. (Bottom, left) Predicted <i>Pr</i>(<i>k</i><sub>in</sub>) and (Bottom, right) <i>Pr</i>(<i>k</i><sub>out</sub>) for Twitter reply networks.</p>

    Subnetwork generated from sampled links.

    <p>(Left) A network is sampled by randomly selecting links shown in red. (Right) The subnetwork consists of all sampled links and only nodes which are incident with the sampled links. In this type of sampling, no nodes of degree zero are included in the network. Large degree nodes are more likely to be included in the subnetwork.</p>
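The sampling scheme described in this caption can be sketched in a few lines. The example below assumes links are drawn uniformly at random without replacement, which the caption does not specify; the function name `sample_links` and the star-shaped toy network are hypothetical.

```python
import random


def sample_links(edges, fraction, seed=None):
    """Sample a fraction of links and return the induced subnetwork.

    Assumes uniform sampling without replacement (an assumption, not a
    detail given in the caption). `edges` is a list of (source, target)
    pairs. Returns (sampled_edges, nodes), where `nodes` holds only the
    nodes incident with at least one sampled link, so no node of degree
    zero can appear in the subnetwork.
    """
    rng = random.Random(seed)
    count = round(fraction * len(edges))
    sampled = rng.sample(edges, count)
    nodes = {node for edge in sampled for node in edge}
    return sampled, nodes


# Toy star network: hub 0 linked to 20 leaves (illustrative only).
edges = [(0, leaf) for leaf in range(1, 21)]
sampled, nodes = sample_links(edges, fraction=0.25, seed=1)
print(len(sampled), "links sampled;", len(nodes), "nodes in the subnetwork")
```

In the star example the hub is incident with every link, so it is retained whenever at least one link is sampled, whereas each leaf survives only if its single link is chosen. This is the sense in which large-degree nodes are more likely to be included.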

    For the ratio <i>r</i> between the smaller relative probability of an element and the average, <i>C</i>(<i>r</i>) is the proportion of the average contributed to the Jensen-Shannon divergence (see Eqs 6 and 7).

    <p>In particular, if <i>r</i> = 1 (no change), then the contribution is zero; if <i>r</i> = 0, the contribution is half its probability in the distribution in which it occurs with nonzero probability.</p>
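Eqs 6 and 7 are not reproduced in this listing, so the sketch below uses a form of C(r) derived from the standard Jensen-Shannon divergence with base-2 logarithms: writing the two probabilities as p = r·m and q = (2 − r)·m, where m is their average, the element's contribution becomes m·C(r) with C(r) = ½[r log₂ r + (2 − r) log₂(2 − r)]. This form is an assumption that happens to reproduce both limiting cases stated in the caption; the function names are hypothetical.

```python
import math


def contribution_fraction(r):
    """C(r): share of the average probability m contributed to the JSD.

    Assumed form (base-2 logs): C(r) = 0.5 * [r*log2(r) + (2-r)*log2(2-r)],
    where r = min(p, q) / m and m = (p + q) / 2, so 0 <= r <= 1.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")

    def xlog2x(x):
        # Convention: 0 * log2(0) is taken as 0.
        return x * math.log2(x) if x > 0 else 0.0

    return 0.5 * (xlog2x(r) + xlog2x(2.0 - r))


def jsd_term(p, q):
    """Direct per-element JSD contribution, used here as a cross-check."""
    m = 0.5 * (p + q)
    term = -m * math.log2(m) if m > 0 else 0.0
    for x in (p, q):
        if x > 0:
            term += 0.5 * x * math.log2(x)
    return term


# Limiting cases from the caption.
print(contribution_fraction(1.0))  # no change between distributions
print(contribution_fraction(0.0))  # element absent from one distribution

# Cross-check on arbitrary illustrative probabilities.
p, q = 0.1, 0.3
m = 0.5 * (p + q)
print(math.isclose(m * contribution_fraction(min(p, q) / m), jsd_term(p, q)))
```

When r = 0 the fraction equals 1, so the contribution is m itself, which is half the element's probability in the one distribution where it occurs.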