14 research outputs found

    Normalized frequencies of references to years.

    <p>The top panel resembles a figure from [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0137041#pone.0137041.ref001" target="_blank">1</a>] using unfiltered data from English Version 2. (The cited paper uses Version 1.) Note the characteristic rapid rises and gradual declines, as well as the increasing peaks in yearly references. However, while the characteristic shape is still present in fiction (Version 2, bottom), at much reduced levels, the peaks do not rise. The rising effect is likely due to citations from scientific texts.</p>

    Relative frequencies of “Figure” vs “figure” in both versions of the Google Books corpus for both English (all) and English Fiction.

    <p>In the English data sets, the capitalized term rapidly surpasses the uncapitalized term in the 1960s. For the first English Fiction data set, this effect is delayed until the 1970s. As shown later, only the second version of the English Fiction data set demonstrates a filtering of scientific terminology. These trends strongly suggest an increase starting around 1900 in the sampling of scientific texts in both English data sets and the first English Fiction data set.</p>

    JSD between 1880 and each displayed year for given data set, corresponding to dashed lines from Fig 4.

    <p>Contributions are counted for all words appearing above a 10<sup>−5</sup> threshold in a given year; for the dashed curves, the threshold is 10<sup>−4</sup>. Typical behavior in each case consists of a relatively large jump between one year and the next with a more gradual rise afterward (in both directions). Exceptions include wartime, particularly the two World Wars, during which the divergence is greater than usual; however, after the conclusion of these periods, the cumulative divergence settles back to the previous trend. Initial spikiness in (D) is likely due to low volume.</p>

    For the ratio <i>r</i> between the smaller relative probability of an element and the average, <i>C</i>(<i>r</i>) is the proportion of the average contributed to the Jensen-Shannon divergence (see Eqs 6 and 7).

    <p>In particular, if <i>r</i> = 1 (no change), then the contribution is zero; if <i>r</i> = 0, the contribution is half its probability in the distribution in which it occurs with nonzero probability.</p>
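
The two limiting cases in this caption can be checked numerically. The sketch below is not taken from the paper's Eqs 6 and 7; it assumes the standard equal-weight Jensen-Shannon divergence with base-2 logarithms, for which the per-element contribution factors as m · C(r) with m = (p + q)/2 and r = min(p, q)/m. The function names `c_of_r` and `contribution` are hypothetical.

```python
import math


def _xlog2x(x: float) -> float:
    """Return x * log2(x), using the limit value 0 at x = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def c_of_r(r: float) -> float:
    """Proportion of the average probability m contributed to the JSD.

    Assumption: equal-weight JSD in bits, so that
    C(r) = (r/2) log2(r) + (1 - r/2) log2(2 - r) for 0 <= r <= 1.
    """
    return 0.5 * (_xlog2x(r) + _xlog2x(2.0 - r))


def contribution(p: float, q: float) -> float:
    """Contribution of one element with probabilities p and q to the JSD."""
    m = 0.5 * (p + q)
    if m == 0.0:
        return 0.0
    r = min(1.0, min(p, q) / m)  # clamp guards against rounding above 1
    return m * c_of_r(r)


# r = 1 (no change) gives zero; r = 0 gives the full average m,
# which is half of the one nonzero probability.
no_change = contribution(0.2, 0.2)
new_word = contribution(0.2, 0.0)
```

Under these assumptions `no_change` is zero and `new_word` is 0.1, half of 0.2, which agrees with both cases stated in the caption.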

    Heatmaps showing the JSD between every pair of years between 1800 and 2000, contributed by words appearing above a normalized frequency threshold of 10<sup>−5</sup>.

    <p>The dashed lines highlight the divergences to and from the year 1880, which are featured in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0137041#pone.0137041.g005" target="_blank">Fig 5</a>. The off-diagonal elements represent divergences between consecutive years, as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0137041#pone.0137041.g006" target="_blank">Fig 6</a>. The color represents the percentage of the maximum divergence observed in the given time range for each data set. The divergence between a year and itself is zero. For any given year, the divergence increases with the distance (number of years) from the diagonal, sharply at first, then gradually. Interesting features of the maps are the presence of two cross-hairs in the first half of the 20th century, which strongly suggests a wartime shift in the language, as well as an asymmetry that suggests a particularly high divergence between the first half century and the last quarter century observed.</p>
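
A heatmap of this kind could be assembled roughly as follows. This is a minimal sketch, not the authors' code: it assumes each year is a dictionary of normalized word frequencies, uses the equal-weight JSD in bits, and reads "above a threshold" as applying to the row year only, which is one possible interpretation and makes the matrix asymmetric. The names `thresholded_jsd` and `divergence_heatmap` are hypothetical.

```python
import math


def thresholded_jsd(row: dict, col: dict, threshold: float) -> float:
    """Sum JSD contributions (bits) over words above threshold in the row year.

    Assumption: only the row year's frequency is compared with the threshold.
    """
    total = 0.0
    for word, p in row.items():
        if p <= threshold:
            continue
        q = col.get(word, 0.0)
        m = 0.5 * (p + q)
        for x in (p, q):
            if x > 0:
                total += 0.5 * x * math.log2(x / m)
    return total


def divergence_heatmap(freqs_by_year: dict, threshold: float = 1e-5):
    """Return sorted years and a matrix of divergences as percent of the maximum."""
    years = sorted(freqs_by_year)
    raw = [
        [thresholded_jsd(freqs_by_year[a], freqs_by_year[b], threshold) for b in years]
        for a in years
    ]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return years, [[0.0 for _ in years] for _ in years]
    # Scale so the largest observed divergence maps to 100.
    return years, [[100.0 * (value / peak) for value in row] for row in raw]
```

The diagonal is zero because a year compared with itself has p = q = m for every word, so each logarithm vanishes.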

    The logarithms of the total 1-gram counts for the Google Books English data sets (dark gray) and English Fiction data sets (light gray).

    <p>The dashed and solid curves denote the 2009 and 2012 versions of the data sets. In all four examples, an exponential increase in volume is apparent over time with notable exceptions during wartime when the total volume decreases, clearest during the American Civil War and both World Wars. While the total volume for English increases between versions, the volume for English fiction decreases drastically, suggesting a more rigorous filtering process.</p>

    (English Fiction, Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1950s and the 1980s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades (see title). All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>
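
The ranking behind such a bar chart might be computed along these lines. This is a sketch under stated assumptions, not the paper's procedure: decade-level distributions are given as dictionaries of normalized frequencies, the JSD is the equal-weight form in bits, and the function name `top_contributions` and the side labels "earlier" and "later" are hypothetical.

```python
import math


def top_contributions(earlier: dict, later: dict, n: int = 60):
    """Rank words by their share of the total JSD between two decades.

    Returns (word, percent_of_total_jsd, side) tuples, largest first.
    The side is "earlier" (bar left of center) if the word was more common
    in the earlier decade, otherwise "later" (bar right of center).
    """
    contribs = {}
    for word in set(earlier) | set(later):
        p = earlier.get(word, 0.0)
        q = later.get(word, 0.0)
        m = 0.5 * (p + q)
        c = sum(0.5 * x * math.log2(x / m) for x in (p, q) if x > 0)
        if c > 0:  # unchanged words contribute nothing and are skipped
            contribs[word] = (c, "later" if q > p else "earlier")
    total = sum(c for c, _ in contribs.values())
    ranked = sorted(contribs.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(w, 100.0 * c / total, side) for w, (c, side) in ranked[:n]]


# Toy two-word vocabulary; the frequencies are invented for illustration.
bars = top_contributions({"war": 0.1, "the": 0.9}, {"war": 0.3, "the": 0.7})
```

Every percentage is positive, so the direction of change is carried only by the side label, matching the caption's left and right bars.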

    (English, all; Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1930s and the 1940s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>

    (English Fiction, Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1930s and the 1940s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>

    (English Fiction, Version 1.) Top 60 individual contributions of 1-grams to the JSD between the 1930s and the 1940s.

    <p>Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.</p>