103 research outputs found

Relative frequencies of “Figure” vs “figure” in both versions of the Google Books corpus for both English (all) and English Fiction.

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

In the English data sets, the capitalized term rapidly surpasses the uncapitalized term in the 1960s. For the first English Fiction data set, this effect is delayed until the 1970s. As shown later, only the second version of the English Fiction data set demonstrates a filtering of scientific terminology. These trends strongly suggest an increase starting around 1900 in the sampling of scientific texts in both English data sets and the first English Fiction data set.

FigShare

(English, all; Version 2.) Top 60 individual contributions of 1-grams to the JSD between the 1950s and the 1980s.

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

Each contribution is given as a percentage of the total JSD (see horizontal axis label) between the two given decades. All contributions are positive; bars to the left of center represent words that were more common in the earlier decade, whereas bars to the right represent words that became more common in the later decade.
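To illustrate how per-word contributions to a Jensen-Shannon divergence can be computed and expressed as percentages of the total, here is a minimal sketch. It assumes the standard equal-weight JSD with base-2 logarithms; the function names and the toy word frequencies are hypothetical and are not taken from the authors' data or code.

```python
import math


def _xlog2x(x):
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the equal-weight JSD.

    p and q map words to relative frequencies; missing words count as 0.
    Summing the returned values gives the total JSD between p and q.
    """
    contributions = {}
    for word in set(p) | set(q):
        pi = p.get(word, 0.0)
        qi = q.get(word, 0.0)
        mi = 0.5 * (pi + qi)  # frequency in the mixed (average) distribution
        contributions[word] = -_xlog2x(mi) + 0.5 * _xlog2x(pi) + 0.5 * _xlog2x(qi)
    return contributions


def as_percentages(contributions):
    """Express each contribution as a percentage of the total JSD."""
    total = sum(contributions.values())
    if total == 0:
        return {word: 0.0 for word in contributions}
    return {word: 100.0 * c / total for word, c in contributions.items()}


# Toy (made-up) distributions standing in for two decades.
earlier = {"the": 0.5, "war": 0.3, "peace": 0.2}
later = {"the": 0.5, "computer": 0.3, "peace": 0.2}

contrib = jsd_contributions(earlier, later)
percent = as_percentages(contrib)
```

In this toy case only "war" and "computer" change in frequency, so they share the whole divergence between them, while "the" and "peace" contribute nothing. The direction of a bar in a plot like the one described would follow from comparing each word's frequency in the two decades.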

FigShare

Time series of technical terms from Version 2: (a) English all, (b) English fiction.

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

In the unfiltered data set, these technical terms appear frequently and increase in usage through the 1980s. In fiction, technical terms show up far less frequently and remain relatively stable in usage, with the notable exception of “computer,” which has been gradually gaining popularity since the 1960s.

FigShare

JSD between 1880 and each displayed year for given data set, corresponding to dashed lines from Fig 4.

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

Contributions are counted for all words appearing above a 10^-5 threshold in a given year; for the dashed curves, the threshold is 10^-4. Typical behavior in each case consists of a relatively large jump between one year and the next with a more gradual rise afterward (in both directions). Exceptions include wartime, particularly the two World Wars, during which the divergence is greater than usual; however, after the conclusion of these periods, the cumulative divergence settles back to the previous trend. Initial spikiness in (D) is likely due to low volume.

FigShare

(Left) k_max,in and (Right) k_max,out for Twitter reply networks.

Author: Catherine A. Bliss (188890)
Christopher M. Danforth (185505)
Peter Sheridan Dodds (188891)

Each data point represents the observed maximum in- and out-degree, averaged over 100 simulated subsampling experiments. The dashed line extrapolates the predicted number of edges for greater proportions of sampled data.

FigShare

(Left) Predicted k_avg,in and (Right) k_avg,out in Twitter reply networks.

Author: Catherine A. Bliss (188890)
Christopher M. Danforth (185505)
Peter Sheridan Dodds (188891)

FigShare

The logarithms of the total 1-gram counts for the Google Books English data sets (dark gray) and English Fiction data sets (light gray).

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

The dashed and solid curves denote the 2009 and 2012 versions of the data sets. In all four examples, an exponential increase in volume is apparent over time with notable exceptions during wartime when the total volume decreases, clearest during the American Civil War and both World Wars. While the total volume for English increases between versions, the volume for English fiction decreases drastically, suggesting a more rigorous filtering process.

FigShare

Predicted edge weight and degree distributions for Twitter reply networks.

Author: Catherine A. Bliss (188890)
Christopher M. Danforth (185505)
Peter Sheridan Dodds (188891)

(Top) The predicted edge weight distribution. (Bottom, left) Predicted Pr(k_in) and (Bottom, right) Pr(k_out) for Twitter reply networks.

FigShare

Subnetwork generated from sampled links.

Author: Catherine A. Bliss (188890)
Christopher M. Danforth (185505)
Peter Sheridan Dodds (188891)

(Left) A network is sampled by randomly selecting links, shown in red. (Right) The subnetwork consists of all sampled links and only those nodes that are incident to the sampled links. In this type of sampling, no nodes of degree zero are included in the network. Large degree nodes are more likely to be included in the subnetwork.
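The link-sampling procedure described above can be sketched as follows. This is an illustrative example only: it assumes uniform sampling of links without replacement, and the function name, parameters, and toy network are hypothetical rather than the authors' implementation.

```python
import random


def sample_links(edges, fraction, seed=None):
    """Sample a fraction of links uniformly at random, without replacement.

    edges is a list of (source, target) pairs. Returns (nodes, sampled_edges),
    where nodes contains only the endpoints of sampled links, so the
    subnetwork has no nodes of degree zero.
    """
    rng = random.Random(seed)
    n_keep = round(fraction * len(edges))
    sampled_edges = rng.sample(edges, n_keep)
    nodes = {node for edge in sampled_edges for node in edge}
    return nodes, sampled_edges


# Toy network: a hub (node 0) with 11 links, plus one separate link.
edges = [(0, i) for i in range(1, 12)] + [(12, 13)]
nodes, sampled_edges = sample_links(edges, fraction=0.5, seed=1)
```

Because the hub is an endpoint of 11 of the 12 links, it appears in the subnetwork whenever at least one of those links is drawn, which is why high-degree nodes are more likely to be included than low-degree ones.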

FigShare

For the ratio r between the smaller relative probability of an element and the average, C(r) is the proportion of the average contributed to the Jensen-Shannon divergence (see Eqs 6 and 7).

Author: Christopher M. Danforth (185505)
Eitan Adam Pechenick (804146)
Peter Sheridan Dodds (188891)

In particular, if r = 1 (no change), then the contribution is zero; if r = 0, the contribution is half its probability in the distribution in which it occurs with nonzero probability.
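The two limiting cases above can be checked numerically. The sketch below assumes the standard equal-weight JSD with base-2 logarithms, for which an element with smaller probability p = r·m and average probability m contributes m·C(r), with C(r) = ½[r log2 r + (2 − r) log2(2 − r)]. This closed form is derived here from that definition and may be written differently in Eqs 6 and 7, which are not reproduced in this listing; the function name is hypothetical.

```python
import math


def c_of_r(r):
    """Proportion of the average probability m contributed to the JSD.

    r is the ratio of the smaller probability to the average, so 0 <= r <= 1.
    Uses the convention 0 * log2(0) = 0.
    """
    def xlog2x(x):
        return x * math.log2(x) if x > 0 else 0.0

    return 0.5 * (xlog2x(r) + xlog2x(2.0 - r))


no_change = c_of_r(1.0)    # both probabilities equal the average
one_sided = c_of_r(0.0)    # the element is absent from one distribution

# With r = 0 the element has probability q in one distribution only,
# so the average is m = q / 2 and the contribution is m * C(0).
q = 0.3
contribution_when_absent = (q / 2) * one_sided
```

Here `no_change` evaluates to zero and `contribution_when_absent` to half of `q`, matching the two cases stated in the caption.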

FigShare