
    Locations of <i>Flickr</i> photographs labelled with “protest” in 2013.

    <p>We investigate to what extent data on the number of photographs tagged with the word “protest” and uploaded to <i>Flickr</i> reflect the ground truth data extracted from <i>The Guardian</i>. For each of the 244 countries and regions and each month in 2013, we calculate the total number of geotagged photographs taken and uploaded to <i>Flickr</i>. Here, we visualise the percentage of photographs for each country and region and each month which were also labelled with the character sequence “protest”. Visual inspection suggests that while there are clear differences between the spatio-temporal distributions of “protest” labelled <i>Flickr</i> photographs and “protest” labelled articles in <i>The Guardian</i>, some key similarities can also be identified, such as an increase in “protest” labelled <i>Flickr</i> photographs in Brazil and Turkey in June 2013. Equal breaks are calculated for the logarithmically transformed percentages.</p>
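    The class boundaries described in the caption ("equal breaks" on log-transformed percentages) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `log_equal_breaks`, the use of base-10 logarithms, the number of classes, and the choice to drop zero percentages (whose logarithm is undefined) are all assumptions.

```python
import math

def log_equal_breaks(percentages, n_classes):
    """Return n_classes + 1 class boundaries equally spaced on a log10 scale."""
    # Assumption: zero percentages are excluded, since log10(0) is undefined.
    positive = [p for p in percentages if p > 0]
    if not positive:
        raise ValueError("need at least one positive percentage")
    lo = math.log10(min(positive))
    hi = math.log10(max(positive))
    step = (hi - lo) / n_classes
    # Transform the equally spaced log values back to the percentage scale.
    return [10 ** (lo + i * step) for i in range(n_classes + 1)]

# Hypothetical percentages for four country-month cells.
breaks = log_equal_breaks([0.01, 0.1, 1.0, 10.0], n_classes=3)
print(breaks)
```

    Equal-width classes on the log scale give more resolution to the many small percentages than equal-width classes on the raw scale would.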

    How do <i>Google</i> queries vary with birth rate?

    <p>(<b>A</b>) The number of births per 1,000 people in each U.S. state. Birth rate is defined as the number of births per 1,000 people. (<b>B</b>) We use <i>Google Correlate</i> to find terms for which the number of searches is higher in U.S. states with higher birth rates. Similarly, we identify terms for which the number of searches is higher in states with lower birth rates. Here, we list the 31 terms which showed the strongest positive correlation (left) and negative correlation (right) with statewide birth rate. To determine the significance of these correlations, we generate 1,000 random samples from a multivariate Gaussian distribution where states which are closer together tend to have a similar value. We submit these samples to <i>Google Correlate</i> and build a distribution of correlation coefficients for each of the top 31 search terms. We depict the strength of correlation required for the correlation to be significant at the <i>p</i> < 0.05 and <i>p</i> < 0.01 level, given this null hypothesis distribution. (<b>C</b>) To allow us to generalise beyond individual search terms, we conduct an online survey asking participants to identify the main topic in each list of 31 terms. Here, we depict all survey responses which account for more than 5% of submitted responses. Our results suggest that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats (“baby car seat”, <i>p</i> = 0.051, all remaining <i>p</i>s < 0.05).</p>
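    The significance thresholds in panel B come from comparing an observed correlation against a distribution of correlations obtained from random samples. A minimal sketch of that comparison step is given below. It assumes a one-sided test and takes the null correlations as a ready-made list; generating the spatially correlated Gaussian samples and querying <i>Google Correlate</i> are not reproduced here, and the helper names are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Share of null correlations that are at least as large as the observed one."""
    exceed = sum(1 for r in null_values if r >= observed)
    return exceed / len(null_values)

def critical_value(null_values, alpha):
    """Smallest null correlation whose empirical p-value falls below alpha.

    Returns None if no value in the null distribution qualifies.
    """
    for r in sorted(null_values):
        if empirical_p_value(r, null_values) < alpha:
            return r
    return None

# Hypothetical stand-in for correlations obtained from the random samples.
null_correlations = [i / 100 for i in range(100)]
threshold_05 = critical_value(null_correlations, 0.05)
threshold_01 = critical_value(null_correlations, 0.01)
print(threshold_05, threshold_01)
```

    For negatively correlated terms, the same functions could be applied to the negated correlations.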

    Datasets for Quantifying Crowd Size for Mobile Phone and Twitter Data

    Datasets for Quantifying Crowd Size for Mobile Phone and Twitter Data

    Data for Using Deep Learning to Quantify the Beauty of Outdoor Places

    This database lists all the Geograph images we used from Scenic-Or-Not (http://scenicornot.datasciencelab.co.uk/) to help us understand what beautiful outdoor spaces are composed of. Our analysis includes only images that have been rated more than three times.

    Reports of protests in 2013 in the online edition of <i>The Guardian</i>.

    <p>We use data on reports of protests in the online edition of <i>The Guardian</i> as an approximation of the ground truth of when and where notable protest outbreaks occurred. For each of the 244 countries and regions and each month in 2013, we calculate the number of <i>The Guardian</i> articles tagged with the country or region’s name. Here, we depict the percentage of articles for each country and region and each month which were also tagged with the word “protest”. Patterns which can be visually identified in the data reflect known major protest events in 2013: for example, protest outbreaks in both Brazil and Turkey can be observed in June 2013. Equal breaks are calculated for the logarithmically transformed percentages.</p>
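    The percentage described in this caption can be illustrated with a short sketch. The record layout (country, month, set of tags), the function name `protest_share`, and the sample articles are hypothetical and are not taken from the underlying dataset.

```python
from collections import Counter

def protest_share(articles):
    """Map each (country, month) to the percentage of its articles tagged 'protest'.

    `articles` is an iterable of (country, month, tags) tuples.
    """
    totals = Counter()
    protests = Counter()
    for country, month, tags in articles:
        totals[(country, month)] += 1
        if "protest" in tags:
            protests[(country, month)] += 1
    # Counter returns 0 for keys that never saw a protest-tagged article.
    return {key: 100.0 * protests[key] / totals[key] for key in totals}

# Hypothetical articles, for illustration only.
sample = [
    ("Brazil", "2013-06", {"brazil", "protest"}),
    ("Brazil", "2013-06", {"brazil", "football"}),
    ("Turkey", "2013-05", {"turkey", "travel"}),
]
shares = protest_share(sample)
print(shares)
```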

    Data for Using Deep Learning to Quantify the Beauty of Outdoor Places

    To predict the scenic ratings of images for which we do not already have crowdsourced data, we use a transfer learning approach to leverage the knowledge of the Places365 CNN [1], which can predict the place category of a scene with a high degree of accuracy. We modify the Places365 CNN to instead predict the scenicness of an image. We fine-tune our CNN using 80% of the Scenic-Or-Not dataset [2], and use the remaining 20% as a test set to check our prediction accuracy. We calculate a performance measure using the Kendall rank correlation between the predicted scenic scores and the actual scenic scores. The Scenic CNN trained using the Visual Geometry Group (VGG) convolutional neural network architecture delivers the best performance, with an overall prediction accuracy of 0.658. We predict the scenicness of images of London uploaded to Geograph (http://www.geograph.org.uk/). This dataset includes all the scenic predictions used to create Figure 6, "Predictions of scenic ratings for London images". [1] Zhou, B., Khosla, A., Lapedriza, A., Torralba, A., & Oliva, A. (2016). Places: An image database for deep scene understanding. arXiv preprint arXiv:1610.02055.
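    The performance measure mentioned above, the Kendall rank correlation between predicted and actual scores, can be sketched in plain Python as follows. This is an illustrative implementation of the tau-b variant (which corrects for ties); the description does not say which variant was used, and the scores below are hypothetical. In practice, `scipy.stats.kendalltau` offers an equivalent, faster routine.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b) between two equally long sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both: ignored by tau-b
        elif dx == 0:
            ties_x += 1           # tied in x only
        elif dy == 0:
            ties_y += 1           # tied in y only
        elif dx * dy > 0:
            concordant += 1       # pair ordered the same way in x and y
        else:
            discordant += 1       # pair ordered in opposite ways
    # Raises ZeroDivisionError if either sequence is constant.
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom

# Hypothetical actual and predicted scenic scores for four images.
actual = [1, 2, 3, 4]
predicted = [1, 3, 2, 4]
tau = kendall_tau_b(actual, predicted)
print(round(tau, 3))
```

    Because the measure depends only on the ordering of the scores, it rewards a model that ranks images correctly even if the predicted values are offset or rescaled.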

    How do <i>Google</i> queries vary with infant mortality rate?

    <p>(<b>A</b>) Infant mortality rates for each state in the U.S. An infant is defined as any person one year old or younger. Infant mortality rate is defined as the number of infant deaths per 1,000 births. (<b>B</b>) In a similar fashion to our investigation of birth rates (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0149025#pone.0149025.g001" target="_blank">Fig 1</a>), we use <i>Google Correlate</i> to find terms for which the number of searches is higher in U.S. states with higher infant mortality rates, and with lower infant mortality rates. We list the 31 terms for which differences in search volume across U.S. states show the strongest positive correlation (left) and negative correlation (right) with statewide infant mortality rate. Again, we generate 1,000 random samples from a multivariate Gaussian distribution where states which are closer together tend to have a similar value. We submit these samples to <i>Google Correlate</i> and build a distribution of correlation coefficients for each of the top 31 search terms. We depict the strength of correlation required for the correlation to be significant at the <i>p</i> < 0.05 and <i>p</i> < 0.01 level, given this null hypothesis distribution. (<b>C</b>) Again, we ask <i>Amazon Mechanical Turk</i> users to identify the most prominent topic in each of these lists of terms. We depict all survey responses which account for more than 5% of submitted responses, along with the percentage and number of respondents who gave each response. Our results suggest that users in states with higher infant mortality rates search for more information about credit and loans, as well as sexually transmitted diseases (all search terms <i>p</i> < 0.05).</p>

    Empirical distribution of normalised returns for <i>American Express</i>.

    <p>We build return distributions for the 25 stocks of the DJIA for different time lags across the full period of analysis. We standardize each distribution by subtracting the mean return from each observation and dividing by the standard deviation. We depict in blue the cumulative distribution function of the positive component of the return distributions for <i>American Express</i> for a time lag of 300 seconds. We depict in red the positive tail of a Gaussian distribution with mean zero and standard deviation one. We observe a strong deviation of the empirical distribution from the Gaussian distribution. Instead, visual inspection of the distribution tail reveals consistency with a linear relationship on a log-log scale. This provides initial evidence for possible power law behavior at this time scale.</p>
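    The standardization and tail construction described in this caption can be sketched as follows. The sketch rests on several assumptions: it uses the population standard deviation, it computes the complementary cumulative distribution P(X ≥ x) of the positive standardized returns (the form usually plotted on log-log axes to inspect a tail), and the input returns are hypothetical.

```python
import statistics

def standardize(returns):
    """Subtract the mean return and divide by the standard deviation."""
    mean = statistics.fmean(returns)
    sd = statistics.pstdev(returns)  # assumption: population standard deviation
    return [(r - mean) / sd for r in returns]

def positive_tail_ccdf(values):
    """Return (x, P(X >= x)) pairs for the positive values, sorted by x."""
    positive = sorted(v for v in values if v > 0)
    n = len(positive)
    return [(x, (n - i) / n) for i, x in enumerate(positive)]

# Hypothetical returns, for illustration only.
z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
tail = positive_tail_ccdf(z)
print(tail)
```

    Plotting the pairs in `tail` with both axes on a logarithmic scale would show an approximately straight line if the tail follows a power law, whereas a Gaussian tail would curve downwards sharply.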

    Components of the DJIA.

    <p>Here we depict the components of the DJIA in the time period from 02 January 2008 to 30 July 2010. Dashed vertical lines correspond to changes in the stocks forming the DJIA. In our analysis, we focus on the 25 stocks that were part of the DJIA during the period of analysis. Stocks are labelled using ticker symbols that uniquely identify the company name, as used by the stock exchange.</p>