    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult because the data are not representative and because statistical inference tools are biased towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of the age, gender, and organization status of social media users that operates in 32 languages. This method substantially outperforms the current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual, heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate population estimates and make a significant step towards representative social sensing in downstream applications with multilingual social media.
    Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19).
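    The bias-correction step turns an estimated inclusion probability per demographic cell into a weight. The paper estimates these probabilities with multilevel regression; the minimal sketch below swaps in plain cell-by-cell post-stratification, where a cell's inclusion probability is simply its sample count divided by its census count. The function names, the (age group, gender) cells, and all counts are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, census_counts):
    """Weight per cell = 1 / inclusion probability, where the inclusion
    probability is estimated as sample count / census count."""
    sample_counts = Counter(sample_cells)
    weights = {}
    for cell, n_sample in sample_counts.items():
        if cell not in census_counts:
            raise KeyError(f"no census count for cell {cell!r}")
        inclusion_prob = n_sample / census_counts[cell]
        weights[cell] = 1.0 / inclusion_prob
    return weights


def weighted_share(sample_cells, weights, predicate):
    """Weighted fraction of sampled users whose cell satisfies `predicate`."""
    total = sum(weights[c] for c in sample_cells)
    hits = sum(weights[c] for c in sample_cells if predicate(c))
    return hits / total


# Hypothetical data: inferred (age_group, gender) cells for 80 users in one
# region, and made-up census counts for the same cells.
census = {("18-29", "f"): 1000, ("18-29", "m"): 1000,
          ("30+", "f"): 2000, ("30+", "m"): 2000}
sample = ([("18-29", "f")] * 30 + [("18-29", "m")] * 30
          + [("30+", "f")] * 10 + [("30+", "m")] * 10)

w = poststratification_weights(sample, census)
raw = sum(c[0] == "30+" for c in sample) / len(sample)
adjusted = weighted_share(sample, w, lambda c: c[0] == "30+")
print(f"30+ share: raw {raw:.2f}, post-stratified {adjusted:.2f}")
```

    Here the raw sample puts 25% of users in the 30+ group, while the post-stratified estimate recovers the census share (about 0.67). Plain cell weighting breaks down when a cell is empty or very sparse in the sample, which is one reason a multilevel model that pools information across cells is attractive.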

    FU19 Nephrops Grounds 2023 UWTV Survey Report and catch scenarios for 2024

    This report provides the main results of the fourteenth underwater television (UWTV) survey of the various Nephrops patches in Functional Unit 19 (FU19). The survey was multidisciplinary in nature, collecting UWTV and other ecosystem data. In 2023, a total of 42 UWTV stations were successfully completed. The mean density estimates varied considerably across the different patches. The 2023 raised abundance estimate of 220 million burrows is a 15% decrease from the 2022 estimate and is below the MSY Btrigger reference point (430 million burrows). Using the 2023 abundance estimate and updated stock data, catches in 2024 corresponding to the F ranges in the EU multiannual plan for Western Waters are between 224 and 248 tonnes (assuming that discard rates and fishery selection patterns do not change from the 2020–2022 average). One species of sea pen was observed: Virgularia mirabilis, which has also been observed on previous surveys of FU19. Trawl marks were observed at 10% of the stations surveyed.
    Marine Institute
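    A raised abundance estimate of this kind is obtained by multiplying each patch's mean burrow density by the patch area and summing over patches. The sketch below assumes that scheme, with an optional divisor standing in for the bias correction applied in UWTV assessments; the patch names, densities, and areas are invented and are not FU19 data. Only the 430 million burrow MSY Btrigger value comes from the report.

```python
def raised_abundance(patches, bias_correction=1.0):
    """Sum of density (burrows/m^2) x area (km^2) over patches, in burrows.

    `bias_correction` is a hypothetical stand-in for the cumulative bias
    correction factor; 1.0 means no correction is applied.
    """
    total = 0.0
    for _name, density_per_m2, area_km2 in patches:
        total += density_per_m2 * area_km2 * 1e6  # 1 km^2 = 1e6 m^2
    return total / bias_correction


def percent_change(new, old):
    return 100.0 * (new - old) / old


MSY_BTRIGGER = 430e6  # burrows, the reference point quoted in the report

# Invented patch data: (name, mean density per m^2, area in km^2). Not FU19 values.
patches_2022 = [("patch_a", 0.30, 400.0), ("patch_b", 0.10, 500.0)]
patches_2023 = [("patch_a", 0.25, 400.0), ("patch_b", 0.09, 500.0)]

a22 = raised_abundance(patches_2022)
a23 = raised_abundance(patches_2023)
print(f"{a23 / 1e6:.0f} million burrows, {percent_change(a23, a22):+.1f}% vs 2022, "
      f"below Btrigger: {a23 < MSY_BTRIGGER}")
```

    With these made-up inputs the total falls from 170 to 145 million burrows (about a 15% decrease), and the script prints `145 million burrows, -14.7% vs 2022, below Btrigger: True`.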

    An Analysis of Exercising Behavior in Online Populations

    Exercise plays a central role in many people's fitness goals. While prior work has examined how individuals pursue these health and fitness goals on general-purpose platforms such as Twitter, the lack of precise activity recording has limited detailed analyses of individual and group behaviors. In this study, we explore a recent social media platform dedicated to exercise and use nearly four years of longitudinal exercise histories from over 188,000 users to discover large-scale exercising patterns corresponding to different motivations, such as sports or general fitness.

    Organizations Are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter

    Much work on the demographics of social media platforms such as Twitter has focused on the properties of individuals, such as gender or age. However, because credible detectors for organization accounts do not exist, these and future large-scale studies of human behavior on social media can be contaminated by accounts belonging to organizations. We analyze organizations on Twitter to assess their distinct behavioral characteristics and determine what types of organizations are active. We first create a dataset of manually classified accounts from a representative sample of Twitter and then introduce a classifier to distinguish between organizational and personal accounts. In addition, we find that although organizations make up less than 10% of the accounts, they are significantly more connected, with an order of magnitude more friends and followers.
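    The abstract does not describe the classifier's features or model. As a hedged illustration only, the sketch below trains a plain logistic regression by batch gradient descent on two hypothetical per-account features (rate of first-person singular pronouns and rate of tweets containing URLs); the features, the toy data, and every function name are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(loss)/d(score)
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def is_organization(w, b, x, threshold=0.5):
    return sigmoid(score(w, b, x)) >= threshold


# Hypothetical features per account: [first-person-singular rate, URL rate].
# Label 1 = organization, 0 = personal. Toy values, not real data.
X = [[0.30, 0.10], [0.25, 0.05], [0.35, 0.20],
     [0.02, 0.80], [0.05, 0.90], [0.01, 0.70]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logreg(X, y)
print(is_organization(w, b, [0.28, 0.08]), is_organization(w, b, [0.035, 0.85]))
```

    With this toy data the script prints `False True`. The sketch is dependency-free so it runs anywhere; a real detector would use richer profile, text, and network features and a tested machine-learning library.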

    Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice

    Geolocated social media data provides a powerful source of information about place and regional human behavior. Because little social media data is geolocation-annotated, inference techniques serve an essential role in increasing the volume of annotated data. One major class of inference approaches has relied on the social network of Twitter, where the locations of a user's friends serve as evidence for that user's location. While many such inference techniques have recently been proposed, we know little about their relative performance: published evaluations use between 5% and 100% of the network as ground-truth data, work with social networks whose sizes vary by four orders of magnitude, and show little standardization in evaluation metrics. We conduct a systematic comparative analysis of nine state-of-the-art network-based methods for performing geolocation inference at the global scale, controlling for the source of ground-truth data, dataset size, and temporal recency in test data. Furthermore, we identify a comprehensive set of evaluation metrics that clarify performance differences. Our analysis identifies a large disparity between the performance reported in the literature and that seen in real-world conditions. To aid reproducibility and future comparison, all implementations have been released in an open-source geoinference package.
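    Network-based methods of the kind compared here infer a user's location from the locations of their friends. One such approach, spatial label propagation, repeatedly assigns each unlocated user the geometric median of their located friends' positions. The sketch below approximates the geometric median with the medoid (the friend location with the smallest total great-circle distance to the others); the graph, coordinates, and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def medoid(points):
    """Point from `points` with minimal total distance to all points
    (an approximation of the geometric median)."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))


def propagate_locations(friends, known, iterations=3):
    """Hypothetical spatial label propagation over an adjacency dict
    mapping user -> list of friends."""
    locations = dict(known)
    for _ in range(iterations):
        updates = {}
        for user, neighbors in friends.items():
            if user in known:
                continue  # ground-truth locations are never overwritten
            located = [locations[n] for n in neighbors if n in locations]
            if located:
                updates[user] = medoid(located)
        locations.update(updates)  # synchronous update per iteration
    return locations


# Hypothetical graph: a and b are in New York, c is in London; v only knows u.
known = {"a": (40.71, -74.01), "b": (40.73, -73.99), "c": (51.51, -0.13)}
friends = {"u": ["a", "b", "c"], "v": ["u"]}
estimates = propagate_locations(friends, known)
print(estimates["u"], estimates["v"])
```

    Here u is placed in New York because two of its three friends are there, and v inherits u's estimate on the second iteration. The same `haversine_km` function can score predictions by their error distance to a held-out ground-truth location.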