6,366 research outputs found

    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Get PDF
    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19

    Wearing Many (Social) Hats: How Different are Your Different Social Network Personae?

    Full text link
    This paper investigates when users create profiles in different social networks, whether they are redundant expressions of the same persona, or they are adapted to each platform. Using the personal webpages of 116,998 users on About.me, we identify and extract matched user profiles on several major social networks including Facebook, Twitter, LinkedIn, and Instagram. We find evidence for distinct site-specific norms, such as differences in the language used in the text of the profile self-description, and the kind of picture used as profile image. By learning a model that robustly identifies the platform given a user's profile image (0.657--0.829 AUC) or self-description (0.608--0.847 AUC), we confirm that users do adapt their behaviour to individual platforms in an identifiable and learnable manner. However, different genders and age groups adapt their behaviour differently from each other, and these differences are, in general, consistent across different platforms. We show that differences in social profile construction correspond to differences in how formal or informal the platform is.Comment: Accepted at the 11th International AAAI Conference on Web and Social Media (ICWSM17

    An analysis of the user occupational class through Twitter content

    Get PDF
    Social media content can be used as a complementary source to the traditional methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a userā€™s occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications
    • ā€¦
    corecore