    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media. Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)

    Does social media usage matter? An analysis of online practices and digital media perceptions of communication practitioners in Europe

    A key aspect of understanding and explaining online communication is the micro level of communication practitioners’ social media usage and their general attitudes towards digital platforms. This paper investigates how public relations practitioners’ personal and professional use of social media is related to their perceptions of social media. A quantitative methodology was applied. A population of 2710 professionals from 43 European countries, working at different hierarchical levels in both communication departments and agencies, was surveyed as part of a larger transnational online survey. Results show that practitioners with a high level of social media usage attach more importance to social media channels, to the influence of social media on internal and external stakeholders, and to the relevance of key gatekeepers and stakeholders, along with a better self-estimation of their competences. Issues concerning diverse levels of overestimation of social media use, application and importance in the professional arena are also debated.

    When is it Biased? Assessing the Representativeness of Twitter's Streaming API

    Twitter has captured the interest of the scientific community not only for its massive user base and content, but also for its openness in sharing its data. Twitter shares a free 1% sample of its tweets through the "Streaming API", a service that returns a sample of tweets according to a set of parameters set by the researcher. Recently, research has pointed to evidence of bias in the data returned through the Streaming API, raising concern about the integrity of this data service for use in research scenarios. While these results are important, the methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. In this work we tackle the problem of finding sample bias without the need for "gold standard" Firehose data. Namely, we focus on finding time periods in the Streaming API data where the trend of a hashtag is significantly different from its trend in the true activity on Twitter. We propose a solution that focuses on using an open data source to find bias in the Streaming API. Finally, we assess the utility of the data source in sparse data situations and for users issuing the same query from different regions.