Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Social media provide access to behavioural data at an unprecedented scale and
granularity. However, using these data to understand phenomena in a broader
population is difficult due to their non-representativeness and the bias of
statistical inference tools towards dominant languages and groups. While
demographic attribute inference could be used to mitigate such bias, current
techniques are almost entirely monolingual and fail to work in a global
environment. We address these challenges by combining multilingual demographic
inference with post-stratification to create a more representative population
sample. To learn demographic attributes, we create a new multimodal deep neural
architecture for joint classification of age, gender, and organization-status
of social media users that operates in 32 languages. This method substantially
outperforms the current state of the art while also reducing algorithmic bias. To
correct for sampling biases, we propose fully interpretable multilevel
regression methods that estimate inclusion probabilities from inferred joint
population counts and ground-truth population counts. In a large experiment
over multilingual heterogeneous European regions, we show that our demographic
inference and bias correction together allow for more accurate estimates of
populations and make a significant step towards representative social sensing
in downstream applications with multilingual social media.
Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web
Conference (WWW '19)
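The bias-correction step rests on inclusion probabilities: how likely a member of each demographic stratum is to appear in the social media sample. The following is a minimal sketch of plain cell-based post-stratification, assuming hypothetical (age group, gender) strata, made-up counts, and raw count ratios as inclusion probabilities. It is not the paper's method, which smooths these probabilities with multilevel regression instead of taking raw ratios.

```python
from typing import Dict, Tuple

# Hypothetical strata: (age group, gender). The paper's actual strata may differ.
Stratum = Tuple[str, str]


def inclusion_probabilities(sample_counts: Dict[Stratum, int],
                            population_counts: Dict[Stratum, int]) -> Dict[Stratum, float]:
    """Raw cell-level inclusion probability: (inferred) sampled users / census count."""
    return {s: sample_counts[s] / population_counts[s] for s in population_counts}


def poststratified_prevalence(sample_counts: Dict[Stratum, int],
                              outcome_counts: Dict[Stratum, int],
                              population_counts: Dict[Stratum, int]) -> float:
    """Estimate the population prevalence of an outcome by reweighting strata.

    Each sampled user in stratum s gets weight 1 / p_s = N_s / n_s, so the
    estimate is sum_s N_s * (y_s / n_s) / sum_s N_s over non-empty strata.
    """
    probs = inclusion_probabilities(sample_counts, population_counts)
    weighted_outcome = 0.0
    covered_population = 0
    for s, p in probs.items():
        if sample_counts[s] == 0:
            # Empty cell: nothing to reweight. A multilevel model would
            # borrow strength from related strata here instead of skipping.
            continue
        weighted_outcome += outcome_counts[s] / p
        covered_population += population_counts[s]
    return weighted_outcome / covered_population


# Illustrative (made-up) counts: young users are heavily over-represented.
population = {("18-29", "f"): 1000, ("18-29", "m"): 1000,
              ("30+", "f"): 3000, ("30+", "m"): 3000}
sample = {("18-29", "f"): 200, ("18-29", "m"): 300,
          ("30+", "f"): 50, ("30+", "m"): 50}
outcome = {("18-29", "f"): 100, ("18-29", "m"): 120,
           ("30+", "f"): 10, ("30+", "m"): 5}

raw = sum(outcome.values()) / sum(sample.values())
print(f"raw sample prevalence:   {raw:.3f}")  # about 0.392
print(f"post-stratified estimate: {poststratified_prevalence(sample, outcome, population):.3f}")  # about 0.225
```

Reweighting pulls the estimate toward the under-sampled older strata. The accuracy of this correction depends on how reliably the demographic inference assigns users to strata.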
Does social media usage matter? An analysis of online practices and digital media perceptions of communication practitioners in Europe
A key aspect of understanding and explaining online communication is the micro level: communication practitioners' social media usage and their general attitudes towards digital platforms. This paper investigates how public relations practitioners' personal and professional use of social media relates to their perceptions of social media. A quantitative methodology was applied. A total of 2,710 professionals from 43 European countries, working at different hierarchical levels in both communication departments and agencies, were surveyed as part of a larger transnational online survey. Results show that practitioners with a high level of social media usage attach more importance to social media channels, to the influence of social media on internal and external stakeholders, and to the relevance of key gatekeepers and stakeholders, and that they rate their own competences more highly. Varying levels of overestimation of the use, application, and importance of social media in the professional arena are also discussed.
When is it Biased? Assessing the Representativeness of Twitter's Streaming API
Twitter has captured the interest of the scientific community not only for
its massive user base and content, but also for its openness in sharing its
data. Twitter shares a free 1% sample of its tweets through the "Streaming
API", a service that returns a sample of tweets according to a set of
parameters set by the researcher. Recently, research has pointed to evidence of
bias in the data returned through the Streaming API, raising concerns about the
integrity of this data service for use in research scenarios. While these
results are important, the methodologies proposed in previous work rely on the
restrictive and expensive Firehose to find the bias in the Streaming API data.
In this work we tackle the problem of finding sample bias without the need for
"gold standard" Firehose data. Namely, we focus on finding time periods in the
Streaming API data where the trend of a hashtag is significantly different from
its trend in the true activity on Twitter. We propose a solution that uses
an open data source to find bias in the Streaming API. Finally, we
assess the utility of this data source in sparse-data situations and for users
issuing the same query from different regions.
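One way to operationalise "time periods where the trend of a hashtag is significantly different" is to compare the hashtag's share of tweets per time bin in the Streaming API against its share in a reference stream. The following is a minimal sketch that assumes a hypothetical reference series and uses a pooled two-proportion z-test per bin. Both the test and the 1.96 threshold are illustrative choices and are not taken from the paper.

```python
import math
from typing import List, Tuple

# Each time bin is (hashtag_tweets, total_tweets) observed in that bin.
Bin = Tuple[int, int]


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: p1 == p2."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


def flag_divergent_bins(sample_bins: List[Bin], reference_bins: List[Bin],
                        z_crit: float = 1.96) -> List[int]:
    """Return indices of time bins where the hashtag's share in the sample
    differs significantly from its share in the reference stream."""
    flagged = []
    for t, ((k_s, n_s), (k_r, n_r)) in enumerate(zip(sample_bins, reference_bins)):
        if n_s == 0 or n_r == 0:
            continue  # no tweets in this bin on one side; nothing to compare
        if abs(two_proportion_z(k_s, n_s, k_r, n_r)) > z_crit:
            flagged.append(t)
    return flagged


# Made-up counts: the hashtag is over-represented in the sample only in bin 1.
streaming = [(10, 1000), (50, 1000), (12, 1000)]
reference = [(1000, 100000), (1100, 100000), (1150, 100000)]
print(flag_divergent_bins(streaming, reference))  # [1]
```

The sketch compares shares rather than raw counts because a roughly 1% sample has counts on a different scale from the full activity. Testing many bins at once inflates false positives, so a multiple-comparison correction such as Bonferroni (dividing the significance level by the number of bins) may be warranted in practice.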