8,594 research outputs found
Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Social media provide access to behavioural data at an unprecedented scale and
granularity. However, using these data to understand phenomena in a broader
population is difficult due to their non-representativeness and the bias of
statistical inference tools towards dominant languages and groups. While
demographic attribute inference could be used to mitigate such bias, current
techniques are almost entirely monolingual and fail to work in a global
environment. We address these challenges by combining multilingual demographic
inference with post-stratification to create a more representative population
sample. To learn demographic attributes, we create a new multimodal deep neural
architecture for joint classification of age, gender, and organization-status
of social media users that operates in 32 languages. This method substantially
outperforms current state of the art while also reducing algorithmic bias. To
correct for sampling biases, we propose fully interpretable multilevel
regression methods that estimate inclusion probabilities from inferred joint
population counts and ground-truth population counts. In a large experiment
over multilingual heterogeneous European regions, we show that our demographic
inference and bias correction together allow for more accurate estimates of
populations and make a significant step towards representative social sensing
in downstream applications with multilingual social media.Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web
Conference (WWW '19
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
ConStance: Modeling Annotation Contexts to Improve Stance Classification
Manual annotations are a prerequisite for many applications of machine
learning. However, weaknesses in the annotation process itself are easy to
overlook. In particular, scholars often choose what information to give to
annotators without examining these decisions empirically. For subjective tasks
such as sentiment analysis, sarcasm, and stance detection, such choices can
impact results. Here, for the task of political stance detection on Twitter, we
show that providing too little context can result in noisy and uncertain
annotations, whereas providing too strong a context may cause it to outweigh
other signals. To characterize and reduce these biases, we develop ConStance, a
general model for reasoning about annotations across information conditions.
Given conflicting labels produced by multiple annotators seeing the same
instances with different contexts, ConStance simultaneously estimates gold
standard labels and also learns a classifier for new instances. We show that
the classifier learned by ConStance outperforms a variety of baselines at
predicting political stance, while the model's interpretable parameters shed
light on the effects of each context.Comment: To appear at EMNLP 201
Emotions in context: examining pervasive affective sensing systems, applications, and analyses
Pervasive sensing has opened up new opportunities for measuring our feelings and understanding our behavior by monitoring our affective states while mobile. This review paper surveys pervasive affect sensing by examining and considering three major elements of affective pervasive systems, namely; âsensingâ, âanalysisâ, and âapplicationâ. Sensing investigates the different sensing modalities that are used in existing real-time affective applications, Analysis explores different approaches to emotion recognition and visualization based on different types of collected data, and Application investigates different leading areas of affective applications. For each of the three aspects, the paper includes an extensive survey of the literature and finally outlines some of challenges and future research opportunities of affective sensing in the context of pervasive computing
- âŠ