White, Man, and Highly Followed: Gender and Race Inequalities in Twitter
Social media is considered a democratic space in which people connect and
interact with each other regardless of their gender, race, or any other
demographic factor. Despite numerous efforts that explore demographic factors
in social media, it is still unclear whether social media perpetuates old
inequalities from the offline world. In this paper, we attempt to identify the
gender and race of Twitter users located in the U.S. using advanced image
processing algorithms from Face++. Then, we investigate how different
demographic groups (i.e., male/female, Asian/Black/White) connect with each
other. We quantify the extent to which each group follows and interacts with
the others and the extent to which these connections and interactions are
reflected in inequalities in
Twitter. Our analysis shows that users identified as White and male tend to
attain higher positions in Twitter, in terms of the number of followers and
number of times they appear in users' lists. We hope our effort can stimulate the
development of new theories of demographic information in the online space.
Comment: In Proceedings of the IEEE/WIC/ACM International Conference on Web
Intelligence (WI'17). Leipzig, Germany. August 201
Confounds and Consequences in Geotagged Twitter Data
Twitter is often used in quantitative studies that identify
geographically-preferred topics, writing styles, and entities. These studies
rely on either GPS coordinates attached to individual messages, or on the
user-supplied location field in each profile. In this paper, we compare these
data acquisition techniques and quantify the biases that they introduce; we
also measure their effects on linguistic analysis and text-based geolocation.
GPS-tagging and self-reported locations yield measurably different corpora, and
these linguistic differences are partially attributable to differences in
dataset composition by age and gender. Using a latent variable model to induce
age and gender, we show how these demographic variables interact with geography
to affect language use. We also show that the accuracy of text-based
geolocation varies with population demographics, giving the best results for
men above the age of 40.
Comment: final version for EMNLP 201
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
An analysis of the user occupational class through Twitter content
Social media content can be used as a complementary source to the traditional
methods for extracting and studying collective social attributes. This study
focuses on the prediction of the occupational class for a public user profile.
Our analysis is conducted on a new annotated corpus of Twitter users, their
respective job titles, posted textual content and platform-related attributes.
We frame our task as classification using latent feature representations such
as word clusters and embeddings. The employed linear and, especially,
non-linear methods can predict a user's occupational class with strong accuracy
for the coarsest level of a standard occupation taxonomy which includes nine
classes. Combined with a qualitative assessment, the derived results confirm
the feasibility of our approach in inferring a new user attribute that can be
embedded in a multitude of downstream applications.