Unsupervised User Stance Detection on Twitter
We present a highly effective unsupervised framework for detecting the stance
of prolific Twitter users with respect to controversial topics. In particular,
we use dimensionality reduction to project users onto a low-dimensional space,
followed by clustering, which allows us to find core users that are
representative of the different stances. Our framework has three major
advantages over pre-existing methods, which are based on supervised or
semi-supervised classification. First, we do not require any prior labeling of
users: instead, we create clusters, which are much easier to label manually
afterwards, e.g., in a matter of seconds or minutes instead of hours. Second,
there is no need for domain- or topic-level knowledge either to specify the
relevant stances (labels) or to conduct the actual labeling. Third, our
framework is robust in the face of data skewness, e.g., when some users or some
stances have greater representation in the data. We experiment with different
combinations of user similarity features, dataset sizes, dimensionality
reduction methods, and clustering algorithms to ascertain the most effective
and most computationally efficient combinations across three different datasets
(in English and Turkish). We further verified our results on additional tweet
sets covering six different controversial topics. Our best combination in terms
of effectiveness and efficiency uses retweeted accounts as features, UMAP for
dimensionality reduction, and Mean Shift for clustering, and yields a small
number of high-quality user clusters, typically just 2-3, with more than 98%
purity. The resulting user clusters can be used to train downstream
classifiers. Moreover, our framework is robust to variations in the
hyper-parameter values and also with respect to random initialization.
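The purity figure quoted in this abstract can be made concrete with a minimal sketch. It assumes the standard definition of cluster purity (the share of items whose gold label matches the majority label of their cluster); the function name `cluster_purity` and the toy stance labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Return the fraction of items whose label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each gold label occurs inside each cluster.
    label_counts = {}
    for cluster_id, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster_id, Counter())[label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 users, 2 clusters, one user in the "wrong" cluster.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
stances = ["pro", "pro", "pro", "anti", "anti", "anti", "anti", "anti"]
purity = cluster_purity(clusters, stances)
print(f"purity = {purity:.3f}")
```

In this toy case seven of the eight users carry the majority stance of their cluster. Purity rewards many small clusters, so it is most informative when the number of clusters is small, as in the 2-3 clusters the abstract reports.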
Semantic Sentiment Analysis of Twitter Data
The Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spread: microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel
about Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.
Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201