4,524 research outputs found
Identifying Purpose Behind Electoral Tweets
Tweets pertaining to a single event, such as a national election, can number
in the hundreds of millions. Automatically analyzing them is beneficial in many
downstream natural language applications such as question answering and
summarization. In this paper, we propose a new task: identifying the purpose
behind electoral tweets--why do people post election-oriented tweets? We show
that identifying purpose is correlated with the related phenomena of sentiment
and emotion detection, yet significantly different. Detecting purpose has a
number of applications including detecting the mood of the electorate,
estimating the popularity of policies, identifying key issues of contention,
and predicting the course of events. We create a large dataset of electoral
tweets and annotate a few thousand tweets for purpose. We develop a system that
automatically classifies electoral tweets as per their purpose, obtaining an
accuracy of 43.56% on an 11-class task and an accuracy of 73.91% on a 3-class
task (both accuracies well above the most-frequent-class baseline). Finally, we
show that resources developed for emotion detection are also helpful for
detecting purpose.
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinion about two French politicians over time. We
therefore review state-of-the-art NLP-based ML algorithms to automatically
annotate tweets using a manual initiation step as bootstrap. This paper focuses
on key issues in active learning while building a large annotated data set:
noise introduced by human annotators, abundance of data, and
the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can serve as a
basis not only for improving automatic systems for Opinion Mining (OM) and
Topic Classification but also for reducing noise in human annotations. However, a
later thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
Viewpoint Discovery and Understanding in Social Networks
The Web has evolved to a dominant platform where everyone has the opportunity
to express their opinions, to interact with other users, and to debate on
emerging events happening around the world. On the one hand, this has enabled
the presence of different viewpoints and opinions about a - usually
controversial - topic (like Brexit), but at the same time, it has led to
phenomena like media bias, echo chambers and filter bubbles, where users are
exposed to only one point of view on the same topic. Therefore, there is the
need for methods that are able to detect and explain the different viewpoints.
In this paper, we propose a graph partitioning method that exploits social
interactions to enable the discovery of different communities (representing
different viewpoints) discussing a controversial topic in a social
network like Twitter. To explain the discovered viewpoints, we describe a
method, called Iterative Rank Difference (IRD), which allows detecting
descriptive terms that characterize the different viewpoints as well as
understanding how a specific term is related to a viewpoint (by detecting other
related descriptive terms). The results of an experimental evaluation showed
that our approach outperforms state-of-the-art methods on viewpoint discovery,
while a qualitative analysis of the proposed IRD method on three different
controversial topics showed that IRD provides comprehensive and deep
representations of the different viewpoints.
Multilingual Cross-domain Perspectives on Online Hate Speech
In this report, we present a study of eight corpora of online hate speech, by
demonstrating the NLP techniques that we used to collect and analyze the
jihadist, extremist, racist, and sexist content. Analysis of the multilingual
corpora shows that the different contexts share certain characteristics in
their hateful rhetoric. To expose the main features, we have focused on text
classification, text profiling, keyword and collocation extraction, along with
manual annotation and qualitative study.
Comment: 24 page
Semantic Sentiment Analysis of Twitter Data
The Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spread: microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel
about Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.
Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
Automatic fake news detection is a challenging problem in deception
detection, and it has tremendous real-world political and social impacts.
However, statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets. In this paper, we present
LIAR: a new, publicly available dataset for fake news detection. We collected
12.8K manually labeled short statements, made over a decade in various contexts,
from PolitiFact.com, which provides a detailed analysis report and links to source
documents for each case. This dataset can be used for fact-checking research as
well. Notably, this new dataset is an order of magnitude larger than the previously
largest public fake news datasets of similar type. Empirically, we investigate
automatic fake news detection based on surface-level linguistic patterns. We
have designed a novel, hybrid convolutional neural network to integrate
meta-data with text. We show that this hybrid approach can improve a text-only
deep learning model.
Comment: ACL 201