906 research outputs found
Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling
Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue
and Zika in Brazil and other tropical regions has long been a priority for
governments in affected areas. Streaming social media content, such as Twitter,
is increasingly being used for health vigilance applications such as flu
detection. However, previous work has not addressed the complexity of drastic
seasonal changes in Twitter content across multiple epidemic outbreaks. In
order to address this gap, this paper contrasts two complementary approaches to
detecting Twitter content that is relevant for Dengue outbreak detection,
namely supervised classification and unsupervised clustering using topic
modelling. Each approach has benefits and shortcomings. Our classifier achieves
a prediction accuracy of about 80% based on a small training set of about
1,000 instances, but the need for manual annotation makes it hard to track
seasonal changes in the nature of the epidemics, such as the emergence of new
types of virus in certain geographical locations. In contrast, LDA-based topic
modelling scales well, generating cohesive and well-separated clusters from
larger samples. Although clusters can be easily re-generated following changes in
epidemics, this approach makes it hard to clearly segregate relevant
tweets into well-defined clusters.
Comment: Procs. SoWeMine - co-located with ICWE 2016. 2016, Lugano,
Switzerland
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
We combine multi-task learning and semi-supervised learning by inducing a
joint embedding space between disparate label spaces and learning transfer
functions between label embeddings, enabling us to jointly leverage unlabelled
data and auxiliary, annotated datasets. We evaluate our approach on a variety
of sequence classification tasks with disparate label spaces. We outperform
strong single and multi-task baselines and achieve a new state-of-the-art for
topic-based sentiment analysis.
Comment: To appear at NAACL 2018 (long)
Sarcasm Detection in English and Arabic Tweets Using Transformer Models
This thesis describes our approach toward the detection of sarcasm and its various types in English and Arabic Tweets through methods in deep learning. There are five problems we attempted: (1) detection of sarcasm in English Tweets, (2) detection of sarcasm in Arabic Tweets, (3) determining the type of sarcastic speech subcategory for English Tweets, (4) determining which of two semantically equivalent English Tweets is sarcastic, and (5) determining which of two semantically equivalent Arabic Tweets is sarcastic. All tasks were framed as classification problems, and our contributions are threefold: (a) we developed an English binary classifier system with RoBERTa, (b) an Arabic binary classifier with XLM-RoBERTa, and (c) an English multilabel classifier with BERT. The labeled input data are pre-processed prior to tokenization, for example by extracting verbs/adjectives or representative/significant keywords and appending them to the end of an input tweet to help the models better understand and generalize sarcasm detection. We also discuss the results of simple data augmentation techniques to improve the quality of the given training dataset as well as an alternative approach to the question of multilabel sequence classification. Ultimately, our systems place us in the top 14 participants for each of the five tasks in a sarcasm detection competition.
Text Analysis of Airline Tweets
By acting as a succinct summary, keywords and key phrases can be a useful tool for swiftly assessing enormous amounts of textual material. A keyword is defined as a word that briefly and accurately characterises the subject, or an aspect of the subject, presented in a text, according to the International Encyclopaedia of Information and Library Science (Bolger et al., 1989) (Feather et al., 1996). People are more likely to complain when they are anxious, according to research (Bolger et al., 1989) (Meier et al., 2013), and moods are affected by time (Ryan et al., 2010). This study gives airlines a tool to calibrate and judge the positivity/negativity of tweets based on the day of the week, a topic that has yet to be researched. We aim to perform text and sentiment analysis on extracted airline travel tweets, taking into account when the tweet was posted and whether it had a positive or negative impact.
Building a Sentiment Corpus of Tweets in Brazilian Portuguese
The large amount of data available in social media, forums and websites
motivates research in several areas of Natural Language Processing, such as
sentiment analysis. The popularity of the area due to its subjective and
semantic characteristics motivates research on novel methods and approaches for
classification. Hence, there is a high demand for datasets on different domains
and different languages. This paper introduces TweetSentBR, a manually
annotated sentiment corpus for Brazilian Portuguese with 15,000 sentences in the TV show
domain. The sentences were labeled in three classes (positive, neutral and
negative) by seven annotators, following literature guidelines for ensuring
reliability of the annotation. We also ran baseline experiments on polarity
classification using three machine learning methods, reaching 80.99%
F-Measure and 82.06% accuracy in binary classification, and 59.85% F-Measure
and 64.62% accuracy in three-point classification.
Comment: Accepted for publication in 11th International Conference on Language
Resources and Evaluation (LREC 2018)
Sarcasm Detection in a Disaster Context
During natural disasters, people often use social media platforms such as
Twitter to ask for help, to provide information about the disaster situation,
or to express contempt about the unfolding event or public policies and
guidelines. This contempt is in some cases expressed as sarcasm or irony.
Understanding this form of speech in a disaster-centric context is essential to
improving natural language understanding of disaster-related tweets. In this
paper, we introduce HurricaneSARC, a dataset of 15,000 tweets annotated for
intended sarcasm, and provide a comprehensive investigation of sarcasm
detection using pre-trained language models. Our best model is able to obtain
as much as 0.70 F1 on our dataset. We also demonstrate that the performance on
HurricaneSARC can be improved by leveraging intermediate task transfer
learning. We release our data and code at
https://github.com/tsosea2/HurricaneSarc