Detecting Online Hate Speech Using Both Supervised and Weakly-Supervised Approaches
In the wake of a polarizing election, social media is laden with hateful content. The context accompanying a post is useful for identifying hate speech, yet it has been largely overlooked in existing datasets and hate speech detection models. We provide an annotated hate speech corpus that preserves context information. We then propose two types of supervised hate speech detection models that incorporate context information: a logistic regression model with context features and a neural network model with learning components for context. Further, to address limitations of supervised hate speech classification methods, including corpus bias and the high cost of annotation, we propose a weakly supervised two-path bootstrapping approach for online hate speech detection that leverages large-scale unlabeled data. This system significantly outperforms hate speech detection systems trained in a supervised manner on manually annotated data. Applying this model to a large quantity of tweets collected before, after, and on election day reveals motivations and patterns of inflammatory language.
When Politicians Talk About Politics: Identifying Political Tweets of Brazilian Congressmen
Since June 2013, when Brazil faced the largest and most significant mass
protests in a generation, a political crisis has been under way. In the midst of
this crisis, Brazilian politicians use social media to communicate with the
electorate in order to retain or grow their political capital. The problem
is that many controversial topics are in play, and deputies may prefer to
avoid such themes in their messages. To characterize this behavior, we propose
a method to accurately identify political and non-political tweets
independently of the deputy who posted them and of the time they were posted.
Moreover, we collected tweets of all congressmen who were active on Twitter and
worked in the Brazilian parliament from October 2013 to October 2017. To
evaluate our method, we used word clouds and a topic model to identify the main
political and non-political latent topics in parliamentarian tweets. Both
results indicate that our proposal is able to accurately distinguish political
from non-political tweets. Moreover, our analyses revealed a striking fact:
more than half of the messages posted by Brazilian deputies are non-political.
Comment: 4 pages, 7 figures, 2 tables
Towards Weakly-Supervised Hate Speech Classification Across Datasets
As pointed out by several scholars, current research on hate speech (HS)
recognition is characterized by unsystematic data creation strategies and
diverging annotation schemata. Consequently, supervised-learning models tend to
generalize poorly to datasets they were not trained on, and the performance of
the models trained on datasets labeled using different HS taxonomies cannot be
compared. To ease this problem, we propose applying extremely weak supervision
that only relies on the class name rather than on class samples from the
annotated data. We demonstrate the effectiveness of a state-of-the-art
weakly-supervised text classification model in various in-dataset and
cross-dataset settings. Furthermore, we conduct an in-depth quantitative and
qualitative analysis of the source of poor generalizability of HS
classification models.
Comment: Accepted to WOAH 7@ACL 202
Doctor of Philosophy in Computer Science
Over the last decade, social media has emerged as a revolutionary platform for informal communication and social interaction among people. Publicly expressing thoughts, opinions, and feelings is one of the key characteristics of social media. In this dissertation, I present research on automatically acquiring knowledge from social media that can be used to recognize people's affective state (i.e., what someone feels at a given time) in text. This research addresses two types of affective knowledge: 1) hashtag indicators of emotion, consisting of emotion hashtags and emotion hashtag patterns, and 2) affective understanding of similes (a form of figurative comparison). My research introduces a bootstrapped learning algorithm for learning hashtag indicators of emotions from tweets with respect to five emotion categories: Affection, Anger/Rage, Fear/Anxiety, Joy, and Sadness/Disappointment. With a few seed emotion hashtags per emotion category, the bootstrapping algorithm iteratively learns new hashtags and more generalized hashtag patterns by analyzing emotion in tweets that contain these indicators. Emotion phrases are also harvested from the learned indicators to train additional classifiers that use the surrounding word context of the phrases as features. This is the first work to learn hashtag indicators of emotions. My research also presents a supervised classification method for classifying the affective polarity of similes on Twitter. Using lexical, semantic, and sentiment properties of different simile components as features, supervised classifiers are trained to classify a simile into a positive or negative affective polarity class. The property of comparison is also fundamental to the affective understanding of similes.
My research introduces a novel framework for inferring implicit properties that 1) uses syntactic constructions, statistical association, dictionary definitions, and word embedding vector similarity to generate and rank candidate properties, 2) re-ranks the top properties using influence from multiple simile components, and 3) aggregates the ranks of each property from different methods to create a final ranked list of properties. The inferred properties are used to derive additional features for the supervised classifiers to further improve affective polarity recognition. Experimental results show substantial improvements in affective understanding of similes over the use of existing sentiment resources.
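The rank-aggregation step in the abstract above (combining the ranks each method assigns to a candidate property into one final list) can be illustrated with a minimal sketch. The abstract does not say which aggregation rule the dissertation uses, so mean rank position is an assumption here. The function name `aggregate_ranks`, the penalty rank for candidates a method did not return, and the example property lists are all hypothetical.

```python
from collections import defaultdict


def aggregate_ranks(rankings):
    """Merge several ranked candidate lists into one list ordered by mean rank.

    Assumption: mean rank is used as the aggregation rule; the source
    abstract does not specify the actual rule.
    """
    candidates = {c for ranking in rankings for c in ranking}
    totals = defaultdict(float)
    for ranking in rankings:
        # 1-based position of each candidate in this method's list
        position = {c: i + 1 for i, c in enumerate(ranking)}
        # Hypothetical choice: a candidate missing from a list is ranked
        # one place below that list's last entry.
        penalty = len(ranking) + 1
        for c in candidates:
            totals[c] += position.get(c, penalty)
    # Lower mean rank is better; ties are broken alphabetically.
    return sorted(candidates, key=lambda c: (totals[c] / len(rankings), c))


# Made-up candidate properties, as if produced by three ranking methods.
rankings = [
    ["cold", "hard", "smooth"],
    ["cold", "smooth", "clear"],
    ["hard", "cold", "clear"],
]
final = aggregate_ranks(rankings)
print(final)
```

Mean rank is only one option. A median rank or a weighted vote would fit the same interface, and the choice matters most when the methods disagree strongly.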