Tweet Classification for Crisis Response
Tweet classification for crisis response is a text classification task that aims at identifying
whether a tweet is related to a specific crisis event or not. Humanitarian
organisations that intend to respond to people in need in the early hours of a crisis
struggle to monitor the massive number of tweets posted in real time. Therefore,
the main objective of tweet classification models for crisis response is to filter the
crisis-related tweets to simplify the work for these organisations. Still, crisis events
have different characteristics, which prevents models trained on past events
from generalising to tweets from new disasters, and these tweets cannot feasibly be
labelled manually at the crisis onset. This thesis introduces frameworks under the
umbrella of distant supervision and domain adaptation to minimize the gap or maximize
the similarities between training and testing data from disaster events. The
contributions demonstrate the effectiveness of using automatically labelled training
data from past or emerging events in tweet classification tasks for English and Arabic
crisis tweets. To this end, we propose an automatic labelling framework that
utilises distant supervision via an external knowledge base. Then, we introduce an
approach that unifies our framework with adaptation techniques to automatically
label incoming tweets from an emerging incident. This approach can be seen
as a robust method to classify unseen English tweets from current events. However,
it has limitations when applied to tweets in other languages, especially
languages such as Arabic that have limited resources, different text structures, and
different user behavior in posting tweets. Hence, we adapt our framework
with significant changes to suit Arabic user-generated posts. Our results for
both English and Arabic tweets show that our original and adaptive approaches
consistently improve the classifier’s performance compared with existing labelling
techniques across different adaptation methods.
Automatic Labeling of Tweets for Crisis Response Using Distant Supervision
Current tweet classification models aimed at enhancing crisis response are based on supervised deep learning and rely on the quality and quantity of human-labeled training data. However, the available training data is small and imbalanced in its coverage of crisis types, which prevents the models from generalizing, and because it is manually labeled, it is also expensive to produce. To overcome these problems, distant supervision can be applied to automatically generate large-scale labeled data for tweet classification for crisis response. Experimental results on different crisis events show that our work can produce good-quality labeled data from past and recent events. Substituting automatically labeled training data for part of the manually labeled training data has a minimal impact on model performance, indicating that automatically labeled data can be used when no hand-labeled data is available.
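To make the idea of distant supervision concrete, the following is a minimal sketch and not the labelling framework described in the abstract. It assumes a tiny hand-written term list standing in for an external knowledge base and a simple term-overlap rule; the names `KNOWLEDGE_BASE`, `tokenize`, `distant_label` and the `min_matches` threshold are all hypothetical.

```python
import re

# Hypothetical stand-in for an external knowledge base:
# a few crisis terms per event type (assumption, not from the source).
KNOWLEDGE_BASE = {
    "flood": {"flood", "flooding", "evacuate", "rescue"},
    "explosion": {"explosion", "blast", "casualties"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def distant_label(tweet, event_type, min_matches=2):
    """Return 1 (crisis-related) if the tweet shares at least
    `min_matches` terms with the knowledge base entry, else 0."""
    terms = KNOWLEDGE_BASE[event_type]
    return 1 if len(tokenize(tweet) & terms) >= min_matches else 0


tweets = [
    "Flooding downtown, rescue teams are on the way",
    "Great coffee this morning",
]
# Automatically generated labels, no human annotation involved.
labels = [distant_label(t, "flood") for t in tweets]
```

Labels produced this way are noisy, which is why a threshold such as `min_matches` is a natural knob in a sketch like this: raising it trades coverage for precision.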
Deep Learning and Word Embeddings for Tweet Classification for Crisis Response
Traditional tweet classification models for crisis response focus on convolutional layers and domain-specific word embeddings. In this paper, we study the application of different neural networks with general-purpose and domain-specific word embeddings to investigate their ability to improve the performance of tweet classification models. We evaluate four tweet classification models on the CrisisNLP dataset and obtain comparable results, which indicates that general-purpose word embeddings such as GloVe can be used instead of domain-specific word embeddings, especially with a Bi-LSTM, which achieved the highest performance with an F1 score of 62.04%.
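For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels; the abstract does not state which averaging scheme (binary, macro or weighted) was used, so this is only an illustration of the basic definition, and the example labels are invented.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```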
Domain Adaptation for Arabic Crisis Response
Deep learning algorithms can identify related tweets to reduce the information overload that prevents humanitarian organisations from using valuable Twitter posts. However, they rely heavily on human-labelled data, which are unavailable for emerging crises. Because each crisis has its own features, such as location, time and social media response, current models are known to struggle to generalise to unseen disaster events when pre-trained on past ones. Tweet classifiers for low-resource languages like Arabic have the additional issue of limited labelled data duplicates caused by the absence of good language resources. Thus, we propose a novel domain adaptation approach that employs distant supervision to automatically label tweets from emerging Arabic crisis events, which are then used together with available human-labelled data to train a model. We evaluate our work on data from seven 2018–2020 Arabic events of different crisis types (flood, explosion, virus and storm). Results show that our method outperforms self-training in identifying crisis-related tweets in real-time scenarios and can be seen as a robust Arabic tweet classifier.