5 research outputs found

    Tweet Classification for Crisis Response

    Tweet classification for crisis response is a text classification task that aims to identify whether a tweet is related to a specific crisis event. Humanitarian organisations that intend to respond to people in need in the early hours of a crisis struggle to monitor the massive number of tweets posted in real time. The main objective of tweet classification models for crisis response is therefore to filter crisis-related tweets and simplify the work of these organisations. Still, crisis events have different characteristics, which prevents current models trained on past events from generalising to tweets from new disasters, and manually labelling those tweets at the crisis onset is infeasible. This thesis introduces frameworks under the umbrella of distant supervision and domain adaptation to minimise the gap, or maximise the similarities, between training and testing data from disaster events. The contributions demonstrate the effectiveness of using automatically labelled training data from past or emerging events in tweet classification tasks for English and Arabic crisis tweets. To this end, we propose an automatic labelling framework that utilises distant supervision via an external knowledge base. We then introduce an approach that unifies our framework with adaptation techniques and automatically labels incoming tweets from an emerging incident. This approach can be seen as a robust method for classifying unseen English tweets from current events. However, it has restrictions when applied to tweets in other languages, especially languages such as Arabic that have limited resources, different text structures, and different posting behaviour. Hence, we adapt our framework with significant changes to suit Arabic user-generated posts. Our results for both English and Arabic tweets show that our original and adaptive approaches consistently improve the classifier's performance compared with existing labelling techniques across different adaptation methods.

    Automatic Labeling of Tweets for Crisis Response Using Distant Supervision

    Current tweet classification models aimed at enhancing crisis response are based on supervised deep learning and rely on the quality and quantity of human-labeled training data. Still, the available training data is small and imbalanced in its coverage of crisis types, which prevents the models from generalizing, and because it is manually labeled, it is also expensive to produce. To overcome these problems, distant supervision can be applied to automatically generate large-scale labeled data for tweet classification for crisis response. Experimental results on different crisis events show that our work can produce good-quality labeled data from past and recent events. Substituting automatically labeled training data for part of the manually labeled training data has a minimal impact on model performance, indicating that automatically labeled data can be used when no hand-labeled data is available.
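
    As a rough illustration of the general idea of distant supervision (not the authors' framework), the sketch below labels tweets by matching them against a small term list. The `CRISIS_TERMS` dictionary is a hypothetical stand-in for an external knowledge base, and `distant_label` and its `min_hits` threshold are assumptions made for this example only.

```python
import re

# Hypothetical stand-in for an external knowledge base:
# crisis type -> terms assumed to indicate that crisis type.
CRISIS_TERMS = {
    "flood": {"flood", "flooding", "flooded", "evacuate"},
    "explosion": {"explosion", "blast", "explosive"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def distant_label(tweet, crisis_type, min_hits=1):
    """Return 1 (crisis-related) if the tweet shares at least `min_hits`
    terms with the knowledge-base entry for `crisis_type`, else 0."""
    hits = tokenize(tweet) & CRISIS_TERMS[crisis_type]
    return 1 if len(hits) >= min_hits else 0


# Unlabeled tweets become (text, label) pairs without manual annotation.
tweets = [
    "Streets are flooded, please evacuate now",
    "Great coffee this morning",
]
labelled = [(t, distant_label(t, "flood")) for t in tweets]
```

    Labels produced this way are noisy by construction, which is why such data is usually evaluated by training a classifier on it and comparing against a model trained on hand-labeled data.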

    Deep Learning and Word Embeddings for Tweet Classification for Crisis Response

    Traditional tweet classification models for crisis response focus on convolutional layers and domain-specific word embeddings. In this paper, we study the application of different neural networks with general-purpose and domain-specific word embeddings to investigate their ability to improve the performance of tweet classification models. We evaluate four tweet classification models on the CrisisNLP dataset and obtain comparable results, which indicates that general-purpose word embeddings such as GloVe can be used instead of domain-specific word embeddings, especially with Bi-LSTM, which achieved the highest F1 score of 62.04%.

    Domain Adaptation for Arabic Crisis Response

    Deep learning algorithms can identify related tweets to reduce the information overload that prevents humanitarian organisations from using valuable Twitter posts. However, they rely heavily on human-labelled data, which are unavailable for emerging crises. Because each crisis has its own features, such as location, time and social media response, current models pre-trained on past events are known to struggle to generalise to unseen disaster events. Tweet classifiers for low-resource languages like Arabic have the additional issue of limited labelled data duplicates caused by the absence of good language resources. Thus, we propose a novel domain adaptation approach that employs distant supervision to automatically label tweets from emerging Arabic crisis events, which are then used to train a model along with the available human-labelled data. We evaluate our work on data from seven 2018–2020 Arabic events of different crisis types (flood, explosion, virus and storm). Results show that our method outperforms self-training in identifying crisis-related tweets in real-time scenarios and can be seen as a robust Arabic tweet classifier.