
2 research outputs found

Estimating Distributed Representation Performance in Disaster-Related Social Media Classification

Authors: Alam F.; Imran M.; O'Keefe S. E. M.; Olteanu A.; Parilla-Ferrer B. E.; Rudra K.; Vieweg S.
Publication venue: Technological University Dublin
Publication date: 01/09/2019

This paper examines the effectiveness of a range of pre-trained language representations in determining the informativeness and information type of social media content in the event of natural or man-made disasters. Within the context of disaster tweet analysis, we aim to analyse tweets accurately while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models built from pre-trained Word2Vec, GloVe, ELMo and BERT embeddings are used for performance evaluation. Given the relative ubiquity of BERT as a standout language representation in recent times, BERT was expected to dominate the results. However, the results are more diverse, with the classical Word2Vec and GloVe both displaying strong results. As part of the analysis, we discuss some challenges related to automated Twitter analysis, including the fine-tuning of language models to disaster-related scenarios.

Crossref

Arrow@TUDublin

Is a Pretrained Model the Answer to Situational Awareness Detection on Social Media?

Authors: Lee Kahhe; Lo Siaw Ling; Zhang Yuhao
Publication date: 01/01/2023

Social media can be valuable for extracting information about an event or incident on the ground. However, the vast amount of content shared and the linguistic variants of languages used on social media make it challenging to identify important situational awareness content to aid first responders in decision-making. In this study, we assess whether pretrained models can be used to address these challenges. Various pretrained models, including static word embeddings (such as Word2Vec and GloVe) and contextualized word embeddings (such as DistilBERT), are studied in detail. According to our findings, a vanilla DistilBERT pretrained language model is insufficient to identify situational awareness information. Fine-tuning by using datasets of various event types and vocabulary extension is essential to adapt a DistilBERT model for real-world situational awareness detection.

Institutional Knowledge at Singapore Management University

ScholarSpace at University of Hawai'i at Manoa