New topic detection in microblogs and topic model evaluation using topical alignment
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblog posts are short and thus may not carry contextual clues, which makes it challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets for specific domains. However, the advantage of not requiring annotated data comes with a drawback: evaluation is difficult. The problem is aggravated when the data comprises microblogs, which are unstructured and noisy.
We demonstrate the application of three such models to microblogs: Latent Dirichlet Allocation (LDA), the Author-Topic model and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which helps boost the performance of our models.
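Topic coherence, mentioned above as the evaluation criterion, can be computed in several ways; one common variant is UMass coherence, which scores a topic's top words by how often they co-occur in documents. The sketch below is a minimal, stdlib-only illustration on an invented toy corpus (the documents and word sets are hypothetical, not from the thesis):

```python
from itertools import combinations
from math import log

# Toy corpus of tokenized microblog posts (hypothetical data).
docs = [
    {"flood", "rescue", "water", "help"},
    {"flood", "water", "damage"},
    {"rescue", "volunteers", "help"},
    {"election", "vote", "poll"},
    {"vote", "poll", "results"},
]

def umass_coherence(topic_words, docs):
    """UMass coherence: mean of log((D(wi, wj) + 1) / D(wj)) over word
    pairs, where D counts documents containing the given word(s)."""
    def doc_freq(*words):
        return sum(1 for d in docs if all(w in d for w in words))
    pairs = list(combinations(topic_words, 2))
    score = 0.0
    for wi, wj in pairs:
        score += log((doc_freq(wi, wj) + 1) / doc_freq(wj))
    return score / len(pairs)

coherent = umass_coherence(["flood", "water", "rescue"], docs)
mixed = umass_coherence(["flood", "vote", "rescue"], docs)
print(coherent > mixed)  # a topically consistent word set scores higher
```

A word set whose members co-occur in the same posts scores higher than one mixing unrelated topics, which is the intuition behind calling one model's topics "more coherent" than another's.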
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study comparing topic models reveals interesting traits of Twitter messages, users and their interactions, and establishes that jointly modeling author-recipient pairs and the content of tweets leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction and topic discovery in microblogs is an extensive research area. We propose using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure the correspondence between the set of topics from the current week and the set from the previous week to quantify four types of misalignments: junk, fused, missing and repeated. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called junk topics are more likely to be new topics, and missing topics are likely to have died out.
To gain more insight into the nature of microblogs, we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study reveals that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
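The week-over-week alignment idea above can be sketched with a simple similarity measure between topic-word distributions: a current-week topic with no sufficiently similar counterpart in the previous week is a candidate new topic, and a previous-week topic with no counterpart now is a candidate dead one. The topic names, distributions and threshold below are all invented for illustration; the thesis's actual correspondence measure may differ:

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = lambda v: sqrt(sum(x * x for x in v.values()))
    return dot / (norm(p) * norm(q))

# Hypothetical topic-word distributions for two consecutive weeks.
last_week = {
    "T1": {"flood": 0.5, "rescue": 0.3, "water": 0.2},
    "T2": {"match": 0.6, "goal": 0.4},
}
this_week = {
    "T1": {"flood": 0.4, "water": 0.4, "damage": 0.2},
    "T2": {"earthquake": 0.7, "tremor": 0.3},
}

THRESH = 0.3  # alignment threshold (assumed; would be tuned in practice)

def align(current, previous, thresh=THRESH):
    new_topics, missing = [], []
    for name, topic in current.items():
        if max(cosine(topic, p) for p in previous.values()) < thresh:
            new_topics.append(name)   # no past counterpart: candidate new topic
    for name, topic in previous.items():
        if max(cosine(topic, c) for c in current.values()) < thresh:
            missing.append(name)      # no current counterpart: likely died out
    return new_topics, missing

new_topics, missing = align(this_week, last_week)
print(new_topics, missing)  # ['T2'] ['T2']
```

Here this week's earthquake topic has no last-week counterpart (a candidate new topic), while last week's sports topic has disappeared (a missing topic).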
Verifying baselines for crisis event information classification on Twitter
Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are incomparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make 3 contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and complementary type of crisis-relevant information to demonstrate its generalisability.
NARMADA: Need and Available Resource Managing Assistant for Disasters and Adversities
Although a lot of research has been done on utilising Online Social Media
during disasters, there exists no system for a specific task that is critical
in a post-disaster scenario -- identifying resource-needs and
resource-availabilities in the disaster-affected region, coupled with their
subsequent matching. To this end, we present NARMADA, a semi-automated platform
which leverages the crowd-sourced information from social media posts for
assisting post-disaster relief coordination efforts. The system employs Natural
Language Processing and Information Retrieval techniques for identifying
resource-needs and resource-availabilities from microblogs, extracting
resources from the posts, and also matching the needs to suitable
availabilities. The system is thus capable of facilitating the judicious
management of resources during post-disaster relief operations.
Comment: ACL 2020 Workshop on Natural Language Processing for Social Media (SocialNLP)
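The matching step NARMADA performs, pairing resource-needs with resource-availabilities, can be illustrated with a simple token-overlap (Jaccard) matcher. The posts below are invented examples, and the real system uses richer NLP and Information Retrieval techniques than plain token overlap:

```python
def tokens(text):
    """Lowercase word set of a post (naive tokenization for illustration)."""
    return {w.strip(".,!").lower() for w in text.split()}

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)

# Hypothetical extracted posts.
needs = [
    "Need drinking water and food packets in Guwahati",
    "Urgent need of blood donors at city hospital",
]
avails = [
    "500 food packets and water bottles available for Guwahati",
    "Medical team with blood supplies heading to the hospital",
]

def best_match(need, avails):
    """Return the availability post with the highest token overlap."""
    return max(avails, key=lambda a: jaccard(tokens(need), tokens(a)))

for n in needs:
    print(n, "->", best_match(n, avails))
```

In practice stopword removal, resource extraction and semantic matching would replace raw token overlap, but the pairing structure of the task is the same.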
Classifying Crises-Information Relevancy with Semantics
Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data typically shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information mostly focused on statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, e.g. processing information about a train crash with a classifier trained on floods, earthquakes, and typhoons. In such cases, the model needs to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances classifier performance by up to 7.2% when classifying information about a new type of crisis.
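The intuition behind adding semantic features can be sketched as follows: surface words from different crisis types rarely overlap, but mapping them to abstract semantic classes from a knowledge base gives a model something transferable. The term-to-type mapping below is a hypothetical stand-in for real knowledge-base lookups, and the feature scheme is illustrative, not the paper's exact one:

```python
# Hypothetical mapping from surface terms to semantic classes, standing
# in for lookups against an external knowledge base.
SEMANTIC_TYPES = {
    "flood": ["NaturalDisaster", "CrisisEvent"],
    "earthquake": ["NaturalDisaster", "CrisisEvent"],
    "crash": ["Accident", "CrisisEvent"],
}

def features(tweet):
    """Statistical (bag-of-words) features plus semantic-type features."""
    words = [w.lower().strip(".,") for w in tweet.split()]
    feats = {f"word={w}" for w in words}
    for w in words:
        feats |= {f"type={t}" for t in SEMANTIC_TYPES.get(w, [])}
    return feats

# Two tweets about different crisis types share no surface words...
shared = features("Flood warning issued for coastal districts") & \
         features("Train crash reported near central station")
print(shared)  # {'type=CrisisEvent'}
```

A classifier trained on flood tweets never sees the word "crash", but the shared `type=CrisisEvent` feature lets it carry some signal over to the unseen crisis type, which is the cross-type generalisation the paper measures.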
Classification of Short Texts Generated During Disasters: Traditional and Deep Learning Approaches
Micro-blogging sites provide a wealth of resources during disaster events in the form of short texts. Correct classification of those short texts into various actionable classes can be of great help in shaping the means to rescue people in disaster-affected places. Classifying short texts is a challenging problem because the texts are usually short and very noisy, and finding good features that can distinguish them into different classes is time-consuming, tedious and often requires a lot of domain knowledge. In this thesis, we explore various non-deep-learning and deep-learning methods and propose a deep-learning-based model to classify tweets into different actionable classes such as resource needs and availabilities, activities of various NGOs, etc. The proposed model requires no domain knowledge and can be used in any disaster scenario with little to no modification.
Keywords: Text Classification, Topic Modelling, LDA, Word Embeddings, LSTM, Deep Learning
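As a minimal illustration of the traditional (non-deep-learning) side of this comparison, here is a multinomial naive Bayes classifier over toy tweets. The classes, tweets and counts are invented for the example; the thesis's actual feature engineering and deep model are far richer:

```python
from collections import Counter, defaultdict
from math import log

# Toy labelled tweets (hypothetical); real work uses disaster datasets.
train = [
    ("need water and food in shelter", "need"),
    ("urgent need of medicines", "need"),
    ("food packets available at camp", "availability"),
    ("volunteers available with water supplies", "availability"),
]

def fit(data):
    """Collect per-class word counts, class counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Multinomial naive Bayes with add-one smoothing; unknown words skipped."""
    best, best_score = None, float("-inf")
    total_docs = sum(class_counts.values())
    for label in class_counts:
        score = log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:
                score += log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = fit(train)
print(predict("need food and water", *model))           # prints: need
print(predict("medicines available at hospital", *model))  # prints: availability
```

This kind of bag-of-words baseline is exactly what hand-crafted features struggle to extend across disasters, which motivates the thesis's move to learned representations.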
OntoDSumm: Ontology-based Tweet Summarization for Disaster Events
The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. A summary of these tweets is required by government organizations, agencies, and volunteers for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually get a precise overview of ongoing events. To handle this challenge, several tweet summarization approaches have been proposed. In most of the existing literature, tweet summarization is broken into a two-step process: the first step categorizes tweets, and the second step chooses representative tweets from each category. Both supervised and unsupervised approaches to the first step are found in the literature. Supervised approaches require a huge amount of labelled data, which incurs both cost and time. Unsupervised approaches, on the other hand, cannot cluster tweets properly due to overlapping keywords, vocabulary size, lack of understanding of semantic meaning, etc. For the second step, existing approaches apply ranking methods that are very generic and fail to compute the proper importance of a tweet with respect to a disaster. Both problems can be handled far better with proper domain knowledge. In this paper, we exploit existing domain knowledge, in the form of an ontology, in both steps, and propose a novel disaster summarization method, OntoDSumm. We evaluate the proposed method against 4 state-of-the-art methods on 10 disaster datasets. Evaluation results reveal that OntoDSumm outperforms existing methods by approximately 2-66% in terms of ROUGE-1 F1 score.
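The two-step, ontology-guided structure described above can be sketched with a toy ontology slice: step 1 assigns each tweet to the category whose ontology terms it overlaps most, and step 2 ranks tweets within each category by ontology-term coverage. The categories, terms and tweets are all invented, and OntoDSumm's actual categorization and ranking are more sophisticated than bare term overlap:

```python
# A hypothetical slice of a disaster ontology: category -> related terms.
ONTOLOGY = {
    "casualties": {"dead", "killed", "injured", "casualties"},
    "infrastructure": {"bridge", "road", "building", "collapsed"},
    "relief": {"food", "water", "shelter", "rescue"},
}

tweets = [
    "Two people injured as building collapsed in the city",
    "Rescue teams distributing food and water",
    "Main bridge on the highway collapsed",
    "Death toll rises, 12 killed in the quake",
]

def categorize(tweet):
    """Step 1: assign the ontology category with most term overlap."""
    words = set(tweet.lower().split())
    return max(ONTOLOGY, key=lambda c: len(ONTOLOGY[c] & words))

def summarize(tweets, k=1):
    """Step 2: pick the tweet covering most ontology terms per category."""
    by_cat = {}
    for t in tweets:
        by_cat.setdefault(categorize(t), []).append(t)
    summary = []
    for cat, group in by_cat.items():
        group.sort(key=lambda t: len(ONTOLOGY[cat] & set(t.lower().split())),
                   reverse=True)
        summary.extend(group[:k])
    return summary

summary = summarize(tweets)
print(summary)
```

Injecting the ontology into both steps is the paper's key move: the same domain knowledge that groups tweets into disaster-relevant categories also supplies a ranking signal that generic methods lack.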