New topic detection in microblogs and topic model evaluation using topical alignment
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblog posts are short and thus may not carry contextual clues, which makes it challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets for specific domains. However, the advantage of not requiring annotated data comes with a drawback: evaluation is difficult. The problem is aggravated when the data comprises microblogs, which are unstructured and noisy.
We demonstrate the application of three such models to microblogs: Latent Dirichlet Allocation (LDA), the Author-Topic model and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which helps boost the performance of our models.
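Topic coherence, mentioned above as the evaluation criterion, can be computed in several ways; one common variant is UMass coherence, which scores a topic's top words by how often they co-occur in documents. The sketch below is a minimal, stdlib-only illustration on an invented toy corpus (the documents and word sets are hypothetical, not from the thesis):

```python
from itertools import combinations
from math import log

# Toy corpus of tokenized microblog posts (hypothetical data).
docs = [
    {"flood", "rescue", "water", "help"},
    {"flood", "water", "damage"},
    {"rescue", "volunteers", "help"},
    {"election", "vote", "poll"},
    {"vote", "poll", "results"},
]

def umass_coherence(topic_words, docs):
    """UMass coherence: mean of log((D(wi, wj) + 1) / D(wj)) over word
    pairs, where D counts documents containing the given word(s)."""
    def doc_freq(*words):
        return sum(1 for d in docs if all(w in d for w in words))
    pairs = list(combinations(topic_words, 2))
    score = 0.0
    for wi, wj in pairs:
        score += log((doc_freq(wi, wj) + 1) / doc_freq(wj))
    return score / len(pairs)

coherent = umass_coherence(["flood", "water", "rescue"], docs)
mixed = umass_coherence(["flood", "vote", "rescue"], docs)
print(coherent > mixed)  # a topically consistent word set scores higher
```

A word set whose members co-occur in the same posts scores higher than one mixing unrelated topics, which is the intuition behind calling one model's topics "more coherent" than another's.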
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study comparing topic models reveals interesting traits of Twitter messages, users and their interactions, and establishes that jointly modeling author-recipient pairs and the content of tweets leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction and topic discovery in microblogs is an extensive research area. We propose using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure the correspondence between the set of topics from the current week and the set from the previous week to quantify four types of misalignments: junk, fused, missing and repeated. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called junk topics are more likely to be new topics, and missing topics are likely to have died out.
To gain more insight into the nature of microblogs, we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study reveals that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
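The week-over-week alignment idea above can be sketched with a simple similarity measure between topic-word distributions: a current-week topic with no sufficiently similar counterpart in the previous week is a candidate new topic, and a previous-week topic with no counterpart now is a candidate dead one. The topic names, distributions and threshold below are all invented for illustration; the thesis's actual correspondence measure may differ:

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = lambda v: sqrt(sum(x * x for x in v.values()))
    return dot / (norm(p) * norm(q))

# Hypothetical topic-word distributions for two consecutive weeks.
last_week = {
    "T1": {"flood": 0.5, "rescue": 0.3, "water": 0.2},
    "T2": {"match": 0.6, "goal": 0.4},
}
this_week = {
    "T1": {"flood": 0.4, "water": 0.4, "damage": 0.2},
    "T2": {"earthquake": 0.7, "tremor": 0.3},
}

THRESH = 0.3  # alignment threshold (assumed; would be tuned in practice)

def align(current, previous, thresh=THRESH):
    new_topics, missing = [], []
    for name, topic in current.items():
        if max(cosine(topic, p) for p in previous.values()) < thresh:
            new_topics.append(name)   # no past counterpart: candidate new topic
    for name, topic in previous.items():
        if max(cosine(topic, c) for c in current.values()) < thresh:
            missing.append(name)      # no current counterpart: likely died out
    return new_topics, missing

new_topics, missing = align(this_week, last_week)
print(new_topics, missing)  # ['T2'] ['T2']
```

Here this week's earthquake topic has no last-week counterpart (a candidate new topic), while last week's sports topic has disappeared (a missing topic).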
Verifying baselines for crisis event information classification on Twitter
Social media are rich information sources during and in the aftermath of crisis events such as earthquakes and terrorist attacks. Despite myriad challenges, with the right tools, significant insight can be gained which can assist emergency responders and related applications. However, most extant approaches are incomparable, using bespoke definitions, models, datasets and even evaluation metrics. Furthermore, it is rare that code, trained models, or exhaustive parametrisation details are made openly available. Thus, even confirmation of self-reported performance is problematic; authoritatively determining the state of the art (SOTA) is essentially impossible. Consequently, to begin addressing such endemic ambiguity, this paper seeks to make 3 contributions: 1) the replication and results confirmation of a leading (and generalisable) technique; 2) testing straightforward modifications of the technique likely to improve performance; and 3) the extension of the technique to a novel and complementary type of crisis-relevant information to demonstrate its generalisability.
NARMADA: Need and Available Resource Managing Assistant for Disasters and Adversities
Although a lot of research has been done on utilising Online Social Media
during disasters, there exists no system for a specific task that is critical
in a post-disaster scenario -- identifying resource-needs and
resource-availabilities in the disaster-affected region, coupled with their
subsequent matching. To this end, we present NARMADA, a semi-automated platform
which leverages the crowd-sourced information from social media posts for
assisting post-disaster relief coordination efforts. The system employs Natural
Language Processing and Information Retrieval techniques for identifying
resource-needs and resource-availabilities from microblogs, extracting
resources from the posts, and also matching the needs to suitable
availabilities. The system is thus capable of facilitating the judicious
management of resources during post-disaster relief operations.
Comment: ACL 2020 Workshop on Natural Language Processing for Social Media (SocialNLP)
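The matching step NARMADA performs, pairing resource-needs with resource-availabilities, can be illustrated with a simple token-overlap (Jaccard) matcher. The posts below are invented examples, and the real system uses richer NLP and Information Retrieval techniques than plain token overlap:

```python
def tokens(text):
    """Lowercase word set of a post (naive tokenization for illustration)."""
    return {w.strip(".,!").lower() for w in text.split()}

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)

# Hypothetical extracted posts.
needs = [
    "Need drinking water and food packets in Guwahati",
    "Urgent need of blood donors at city hospital",
]
avails = [
    "500 food packets and water bottles available for Guwahati",
    "Medical team with blood supplies heading to the hospital",
]

def best_match(need, avails):
    """Return the availability post with the highest token overlap."""
    return max(avails, key=lambda a: jaccard(tokens(need), tokens(a)))

for n in needs:
    print(n, "->", best_match(n, avails))
```

In practice stopword removal, resource extraction and semantic matching would replace raw token overlap, but the pairing structure of the task is the same.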
Classifying Crises-Information Relevancy with Semantics
Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data typically shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information mostly focused on statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, e.g. processing information about a train crash with a classifier trained on floods, earthquakes, and typhoons. In such cases, the model needs to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances classifier performance by up to 7.2% when classifying information about a new type of crisis.
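The intuition behind adding semantic features can be sketched as follows: surface words from different crisis types rarely overlap, but mapping them to abstract semantic classes from a knowledge base gives a model something transferable. The term-to-type mapping below is a hypothetical stand-in for real knowledge-base lookups, and the feature scheme is illustrative, not the paper's exact one:

```python
# Hypothetical mapping from surface terms to semantic classes, standing
# in for lookups against an external knowledge base.
SEMANTIC_TYPES = {
    "flood": ["NaturalDisaster", "CrisisEvent"],
    "earthquake": ["NaturalDisaster", "CrisisEvent"],
    "crash": ["Accident", "CrisisEvent"],
}

def features(tweet):
    """Statistical (bag-of-words) features plus semantic-type features."""
    words = [w.lower().strip(".,") for w in tweet.split()]
    feats = {f"word={w}" for w in words}
    for w in words:
        feats |= {f"type={t}" for t in SEMANTIC_TYPES.get(w, [])}
    return feats

# Two tweets about different crisis types share no surface words...
shared = features("Flood warning issued for coastal districts") & \
         features("Train crash reported near central station")
print(shared)  # {'type=CrisisEvent'}
```

A classifier trained on flood tweets never sees the word "crash", but the shared `type=CrisisEvent` feature lets it carry some signal over to the unseen crisis type, which is the cross-type generalisation the paper measures.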
Classification of Short Texts Generated During Disasters: Traditional and Deep Learning Approaches
Micro-blogging sites provide a wealth of resources during disaster events in the form of short texts. Correct classification of those short texts into various actionable classes can be of great help in shaping the means to rescue people in disaster-affected places. Classifying short texts is a challenging problem because the texts are usually short and very noisy, and finding good features that can distinguish them into different classes is time-consuming, tedious and often requires a lot of domain knowledge. In this thesis, we explore various non-deep-learning and deep-learning methods and propose a deep-learning-based model to classify tweets into different actionable classes such as resource needs and availabilities, activities of various NGOs, etc. The proposed model requires no domain knowledge and can be used in any disaster scenario with little to no modification.
Keywords: Text Classification, Topic Modelling, LDA, Word Embeddings, LSTM, Deep Learning
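As a minimal illustration of the traditional (non-deep-learning) side of this comparison, here is a multinomial naive Bayes classifier over toy tweets. The classes, tweets and counts are invented for the example; the thesis's actual feature engineering and deep model are far richer:

```python
from collections import Counter, defaultdict
from math import log

# Toy labelled tweets (hypothetical); real work uses disaster datasets.
train = [
    ("need water and food in shelter", "need"),
    ("urgent need of medicines", "need"),
    ("food packets available at camp", "availability"),
    ("volunteers available with water supplies", "availability"),
]

def fit(data):
    """Collect per-class word counts, class counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Multinomial naive Bayes with add-one smoothing; unknown words skipped."""
    best, best_score = None, float("-inf")
    total_docs = sum(class_counts.values())
    for label in class_counts:
        score = log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            if w in vocab:
                score += log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = fit(train)
print(predict("need food and water", *model))           # prints: need
print(predict("medicines available at hospital", *model))  # prints: availability
```

This kind of bag-of-words baseline is exactly what hand-crafted features struggle to extend across disasters, which motivates the thesis's move to learned representations.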
OntoDSumm: Ontology-based Tweet Summarization for Disaster Events
The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. A summary of these tweets is required by government organizations, agencies, and volunteers for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually get a precise overview of ongoing events. To handle this challenge, several tweet summarization approaches have been proposed. In most of the existing literature, tweet summarization is broken into a two-step process: the first step categorizes tweets, and the second step chooses representative tweets from each category. Both supervised and unsupervised approaches to the first step are found in the literature. Supervised approaches require a huge amount of labelled data, which incurs both cost and time. Unsupervised approaches, on the other hand, cannot cluster tweets properly due to overlapping keywords, vocabulary size, lack of understanding of semantic meaning, etc. For the second step, existing approaches apply ranking methods that are very generic and fail to compute the proper importance of a tweet with respect to a disaster. Both problems can be handled far better with proper domain knowledge. In this paper, we exploit existing domain knowledge, in the form of an ontology, in both steps, and propose a novel disaster summarization method, OntoDSumm. We evaluate the proposed method against 4 state-of-the-art methods on 10 disaster datasets. Evaluation results reveal that OntoDSumm outperforms existing methods by approximately 2-66% in terms of ROUGE-1 F1 score.
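The two-step, ontology-guided structure described above can be sketched with a toy ontology slice: step 1 assigns each tweet to the category whose ontology terms it overlaps most, and step 2 ranks tweets within each category by ontology-term coverage. The categories, terms and tweets are all invented, and OntoDSumm's actual categorization and ranking are more sophisticated than bare term overlap:

```python
# A hypothetical slice of a disaster ontology: category -> related terms.
ONTOLOGY = {
    "casualties": {"dead", "killed", "injured", "casualties"},
    "infrastructure": {"bridge", "road", "building", "collapsed"},
    "relief": {"food", "water", "shelter", "rescue"},
}

tweets = [
    "Two people injured as building collapsed in the city",
    "Rescue teams distributing food and water",
    "Main bridge on the highway collapsed",
    "Death toll rises, 12 killed in the quake",
]

def categorize(tweet):
    """Step 1: assign the ontology category with most term overlap."""
    words = set(tweet.lower().split())
    return max(ONTOLOGY, key=lambda c: len(ONTOLOGY[c] & words))

def summarize(tweets, k=1):
    """Step 2: pick the tweet covering most ontology terms per category."""
    by_cat = {}
    for t in tweets:
        by_cat.setdefault(categorize(t), []).append(t)
    summary = []
    for cat, group in by_cat.items():
        group.sort(key=lambda t: len(ONTOLOGY[cat] & set(t.lower().split())),
                   reverse=True)
        summary.extend(group[:k])
    return summary

summary = summarize(tweets)
print(summary)
```

Injecting the ontology into both steps is the paper's key move: the same domain knowledge that groups tweets into disaster-relevant categories also supplies a ranking signal that generic methods lack.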