OntoDSumm : Ontology based Tweet Summarization for Disaster Events
The huge popularity of social media platforms like Twitter attracts a large
fraction of users to share real-time information and short situational messages
during disasters. A summary of these tweets is required by government
organizations, agencies, and volunteers for an efficient and quick disaster
response. However, the huge influx of tweets makes it difficult to manually
get a precise overview of ongoing events. To handle this challenge, several
tweet summarization approaches have been proposed. Most of the existing
literature breaks tweet summarization into a two-step process: the first
step categorizes tweets, and the second step chooses representative tweets
from each category. Both supervised and unsupervised approaches have been
proposed for the first step. Supervised approaches require a huge amount of
labelled data, which incurs both cost and time. On the other hand,
unsupervised approaches cannot cluster tweets properly due to overlapping
keywords, vocabulary size, lack of understanding of semantic meaning, etc.
For the second step, existing approaches apply generic ranking methods that
fail to compute the proper importance of a tweet with respect to a
disaster. Both problems can be handled far better with proper domain
knowledge. In this paper, we exploit existing domain knowledge, in the form
of an ontology, in both steps and propose a novel disaster summarization
method, OntoDSumm. We evaluate the proposed method against 4
state-of-the-art methods using 10 disaster datasets. Evaluation results
reveal that OntoDSumm outperforms existing methods by approximately 2-66%
in terms of ROUGE-1 F1 score.
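As a toy illustration of the two-step scheme described above, the sketch below categorizes tweets with hand-picked seed keywords (standing in for the paper's ontology classes, which are not given here) and then keeps the highest-scoring tweet per category. All category names, keywords, and the frequency-based importance score are illustrative assumptions, not OntoDSumm's actual method.

```python
from collections import Counter

# Hypothetical disaster categories with seed keywords; these stand in
# for the ontology classes used by the paper (illustrative only).
CATEGORIES = {
    "casualties": {"dead", "injured", "killed"},
    "infrastructure": {"bridge", "road", "collapsed"},
    "relief": {"food", "water", "shelter"},
}

def categorize(tweet):
    """Step 1: assign a tweet to the category with the most keyword hits."""
    words = set(tweet.lower().split())
    scores = {c: len(words & kw) for c, kw in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def summarize(tweets, per_category=1):
    """Step 2: from each category, keep the tweet(s) whose words are most
    frequent across the whole collection (a crude importance proxy)."""
    freq = Counter(w for t in tweets for w in t.lower().split())
    buckets = {}
    for t in tweets:
        c = categorize(t)
        if c is not None:
            buckets.setdefault(c, []).append(t)
    summary = []
    for members in buckets.values():
        members.sort(key=lambda t: sum(freq[w] for w in set(t.lower().split())),
                     reverse=True)
        summary.extend(members[:per_category])
    return summary

tweets = [
    "bridge collapsed on main road",
    "two people injured near the bridge",
    "need food and water at the shelter",
]
summary = summarize(tweets)
```

With three tweets hitting three different categories, each category contributes one tweet to the summary.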
Interpretable classification and summarization of crisis events from microblogs
The widespread use of social media platforms has created convenient ways to obtain and spread up-to-date information during crisis events such as disasters. Time-critical analysis of crisis-related information helps humanitarian organizations and governmental bodies gain actionable information and plan for aid response. However, situational information is often immersed in a high volume of irrelevant content. Moreover, crisis-related messages also vary greatly in terms of information types, ranging from general situational awareness - such as information about warnings, infrastructure damages, and casualties - to individual needs. Different humanitarian organizations or governmental bodies usually demand information of different types for various tasks such as crisis preparation, resource planning, and aid response. To cope with information overload and efficiently support stakeholders in crisis situations, it is necessary to (a) classify data posted during crisis events into fine-grained humanitarian categories, (b) summarize the situational data in near real-time.
In this thesis, we tackle the aforementioned problems and propose novel methods for the classification and summarization of user-generated posts from microblogs. Previous studies have introduced various machine learning techniques to assist humanitarian or governmental bodies, but they primarily focused on model performance. Unlike those works, we develop interpretable machine-learning models which can provide explanations of model decisions. Generally, we focus on three methods for reducing information overload in crisis situations: (i) post classification, (ii) post summarization, (iii) interpretable models for post classification and summarization. We evaluate our methods using posts from the microblogging platform Twitter, so-called tweets. First, we expand publicly available labeled datasets with rationale annotations. Each tweet is annotated with a class label and rationales, which are short snippets from the tweet to explain its assigned label. Using the data, we develop trustworthy classification methods that give the best tradeoff between model performance and interpretability. Rationale snippets usually convey essential information in the tweets. Hence, we propose an integer linear programming-based summarization method that maximizes the coverage of rationale phrases to generate summaries of class-level tweet data. Next, we introduce an approach that can enhance latent embedding representations of tweets in vector space. Our approach helps improve the classification performance-interpretability tradeoff and detect near duplicates for designing a summarization model with low computational complexity. Experiments show that rationale labels are helpful for developing interpretable-by-design models. However, annotations are not always available, especially in real-time situations for new tasks and crisis events. In the last part of the thesis, we propose a two-stage approach to extract the rationales under minimal human supervision.
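The rationale-coverage summarization idea can be sketched with a greedy maximum-coverage loop. Note the thesis formulates this as an integer linear program; the greedy procedure below is only a standard approximation of maximum coverage, and the `rationales` mapping is an assumed input (tweet to its annotated rationale phrases).

```python
def greedy_rationale_summary(tweets, rationales, k):
    """Greedy stand-in for an ILP coverage objective: pick up to k tweets
    that together cover as many rationale phrases as possible.
    `rationales` maps each tweet to its set of rationale phrases."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max((t for t in tweets if t not in chosen),
                   key=lambda t: len(rationales[t] - covered),
                   default=None)
        # Stop when no remaining tweet adds any uncovered rationale.
        if best is None or not (rationales[best] - covered):
            break
        chosen.append(best)
        covered |= rationales[best]
    return chosen, covered

rationales = {"t1": {"a", "b"}, "t2": {"b"}, "t3": {"c"}}
chosen, covered = greedy_rationale_summary(["t1", "t2", "t3"], rationales, k=2)
```

Here the greedy pass picks "t1" first (two new phrases) and then "t3", since "t2" adds nothing once "b" is covered.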
PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT
Disaster summarization approaches provide an overview of the important
information posted during disaster events on social media platforms, such as,
Twitter. However, the type of information posted significantly varies across
disasters depending on several factors like the location, type, severity, etc.
Verification of the effectiveness of disaster summarization approaches
still suffers from the lack of a good spectrum of datasets along with
ground-truth summaries. Existing approaches for ground-truth summary
generation (ground-truth for extractive summarization) rely on the wisdom
and intuition of the annotators: annotators are provided with a complete
set of input tweets, from which they select a subset of tweets for the
summary. This process requires immense human effort and significant time.
Additionally, this intuition-based selection of the tweets might lead to a high
variance in summaries generated across annotators. Therefore, to handle these
challenges, we propose a hybrid (semi-automated) approach (PORTRAIT) where we
partly automate the ground-truth summary generation procedure. This approach
reduces the effort and time of the annotators while ensuring the quality of the
created ground-truth summary. We validate the effectiveness of PORTRAIT on 5
disaster events through quantitative and qualitative comparisons of
ground-truth summaries generated by existing intuitive approaches, a
semi-automated approach, and PORTRAIT. We prepare and release the ground-truth
summaries for 5 disaster events which consist of both natural and man-made
disaster events belonging to 4 different countries. Finally, we provide a study
about the performance of various state-of-the-art summarization approaches on
the ground-truth summaries generated by PORTRAIT using ROUGE-N F1-scores.
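Since the study reports ROUGE-N F1-scores, a minimal reference implementation of that metric may be useful. This is a bare-bones version (whitespace tokenization, no stemming or stopword handling), not the exact scorer used in the paper.

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1: harmonic mean of n-gram precision and recall between
    a candidate summary and a ground-truth (reference) summary."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    c, r = ngrams(candidate), ngrams(reference)
    overlap = sum((c & r).values())  # clipped n-gram match count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

For example, a candidate "the cat" against a reference "the cat sat" has unigram precision 1.0 and recall 2/3, giving an F1 of 0.8.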
Can we predict a riot? Disruptive event detection using Twitter
In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases.
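The online-clustering component of such a pipeline can be sketched as a single-pass procedure over textual content: each incoming post joins its nearest cluster if similar enough, otherwise it seeds a new candidate event. The bag-of-words representation and the 0.5 cosine threshold below are illustrative choices, not the paper's configuration (which also uses temporal and spatial features).

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_cluster(posts, threshold=0.5):
    """Single-pass (online) clustering: a post joins the most similar
    existing cluster above the threshold, else it starts a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "posts": [...]}
    for p in posts:
        vec = Counter(p.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["posts"].append(p)
            best["centroid"] += vec  # grow the centroid incrementally
        else:
            clusters.append({"centroid": vec, "posts": [p]})
    return clusters

posts = [
    "fire on oxford street",
    "huge fire on oxford street now",
    "traffic jam on the m25",
]
clusters = online_cluster(posts)
```

The first two posts merge into one candidate event while the unrelated traffic post opens a second cluster.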
NARMADA: Need and Available Resource Managing Assistant for Disasters and Adversities
Although a lot of research has been done on utilising Online Social Media
during disasters, there exists no system for a specific task that is critical
in a post-disaster scenario -- identifying resource-needs and
resource-availabilities in the disaster-affected region, coupled with their
subsequent matching. To this end, we present NARMADA, a semi-automated platform
which leverages the crowd-sourced information from social media posts for
assisting post-disaster relief coordination efforts. The system employs Natural
Language Processing and Information Retrieval techniques for identifying
resource-needs and resource-availabilities from microblogs, extracting
resources from the posts, and also matching the needs to suitable
availabilities. The system is thus capable of facilitating the judicious
management of resources during post-disaster relief operations.
Comment: ACL 2020 Workshop on Natural Language Processing for Social Media
(SocialNLP).
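A toy version of the need-to-availability matching step might look like the sketch below, pairing each need with the availability sharing the most terms. This is purely an illustration of the matching idea; NARMADA's actual NLP and IR pipeline for extracting and matching resources is far richer.

```python
def match_needs(needs, availabilities):
    """Illustrative matcher: pair each resource-need post with the
    resource-availability post that shares the most words with it."""
    matches = {}
    for need in needs:
        nw = set(need.lower().split())
        best = max(availabilities,
                   key=lambda a: len(nw & set(a.lower().split())),
                   default=None)
        # Only record a match if at least one term overlaps.
        if best and nw & set(best.lower().split()):
            matches[need] = best
    return matches

needs = ["need drinking water in sector 5"]
avails = ["we can supply drinking water", "blankets available here"]
matched = match_needs(needs, avails)
```

Here the water need is paired with the water supply offer via the shared terms "drinking" and "water".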
Using social media for sub-event detection during disasters
Social media platforms have become fundamental tools for sharing information during natural disasters or catastrophic events. This paper presents SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters), a new method that analyzes user posts to discover sub-events that occurred after a disaster (e.g., collapsed buildings, broken gas pipes, floods). SEDOM-DD has been evaluated with datasets of different sizes that contain real posts from social media related to different natural disasters (e.g., earthquakes, floods and hurricanes). Starting from such data, we generated synthetic datasets with different features, such as different percentages of relevant posts and/or geotagged posts. Experiments performed on both real and synthetic datasets showed that SEDOM-DD is able to identify sub-events with high accuracy. For example, with a percentage of relevant posts of 80% and geotagged posts of 15%, our method detects the sub-events and their areas with an accuracy of 85%, revealing the high accuracy and effectiveness of the proposed approach.