Multilingual Text Classification from Twitter during Emergencies
Social media such as Twitter are a valuable source of information due to their diffusion among citizens and to their speed in sharing data worldwide. However, it is challenging to automatically extract information from such data, given the huge amount of useless content. We propose a multilingual tool that automatically categorizes tweets according to their information content. To achieve real-time classification while supporting any language, we apply a deep learning classifier using multilingual word embeddings. This allows our solution to be trained on one language and applied to any other language via zero-shot inference, with an acceptable performance loss.
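The zero-shot idea in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the deep learning classifier is replaced here by a nearest-centroid classifier, and the embedding table `EMB`, its two-dimensional vectors, and the label names are hypothetical toy values chosen so that translations sit close together, as aligned multilingual embeddings are meant to do.

```python
from math import sqrt

# Hypothetical toy "multilingual embeddings": English and Italian words that
# translate each other are given nearby vectors (assumption for illustration).
EMB = {
    "flood": (0.90, 0.10), "alluvione": (0.88, 0.12),
    "rescue": (0.80, 0.20), "soccorso": (0.82, 0.18),
    "pizza": (0.10, 0.90),
    "concert": (0.20, 0.80), "concerto": (0.19, 0.81),
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def train_centroids(examples):
    """examples: list of (text, label). Returns label -> mean embedding."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    return {lab: tuple(sum(c) / len(v) for c in zip(*v))
            for lab, v in by_label.items()}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def predict(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    v = embed(text)
    return max(centroids, key=lambda lab: cosine(centroids[lab], v))

# Train on English only, then classify Italian text without Italian labels.
centroids = train_centroids([
    ("flood rescue", "informative"),
    ("pizza concert", "not_informative"),
])
print(predict(centroids, "alluvione soccorso"))
print(predict(centroids, "concerto"))
```

Because the classifier only ever sees vectors, any language covered by the shared embedding space can be classified at inference time; how much accuracy is lost depends on how well the real embeddings are aligned.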
Community Segmentation and Inclusive Social Media Listening
Social media analytics provide a generalized picture of situational awareness from the conversations happening among communities present in social media channels that are, or risk being, affected by crises. The generalized nature of results from these analytics leaves underrepresented communities in the background. When considering social media analytics, concerns, sentiment, and needs are perceived as homogeneous. However, offline, the community is diverse, often segmented by age group, occupation, or language, to name a few. Through our analysis of interviews from professionals using social media as a source of information in public service organizations, we argue that practitioners might not be perceiving this segmentation from the social media conversation. In addition, practitioners who are aware of this limitation agree that there is room for improvement and resort to alternative mechanisms to understand, reach, and provide services to these communities in need. Thus, we analyze current perceptions and activities around segmentation and provide suggestions that could inform the design of social media analytics tools that support inclusive public services for all, including persons with disabilities and from other disadvantaged groups.
Classifying Crises-Information Relevancy with Semantics
Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sieve through the large volumes of data that are typically shared on such platforms during crises to determine which posts are truly relevant to the crisis, and which are not. Previous work on automatically classifying crisis information was mostly focused on using statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, such as processing information about a train crash when the classifier was trained on floods, earthquakes, and typhoons. In such cases, the model will need to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across the same and different types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances the classifier performance by up to 7.2% when classifying information about a new type of crisis.
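As a rough illustration of what combining statistical and semantic features might look like, the sketch below builds one feature dictionary per post. The feature names, the choice of statistical features, and the tiny `KB` lookup table standing in for an external knowledge base are all assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical stand-in for an external knowledge base:
# surface form -> broader concepts (assumption for illustration only).
KB = {
    "flood": ["NaturalDisaster", "Disaster"],
    "earthquake": ["NaturalDisaster", "Disaster"],
    "derailment": ["TransportAccident", "Disaster"],
}

def statistical_features(text):
    """Simple surface counts computed from the raw post."""
    tokens = text.split()
    return {
        "stat:n_tokens": len(tokens),
        "stat:n_hashtags": sum(t.startswith("#") for t in tokens),
        "stat:n_mentions": sum(t.startswith("@") for t in tokens),
    }

def semantic_features(text):
    """Binary indicators for every concept linked to a token in the post."""
    feats = {}
    for tok in text.lower().split():
        for concept in KB.get(tok.strip("#.,!?"), []):
            feats["sem:" + concept] = 1
    return feats

def hybrid_features(text):
    """Merge both feature families into one dictionary."""
    return {**statistical_features(text), **semantic_features(text)}

print(hybrid_features("Train derailment near station #derailment"))
```

In this toy setup a flood post and a derailment post share no vocabulary, yet both receive the `sem:Disaster` feature, which is the kind of overlap that could help a classifier on a crisis type it was not trained on.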
IMEXT: a method and system to extract geolocated images from Tweets - Analysis of a case study
Francalanci, Chiara; Guglielmino, Paolo; Montalcini, Matteo; Scalia, Gabriele; Pernici, Barbara
TriggerCit: Early Flood Alerting using Twitter and Geolocation - A Comparison with Alternative Sources
Rapid impact assessment in the immediate aftermath of a natural disaster is
essential to provide adequate information to international organisations, local
authorities, and first responders. Social media can support emergency response
with evidence-based content posted by citizens and organisations during ongoing
events. In the paper, we propose TriggerCit: an early flood alerting tool with
a multilanguage approach focused on timeliness and geolocation. The paper
focuses on assessing the reliability of the approach as a triggering system,
comparing it with alternative sources for alerts, and evaluating the quality
and amount of complementary information gathered. Geolocated visual evidence
extracted from Twitter by TriggerCit was analysed in two case studies on floods
in Thailand and Nepal in 2021.
Terminology Extraction for and from Communications in Multi-disciplinary Domains
Terminology extraction generally refers to methods and systems for identifying term candidates in a uni-disciplinary and uni-lingual
environment such as engineering, medical, physical and geological sciences, or administration, business and leisure. However, as
human enterprises get more and more complex, it has become increasingly important for teams in one discipline to collaborate with
others who not only come from a non-cognate discipline but also speak a different language. Disaster mitigation and recovery, and conflict
resolution are amongst the areas where there is a requirement to use standardised multilingual terminology for communication. This
paper presents a feasibility study conducted to build terminology (and ontology) in the domain of disaster management and is part of the
broader work conducted for the EU project Slándáil (FP7 607691). We have evaluated CiCui (from the Chinese name 词萃, which translates to
words gathered), a corpus-based text analytic system that combines frequency, collocation and linguistic analyses to extract candidate
terminologies from corpora comprised of domain texts from diverse sources. CiCui was assessed against four terminology extraction
systems and the initial results show that it has an above average precision in extracting terms.
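To make the combination of frequency and collocation analysis concrete, here is a minimal sketch that ranks two-word term candidates by pointwise mutual information (PMI) after a frequency cut-off. This is a generic technique used for illustration, not a description of how CiCui works; the function name, the `min_count` threshold, and the sample sentence are assumptions, and the linguistic analysis step is omitted.

```python
from collections import Counter
from math import log2

def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by PMI, keeping pairs seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # frequency filter
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = log2(p_ab / (p_a * p_b))  # collocation strength
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = "flood warning issued as flood warning remains while the river and the town wait"
for pair, score in bigram_pmi(text.split()):
    print(pair, round(score, 2))
```

A positive PMI means the two words co-occur more often than their individual frequencies would predict, which is one common signal that a pair is a term rather than a chance adjacency.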
Identifying and Processing Crisis Information from Social Media
Social media platforms play a crucial role in how people communicate, particularly during crisis situations such as natural disasters. People share and disseminate information on social media platforms relating to updates, alerts, and rescue and relief requests, among other crisis-relevant information. Hurricane Harvey and Hurricane Sandy saw tens of millions of posts generated on Twitter in a short span of time. Such posts span a wide range, including personal and official communications and citizen sensing, to mention a few. This makes social media platforms a source of vital information for different stakeholders in crisis situations, such as impacted communities, relief agencies, and civic authorities. However, the overwhelming volume of data generated during such times makes it impossible to manually identify information relevant to the crisis. Additionally, a large portion of posts in voluminous streams is not relevant, or bears minimal relevance, to crisis situations.
This has steered much research towards exploring methods that can automatically identify crisis-relevant information from voluminous streams of data during such scenarios. However, the problem of identifying crisis-relevant information from social media platforms, such as Twitter, is not trivial given the nature of unstructured text, such as short text length and syntactic variations, among other challenges. A key objective, while creating automatic crisis relevancy classification systems, is to make them adaptable to a wide range of crisis types and languages. Many related approaches rely on statistical features, which are quantifiable properties and linguistic properties of the text. A general approach is to train the classification model on labelled data acquired from crisis events and evaluate it on other crisis events. A key aspect missing from the explored literature is the validity of crisis relevancy classification models when applied to data from unseen types of crisis events and languages. For instance, how would the accuracy of a crisis relevancy classification model trained on earthquake events change when applied to flood events? Or, how would a model perform when trained on crisis data in English but applied to data in Italian?
This thesis investigates these problems from a semantics perspective, where the challenges posed by diverse types of crisis and language variations are seen as problems that can be tackled by enriching the data semantically. The use of knowledge bases such as DBpedia, BabelNet, and Wikipedia for semantic enrichment of data in text classification problems has often been studied. Semantic enrichment of data through entity linking and expansion of context via knowledge bases can take advantage of connections between different concepts and thus enhance contextual coherency across crisis types and languages. Several previous works have focused on similar problems and proposed approaches using statistical features and/or non-semantic features. The use of semantics extracted through knowledge graphs has remained unexplored in building crisis relevancy classifiers that are adaptive to varying crisis types and multilingual data. Experiments conducted in this thesis consider data from Twitter, a micro-blogging social media platform, and analyse multiple aspects of crisis data classification. The results obtained through various analyses in this thesis demonstrate the value of semantic enrichment of text through knowledge graphs in improving the adaptability of crisis relevancy classifiers across crisis types and languages, in comparison to statistical features as often used in much of the related work.
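The evaluation question raised in this abstract, how a model behaves on an unseen crisis type or language, corresponds to a leave-one-group-out split. The sketch below is one possible way to produce such splits and is not taken from the thesis; the function name, the record fields (`text`, `label`, `crisis_type`, `lang`), and the sample posts are hypothetical.

```python
def leave_one_group_out(posts, key):
    """Yield (held_out, train, test) where test holds every post from one group.

    posts: list of dicts; key: the field to hold out on,
    e.g. "crisis_type" or "lang" (hypothetical field names).
    """
    groups = sorted({p[key] for p in posts})
    for held_out in groups:
        train = [p for p in posts if p[key] != held_out]
        test = [p for p in posts if p[key] == held_out]
        yield held_out, train, test

# Toy, made-up records used only to show the shape of the splits.
posts = [
    {"text": "river overflowing", "label": 1, "crisis_type": "flood", "lang": "en"},
    {"text": "buildings shaking", "label": 1, "crisis_type": "earthquake", "lang": "en"},
    {"text": "nice weather today", "label": 0, "crisis_type": "earthquake", "lang": "en"},
    {"text": "strade allagate", "label": 1, "crisis_type": "flood", "lang": "it"},
]

for held_out, train, test in leave_one_group_out(posts, "crisis_type"):
    print(held_out, len(train), len(test))
```

Calling the same function with `key="lang"` gives the cross-lingual variant, for example training on English posts and testing on Italian ones.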