13 research outputs found

    A comparison of word embedding-based extraction feature techniques and deep learning models of natural disaster messages classification

    The research aims to compare the classification performance of natural disaster messages from Twitter. The experiment covers three word embedding-based feature extraction techniques and five different deep learning models. The word embedding techniques used are Word2Vec, fastText, and GloVe. The five deep learning models are three Convolutional Neural Networks of different dimensions (1D CNN, 2D CNN, 3D CNN), a Long Short-Term Memory network (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). The models are tested for classification performance on four natural disaster message datasets: earthquakes, floods, forest fires, and hurricanes.
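The first family of techniques compared, turning pretrained word vectors into a fixed-length message representation by averaging, can be sketched in a few lines. The tiny vector table below is a hypothetical stand-in for a real Word2Vec, fastText, or GloVe lookup, not the embeddings used in the paper:

```python
# Toy word-vector table standing in for a pretrained Word2Vec/fastText/GloVe model.
EMBEDDINGS = {
    "flood":      [0.9, 0.1, 0.0],
    "water":      [0.8, 0.2, 0.1],
    "earthquake": [0.1, 0.9, 0.0],
    "tremor":     [0.2, 0.8, 0.1],
}

def embed_message(text, table=EMBEDDINGS, dim=3):
    """Average the vectors of known tokens into one feature vector.
    Unknown tokens are skipped here; fastText would instead back off
    to subword vectors."""
    vectors = [table[tok] for tok in text.lower().split() if tok in table]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

features = embed_message("Flood water rising fast")
```

The resulting fixed-length vector is what a downstream CNN or LSTM classifier would consume; real embeddings are typically 100-300 dimensional rather than 3.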

    Identificación de actores en un desastre a través de Twitter: Caso de estudio SINABUNG 2018

    Twitter has become an important tool for knowing in real time what happens in the political, social, and economic world. The platform is increasingly attractive as a communication method and can be used in logistics and humanitarian operations, improving communication between the actors involved in a natural disaster. In the present investigation, a Social Network Analysis (SNA) approach is applied to data generated on Twitter about a disaster event, analyzing three important actors: users, hashtags, and URLs. The methodology is applied to a disaster case study (the Sinabung volcano eruption in 2018). From this analysis, relevant users, topics, and sources of information were identified during the disaster's occurrence. These analyses offer an overview of the interactions and impact of the most influential elements during the event under study, with news teams, social networks, and research centers making important contributions. The findings of the present study are compared with a previous study, showing similarities in most respects; this study additionally identifies actors from the academic and technical fields who seek to contribute and disseminate relevant information about the disruptive event.

    Identifying landscape relevant natural language using actively crowdsourced landscape descriptions and sentence-transformers

    Natural language has proven to be a valuable source of data for various scientific inquiries, including landscape perception and preference research. However, large, high-quality, landscape-relevant corpora are scarce. Here we propose and discuss a natural language processing workflow to identify landscape-relevant documents in large collections of unstructured text. Using a small, curated, high-quality collection of actively crowdsourced landscape descriptions, we identify and extract similar documents from two different corpora (Geograph and WikiHow) using sentence-transformers and cosine similarity scores. We show that 1) sentence-transformers combined with cosine similarity calculations successfully identify similar documents in both Geograph and WikiHow, effectively opening the door to the creation of new landscape-specific corpora, 2) the proposed sentence-transformer approach outperforms traditional Term Frequency-Inverse Document Frequency based approaches, and 3) the identified documents capture similar topics when compared to the original high-quality collection. The presented workflow is transferable to various scientific disciplines in need of domain-specific natural language corpora as underlying data.
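The retrieval step this workflow describes, scoring candidate documents by cosine similarity against a curated seed collection and keeping those above a cutoff, can be sketched in plain Python. The two-dimensional vectors and the 0.5 threshold below are illustrative; the actual workflow would use sentence-transformer encodings of real documents:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(seed_vecs, candidates, threshold=0.5):
    """Keep candidates whose best similarity to any seed vector clears
    the threshold, sorted from most to least similar."""
    scored = []
    for doc_id, vec in candidates.items():
        best = max(cosine(vec, s) for s in seed_vecs)
        if best >= threshold:
            scored.append((best, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# Hypothetical embeddings: seeds represent curated landscape descriptions.
seeds = [[1.0, 0.0], [0.9, 0.3]]
cands = {"hill_walk": [0.95, 0.1], "tax_form": [0.0, 1.0]}
kept = rank_candidates(seeds, cands)
```

Because cosine similarity ignores vector magnitude, documents of very different lengths can still be compared on topical direction alone, which is one reason it pairs well with sentence embeddings.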

    Community Segmentation and Inclusive Social Media Listening

    Social media analytics provide a generalized picture of situational awareness from the conversations happening among communities present in social media channels that are, or risk being, affected by crises. The generalized nature of results from these analytics leaves underrepresented communities in the background. In social media analytics, concerns, sentiment, and needs are perceived as homogeneous. However, offline, the community is diverse, often segmented by age group, occupation, or language, to name a few. Through our analysis of interviews with professionals using social media as a source of information in public service organizations, we argue that practitioners might not be perceiving this segmentation in the social media conversation. In addition, practitioners who are aware of this limitation agree that there is room for improvement and resort to alternative mechanisms to understand, reach, and provide services to these communities in need. Thus, we analyze current perceptions and activities around segmentation and provide suggestions that could inform the design of social media analytics tools that support inclusive public services for all, including persons with disabilities and from other disadvantaged groups.

    Transformer-Based Multi-Task Learning for Crisis Actionability Extraction

    Social media has become a valuable information source for crisis informatics. While various methods have been proposed to extract relevant information during a crisis, their adoption by field practitioners remains low. In recent fieldwork, actionable information was identified as the primary information need for crisis responders and a key component in bridging the significant gap in existing crisis management tools. In this paper, we propose a Crisis Actionability Extraction System that combines filtering, classification, phrase extraction, severity estimation, localization, and aggregation of actionable information. We examine the effectiveness of a transformer-based LSTM-CRF architecture on Twitter-related sequence tagging tasks and simultaneously extract actionable information such as situational details and crisis impact via Multi-Task Learning. We demonstrate the system's practical value in a case study of a real-world crisis and show its effectiveness in aiding crisis responders with making well-informed decisions, mitigating risks, and navigating the complexities of the crisis.
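After a sequence tagger such as an LSTM-CRF has labeled each token, the phrase-extraction step reduces to decoding BIO tag sequences into labeled spans. A minimal decoder is sketched below; the tag names (`LOC`, `IMPACT`) are illustrative and not the paper's actual label set:

```python
def decode_bio(tokens, tags):
    """Turn parallel token/BIO-tag lists into (label, phrase) spans.
    B- opens a span, I- continues it, O (or a stray I-) closes it."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

toks = ["Bridge", "on", "Main", "St", "collapsed"]
tags = ["B-LOC", "O", "B-LOC", "I-LOC", "B-IMPACT"]
spans = decode_bio(toks, tags)
```

The extracted spans are what downstream steps such as severity estimation and aggregation would then operate on.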

    Advanced analytical methods for fraud detection: a systematic literature review

    The developments of the digital era demand new ways of producing goods and rendering services. This fast-paced evolution of companies requires a new approach from auditors, who must keep up with the constant transformation. Given the dynamic dimensions of data, it is important to seize the opportunity to add value to companies, and the need to apply more robust methods to detect fraud is evident. This thesis investigates the use of advanced analytical methods for fraud detection through an analysis of the existing literature on the topic. Both a systematic literature review and a bibliometric approach are applied to the most appropriate database to measure scientific production and current trends. This study intends to contribute to the academic research conducted so far by centralizing the existing information on this topic.

    Análisis de Sentimientos del proceso de vacunación en España durante las diferentes olas del COVID-19

    The COVID-19 pandemic has had an unprecedented economic and social impact, marking a setback in every sphere of Spanish society from March 2020 to the present. Through a sentiment analysis, this work reflects the emotions that Spanish users shared on Twitter about the vaccination process during each of the pandemic waves. Using a sample of 101,286 tweets, a quantitative study was carried out, extracting subjective information from an examination of emotional polarity and distinguishing between positive and negative connotations in the language used. Once classified, these sentiments were compared across the different waves, observing in charts how the pandemic affected the thoughts and attitudes of the population. In particular, the evolution of the most representative emotions was analyzed, concluding that positive sentiments predominated over negative ones, driven by the government's de-escalation plan, the start of the vaccination strategy, and its subsequent expansion. This study can help assess the mood of citizens and gauge social support for policies aimed at mitigating the effects of the pandemic.
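Polarity classification of this kind is often bootstrapped from a sentiment lexicon: count positive and negative word hits and compare. The tiny Spanish word lists below are illustrative stand-ins, not the lexicon used in the study:

```python
# Hypothetical mini-lexicon; real studies use lexicons with thousands of entries.
POSITIVE = {"esperanza", "alivio", "gracias", "bien"}
NEGATIVE = {"miedo", "muerte", "crisis", "mal"}

def polarity(tweet):
    """Label a tweet positive/negative/neutral by counting lexicon hits."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = polarity("gracias por la esperanza de la vacuna")
```

Aggregating these labels per pandemic wave yields the positive-versus-negative time series the abstract describes.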

    A multi-modal approach towards mining social media data during natural disasters - A case study of Hurricane Irma

    Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users from Sept. 10–12, 2017 to develop four independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine whether the text is related to Hurricane Irma. All four models are independently tested and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can then be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.
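Combining four independent submodels under user-defined thresholds amounts to a conjunctive filter over per-tweet scores. The field names, score values, and cutoffs below are illustrative, not taken from the paper:

```python
def passes(tweet_scores, thresholds):
    """Keep a tweet only if every submodel score meets its threshold.
    Missing scores (e.g. no image attached) are treated as passing."""
    return all(
        tweet_scores.get(model, 1.0) >= cutoff
        for model, cutoff in thresholds.items()
    )

# One hypothetical cutoff per submodel: geospatial, image, user, text.
thresholds = {"geo": 0.5, "image": 0.6, "user": 0.4, "text": 0.7}
tweets = [
    {"id": 1, "geo": 0.9, "user": 0.8, "text": 0.95},               # no image
    {"id": 2, "geo": 0.9, "image": 0.2, "user": 0.8, "text": 0.9},  # weak image
]
kept = [t["id"] for t in tweets if passes(t, thresholds)]
```

Exposing the thresholds as user inputs, as the abstract suggests, lets an analyst trade recall for precision per submodel without retraining anything.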

    Linking social media, medical literature, and clinical notes using deep learning.

    Researchers analyze data, information, and knowledge through many sources, formats, and methods, with text and images as the dominant data formats. In the healthcare industry, professionals generate a large quantity of unstructured data. The complexity of this data and the lack of computational power cause delays in analysis. However, with emerging deep learning algorithms and access to computational resources such as graphics processing units (GPUs) and tensor processing units (TPUs), processing text and images is becoming more accessible. Deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting data sources requires defining a link between them, and our key is finding concepts in the medical text. The National Library of Medicine (NLM) maintains the Unified Medical Language System (UMLS), which we use as the foundation of our own system. We recognize social media's dynamic nature and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process the EMRs' clinical notes via transfer learning. The result is an integrated, end-to-end, web-based system that unifies social media, literature, and clinical notes and improves access to medical knowledge for both the public and experts.
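The linking key described here, finding shared medical concepts across heterogeneous texts, can be approximated by dictionary lookup against concept identifiers, with synonyms mapping to one concept. The term table and the C-style codes below are made-up illustrations, not real UMLS entries:

```python
# Hypothetical term-to-concept table; a real system would query UMLS,
# where synonyms share a Concept Unique Identifier (CUI).
CONCEPTS = {
    "heart attack":          "C-0001",
    "myocardial infarction": "C-0001",  # synonym maps to the same concept
    "aspirin":               "C-0002",
}

def find_concepts(text):
    """Return the set of concept IDs whose surface terms occur in the text,
    so documents from different sources can be linked on shared concepts."""
    lowered = text.lower()
    return {cid for term, cid in CONCEPTS.items() if term in lowered}

note = "Patient with myocardial infarction, started aspirin."
tweet = "my dad had a heart attack last week"
shared = find_concepts(note) & find_concepts(tweet)
```

Normalizing both the clinical phrasing and the colloquial phrasing to the same concept ID is what lets the framework link a clinical note, a journal article, and a social media post about the same condition.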