6,349 research outputs found

    Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media

    Get PDF
    When crises hit, many flog to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations and volunteers) contained in these posts is vital for their efficient handling and consumption by effected communities and concerned organisations. In this paper, we introduce Sem-CNN; a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics that represents the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines which consist of statistical and non-semantic deep learning models

    Classifying Crises-Information Relevancy with Semantics

    Get PDF
    Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sieve through the large volumes of data that are typically shared on such platforms during crises to determine which posts are truly relevant to the crisis, and which are not. Previous work on automatically classifying crisis information was mostly focused on using statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, such as processing information about a train crash, whereas the classifier was trained on floods, earthquakes, and typhoons. In such cases, the model will need to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances the classifier performance by up to 7.2% when classifying information about a new type of crisis

    Real-Time Classification of Twitter Trends

    Get PDF
    Social media users give rise to social trends as they share about common interests, which can be triggered by different reasons. In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with following four types: 'news', 'ongoing events', 'memes', and 'commemoratives'. While previous research has analyzed trending topics in a long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This would allow to provide a filtered subset of trends to end users. We analyze and experiment with a set of straightforward language-independent features based on the social spread of trends to categorize them into the introduced typology. Our method provides an efficient way to accurately categorize trending topics without need of external data, enabling news organizations to discover breaking news in real-time, or to quickly identify viral memes that might enrich marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.Comment: Pre-print of article accepted for publication in Journal of the American Society for Information Science and Technology copyright @ 2013 (American Society for Information Science and Technology

    Cross-Lingual Classification of Crisis Data

    Get PDF
    Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current methods for classifying the relevance of posts to a crisis or set of crises typically struggle to deal with posts in different languages, and it is not viable during rapidly evolving crisis situations to train new models for each language. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that the addition of semantic features extracted from external knowledge bases improve accuracy over a purely statistical model

    Organized Behavior Classification of Tweet Sets using Supervised Learning Methods

    Full text link
    During the 2016 US elections Twitter experienced unprecedented levels of propaganda and fake news through the collaboration of bots and hired persons, the ramifications of which are still being debated. This work proposes an approach to identify the presence of organized behavior in tweets. The Random Forest, Support Vector Machine, and Logistic Regression algorithms are each used to train a model with a data set of 850 records consisting of 299 features extracted from tweets gathered during the 2016 US presidential election. The features represent user and temporal synchronization characteristics to capture coordinated behavior. These models are trained to classify tweet sets among the categories: organic vs organized, political vs non-political, and pro-Trump vs pro-Hillary vs neither. The random forest algorithm performs better with greater than 95% average accuracy and f-measure scores for each category. The most valuable features for classification are identified as user based features, with media use and marking tweets as favorite to be the most dominant.Comment: 51 pages, 5 figure

    Analyzing Disproportionate Reaction via Comparative Multilingual Targeted Sentiment in Twitter

    Get PDF
    Global events such as terrorist attacks are commented upon in social media, such as Twitter, in different languages and from different parts of the world. Most prior studies have focused on monolingual sentiment analysis, and therefore excluded an extensive proportion of the Twitter userbase. In this paper, we perform a multilingual comparative sentiment analysis study on the terrorist attack in Paris, during November 2015. In particular, we look at targeted sentiment, investigating opinions on specific entities, not simply the general sentiment of each tweet. Given the potentially inflammatory and polarizing effect that these types of tweets may have on attitudes, we examine the sentiments expressed about different targets and explore whether disproportionate reaction was expressed about such targets across different languages. Specifically, we assess whether the sentiment for French speaking Twitter users during the Paris attack differs from English-speaking ones. We identify disproportionately negative attitudes in the English dataset over the French one towards some entities and, via a crowdsourcing experiment, illustrate that this also extends to forming an annotator bias
    corecore