
    How did the discussion go: Discourse act classification in social media conversations

    We propose a novel attention-based hierarchical LSTM model to classify discourse act sequences in social media conversations, aimed at mining online discussions using textual meaning beyond the sentence level. The uniqueness of the task lies in the complete categorization of possible pragmatic roles in informal textual discussions, in contrast to question-answer extraction, stance detection, or sarcasm identification, which are role-specific tasks. An early attempt was made on a Reddit discussion dataset. We train our model on the same data and present test results on two different datasets, one from Reddit and one from Facebook. Our proposed model outperforms the previous one in terms of domain independence: without using platform-dependent structural features, our hierarchical LSTM with a word relevance attention mechanism achieves F1-scores of 71% and 66%, respectively, in predicting discourse roles of comments in Reddit and Facebook discussions. We also present and analyze the efficiency of recurrent and convolutional architectures in learning discursive representations on the same task, with different word and comment embedding schemes. Our attention mechanism enables us to inquire into the relevance ordering of text segments according to their roles in discourse. We present a human annotator experiment that unveils important observations about modeling and data annotation. Equipped with our text-based discourse identification model, we inquire into how heterogeneous non-textual features such as location, time, and leaning of information play their roles in characterizing online discussions on Facebook.
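
    The architecture described above can be pictured as a word-level encoder with attention feeding a comment-level encoder over the conversation. Below is a minimal PyTorch sketch of such a hierarchical LSTM with word relevance attention; the layer sizes, vocabulary size, and the ten-class output are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (PyTorch) of a hierarchical LSTM with word-level attention.
# Dimensions and the number of discourse acts are assumptions for illustration.
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Scores each word in a comment and returns an attention-weighted summary."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, word_states):               # (batch, words, hidden)
        weights = torch.softmax(self.score(word_states), dim=1)
        return (weights * word_states).sum(dim=1)  # (batch, hidden)

class HierarchicalDiscourseLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, word_hidden=128,
                 comment_hidden=128, num_acts=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_lstm = nn.LSTM(emb_dim, word_hidden, batch_first=True,
                                 bidirectional=True)
        self.attention = WordAttention(2 * word_hidden)
        self.comment_lstm = nn.LSTM(2 * word_hidden, comment_hidden,
                                    batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * comment_hidden, num_acts)

    def forward(self, token_ids):                  # (threads, comments, words)
        t, c, w = token_ids.shape
        words = self.embed(token_ids.view(t * c, w))
        word_states, _ = self.word_lstm(words)
        comment_vecs = self.attention(word_states).view(t, c, -1)
        thread_states, _ = self.comment_lstm(comment_vecs)
        return self.classifier(thread_states)      # logits per comment

logits = HierarchicalDiscourseLSTM(vocab_size=20000)(
    torch.randint(1, 20000, (2, 5, 30)))           # 2 threads, 5 comments, 30 tokens
print(logits.shape)                                # torch.Size([2, 5, 10])
```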

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables not only the fast sharing of information, but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
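
    As a rough illustration of the kind of hashtag-based baseline that the neural classifier is compared against, the sketch below labels a tweet by counting partisan hashtags; the hashtag lists and the tie-breaking rule are assumed examples, not the paper's actual lexicon.

```python
# Illustrative hashtag-based baseline: label a tweet by which side's hashtags
# it contains. The hashtag sets here are assumptions, not the paper's lexicon.
PRO_RUSSIAN = {"#mh17truth", "#kievshotdownmh17"}
PRO_UKRAINIAN = {"#russiainvadedukraine", "#putinsplane"}

def hashtag_baseline(tweet: str) -> str:
    tags = {tok.lower() for tok in tweet.split() if tok.startswith("#")}
    ru, ua = len(tags & PRO_RUSSIAN), len(tags & PRO_UKRAINIAN)
    if ru > ua:
        return "pro-russian"
    if ua > ru:
        return "pro-ukrainian"
    return "neutral/undecided"

print(hashtag_baseline("Who shot down the plane? #MH17truth"))  # pro-russian
```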

    Rumor Stance Classification in Online Social Networks: A Survey on the State-of-the-Art, Prospects, and Future Challenges

    The emergence of the Internet as a ubiquitous technology has facilitated the rapid evolution of social media as the leading virtual platform for communication, content sharing, and information dissemination. While it has revolutionized the way news is delivered to people, this technology has also brought with it inevitable drawbacks. One such drawback is the spread of rumors on social media platforms, which may provoke doubt and fear among people. The need to debunk rumors before they spread widely has therefore become all the more essential. Over the years, many studies have been conducted to develop effective rumor verification systems. One aspect of such studies focuses on rumor stance classification, the task of utilizing users' viewpoints about a rumorous post to better predict the veracity of a rumor. Relying on users' stances in the rumor verification task has gained great importance, as it has yielded significant improvements in model performance. In this paper, we conduct a comprehensive literature review on rumor stance classification in complex social networks. In particular, we present a thorough description of the approaches and highlight the top-performing ones. Moreover, we introduce multiple datasets available for this purpose and highlight their limitations. Finally, some challenges and future directions are discussed to stimulate further relevant research efforts.
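
    The core idea surveyed here, using the stances of replies toward a rumorous post as a signal for its veracity, can be illustrated with a small sketch: the distribution of support/deny/query/comment stances over a conversation becomes a feature vector for a downstream veracity classifier. The stance labels below are assumed to come from some upstream stance classifier; nothing here corresponds to a specific system from the survey.

```python
# Turn per-reply stance labels into a stance-distribution feature vector,
# a common input to veracity prediction. Labels are assumed given upstream.
from collections import Counter

STANCES = ("support", "deny", "query", "comment")

def stance_features(reply_stances: list[str]) -> list[float]:
    """Fraction of replies with each stance toward the rumorous source post."""
    counts = Counter(reply_stances)
    total = max(len(reply_stances), 1)
    return [counts[s] / total for s in STANCES]

# Many denials and queries are a typical signal of a false rumor.
print(stance_features(["deny", "deny", "query", "comment", "support"]))
```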

    Towards a National Security Analysis Approach via Machine Learning and Social Media Analytics

    Various severe threats at the national and international level, such as health crises, radicalisation, or organised crime, have the potential to unbalance a nation's stability. Such threats impact directly on elements linked to people's security, known in the literature as human security components. Protecting citizens from such risks is the primary objective of the various organisations tasked with safeguarding the legitimacy, stability and security of the state. Given the importance of maintaining security and stability, governments across the globe have been developing a variety of strategies to diminish or negate the devastating effects of the aforementioned threats. Technological progress plays a pivotal role in the evolution of these strategies. Most recently, artificial intelligence has enabled the examination of large volumes of data and the creation of bespoke analytical tools able to perform complex tasks for the analysis of multiple scenarios, tasks that would usually require significant amounts of human resources. Several research projects have already proposed and studied the use of artificial intelligence to analyse crucial problems that impact national security components, such as violence or ideology. However, all of this prior research has focused on examining isolated components, whereas understanding national security issues requires studying and analysing a multitude of closely interrelated elements and constructing a holistic view of the problem. The work documented in this thesis aims to fill this gap. Its main contribution is the creation of a complete pipeline for constructing a big picture that helps understand national security problems. The proposed pipeline covers different stages: it begins with the analysis of the unfolding event, which produces timely detection points indicating that society might be heading toward a disruptive situation; a further examination based on machine learning techniques then enables the interpretation of an already confirmed crisis in terms of high-level national security concepts. Apart from widely accepted national security theoretical constructions developed over years of social and political research, the second pillar of the approach is modern computational paradigms, especially machine learning and its applications in natural language processing.
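
    The pipeline described above can be summarized as two stages: detecting timely signs that an unfolding event may become disruptive, and interpreting a confirmed crisis in terms of human security components. The skeleton below is only a schematic sketch of that structure; the keyword heuristics, component names, and threshold are placeholders, not the thesis's actual models.

```python
# Schematic two-stage pipeline: (1) flag a potential disruption from a stream
# of posts, (2) interpret the crisis against (human) security components.
# All lexicons and scores below are illustrative placeholders.
def detect_disruption_points(posts: list[str], threshold: int = 2) -> bool:
    """Stage 1: flag an unfolding event when enough posts mention crisis terms."""
    crisis_terms = ("riot", "outbreak", "attack")          # placeholder lexicon
    hits = sum(any(t in p.lower() for t in crisis_terms) for p in posts)
    return hits >= threshold

def interpret_crisis(posts: list[str]) -> dict[str, float]:
    """Stage 2: score the crisis against security components; a real system
    would use trained NLP classifiers here instead of keyword matching."""
    components = ("personal", "political", "health", "community")
    return {c: sum(c in p.lower() for p in posts) / max(len(posts), 1)
            for c in components}

window = ["Reports of an attack downtown", "Health services overwhelmed",
          "Another attack reported", "Community tensions rising"]
if detect_disruption_points(window):
    print(interpret_crisis(window))
```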

    Transformer-Based Multi-Task Learning for Crisis Actionability Extraction

    Social media has become a valuable information source for crisis informatics. While various methods have been proposed to extract relevant information during a crisis, their adoption by field practitioners remains low. In recent fieldwork, actionable information was identified as the primary information need for crisis responders and a key component in bridging the significant gap in existing crisis management tools. In this paper, we propose a Crisis Actionability Extraction System that performs filtering, classification, phrase extraction, severity estimation, localization, and aggregation of actionable information. We examine the effectiveness of a transformer-based LSTM-CRF architecture in Twitter-related sequence tagging tasks and simultaneously extract actionable information such as situational details and crisis impact via Multi-Task Learning. We demonstrate the system’s practical value in a case study of a real-world crisis and show its effectiveness in aiding crisis responders with making well-informed decisions, mitigating risks, and navigating the complexities of the crisis.
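
    A hedged sketch of the multi-task setup described above: a shared encoder feeds a token-level head for actionable-phrase tagging and a tweet-level head for severity estimation, trained with a joint loss. For brevity it uses a small nn.TransformerEncoder and omits the pretrained transformer and CRF layer of the paper's LSTM-CRF architecture; dimensions and label-set sizes are assumptions.

```python
# Multi-task sequence tagging sketch (PyTorch): shared encoder, BIO tagging
# head, and severity head, trained with a summed loss on dummy data.
import torch
import torch.nn as nn

class MultiTaskCrisisTagger(nn.Module):
    def __init__(self, vocab_size=30000, d_model=128, num_bio_tags=5,
                 num_severity_levels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.tag_head = nn.Linear(d_model, num_bio_tags)              # per token
        self.severity_head = nn.Linear(d_model, num_severity_levels)  # per tweet

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        h, _ = self.lstm(h)
        return self.tag_head(h), self.severity_head(h.mean(dim=1))

model = MultiTaskCrisisTagger()
tokens = torch.randint(1, 30000, (8, 40))             # batch of 8 dummy tweets
tag_logits, severity_logits = model(tokens)
tag_loss = nn.CrossEntropyLoss()(tag_logits.view(-1, 5),
                                 torch.randint(0, 5, (8 * 40,)))
sev_loss = nn.CrossEntropyLoss()(severity_logits, torch.randint(0, 4, (8,)))
loss = tag_loss + sev_loss                             # joint multi-task objective
loss.backward()
```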

    Profiling the news spreading barriers using news headlines

    News headlines can be a good data source for detecting news spreading barriers in news media, which may be useful in many real-world applications. In this paper, we utilize semantic knowledge through the inference-based model COMET, along with the sentiments of news headlines, for barrier classification. We consider five barriers, namely cultural, economic, political, linguistic, and geographical, and different types of news headlines, including health, sports, science, recreation, games, homes, society, shopping, computers, and business. To that end, we collect and label the news headlines automatically for the barriers using the metadata of news publishers. Then, we utilize the extracted commonsense inferences and sentiments as features to detect the news spreading barriers. We compare our approach to classical text classification methods, deep learning, and transformer-based methods. The results show that the proposed approach, using inference-based semantic knowledge and sentiment, offers better performance than the usual methods for classifying the news-spreading barriers (the average F1-score over the ten categories improves from 0.41, 0.39, 0.59, and 0.59 to 0.47, 0.55, 0.70, and 0.76 for the cultural, economic, political, and geographical barriers, respectively).
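
    The feature combination described above can be sketched as follows: each headline is represented by its text plus pre-extracted commonsense inferences and a sentiment score, and a per-barrier classifier is trained on the concatenated features. The COMET inferences, sentiment values, and tiny dataset below are mocked for illustration; only the combination step is shown, not the paper's actual pipeline.

```python
# Combine headline text, mocked commonsense inferences, and mocked sentiment
# scores into one feature matrix for a per-barrier classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix

headlines = ["Government passes new border law", "Team wins home series"]
inferences = ["to control immigration", "to celebrate the victory"]  # mocked COMET
sentiment = [[-0.4], [0.8]]                                          # mocked scores
labels = [1, 0]                                  # 1 = political barrier present

vec = TfidfVectorizer()
X_text = vec.fit_transform([h + " " + i for h, i in zip(headlines, inferences)])
X = hstack([X_text, csr_matrix(sentiment)])      # text + inference + sentiment
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```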

    Context-Aware Message-Level Rumour Detection with Weak Supervision

    Social media has become the main source of all sorts of information, beyond being a communication medium. Its intrinsic nature allows a continuous and massive flow of misinformation that can have a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly, and it is challenging to track down their origins and stop their propagation. One of the most promising solutions to this is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of machine learning in which noisy and less precise sources (e.g. data patterns) are leveraged to supplement limited high-quality labelled data (Ratner et al., 2017). It is intended to reduce the cost and increase the efficiency of hand-labelling large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at the individual post level. To this end, it first explores the major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which has received less attention from the ERD research community. One bottleneck is limited labelled data: weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is the enormous amount of noisy data: a framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
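
    The weak-supervision ingredient described above can be illustrated with a few hand-written labelling functions in the style of Ratner et al. (2017): each encodes a noisy heuristic, and their votes are combined into a weak label for otherwise unlabelled posts. The heuristics and the simple majority vote below are stand-ins, not the thesis's actual labelling setup.

```python
# Weak supervision sketch: noisy labelling functions vote on whether a post
# is rumour-mongering; votes are merged into a weak training label.
RUMOUR, NOT_RUMOUR, ABSTAIN = 1, 0, -1

def lf_unverified_cue(post: str) -> int:
    cues = ("unconfirmed", "breaking", "reportedly")       # assumed heuristic
    return RUMOUR if any(c in post.lower() for c in cues) else ABSTAIN

def lf_official_source(post: str) -> int:
    return NOT_RUMOUR if "according to police" in post.lower() else ABSTAIN

def weak_label(post: str, lfs=(lf_unverified_cue, lf_official_source)) -> int:
    votes = [v for v in (lf(post) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return RUMOUR if votes.count(RUMOUR) >= votes.count(NOT_RUMOUR) else NOT_RUMOUR

print(weak_label("BREAKING: reportedly an explosion downtown, unconfirmed"))  # 1
```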