179 research outputs found



    Social media analytics with applications in disaster management and COVID-19 events

    Social media platforms such as Twitter offer a tremendous amount of data throughout an event or a disaster. Leveraging social media data during a disaster supports effective and efficient disaster management. Information extraction, trend identification, and analysis of public reactions might help in a future disaster or even avert such an event. However, a disaster situation requires a robust system that can be deployed quickly and can process relevant information in real time with satisfactory performance. This work outlines the research contributions toward developing such a system for disaster management, where it is paramount to develop automated methods that provide appropriate tags or labels for further analysis and timely situational awareness. In that direction, this work proposes machine learning models to identify people who are seeking assistance through social media during a disaster, and demonstrates a prototype application that can collect and process Twitter data in real time, identify stranded people, and create rescue schedules. In addition, to understand people's reactions to different trending topics, this work proposes a unique auxiliary feature-based deep learning model with adversarial sample generation for emotion detection using tweets related to COVID-19. This work also presents a custom Q&A-based RoBERTa model for extracting phrases related to emotions. Finally, with the aim of polarization detection, this work proposes a deep learning pipeline for political ideology detection that leverages the tweet texts and the emotions expressed in them. This work also conducts a historical emotion and polarization analysis of the COVID-19 pandemic in the USA and several individual states using Twitter data. --Abstract, page iv

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables fast sharing not only of information but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or have used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis. In particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
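
To make the idea of a hashtag-based baseline concrete, the following is a minimal sketch of how such a labeler might work. It is not the authors' implementation: the hashtag sets, the label names `side_a` and `side_b`, and the tie-breaking rule are all hypothetical placeholders chosen for illustration.

```python
from typing import Optional, Set

# Hypothetical placeholder hashtag sets (NOT taken from the paper).
SIDE_A_TAGS: Set[str] = {"#side_a_tag1", "#side_a_tag2"}
SIDE_B_TAGS: Set[str] = {"#side_b_tag1"}


def extract_hashtags(text: str) -> Set[str]:
    """Return lowercased hashtags, with trailing punctuation stripped."""
    return {
        token.lower().rstrip(".,!?:;")
        for token in text.split()
        if token.startswith("#")
    }


def hashtag_label(text: str) -> Optional[str]:
    """Label a tweet by whichever hashtag set it matches more often.

    Returns None when no listed hashtag occurs or when both sides tie,
    which is one reason a baseline like this leaves many tweets unlabeled.
    """
    tags = extract_hashtags(text)
    a_hits = len(tags & SIDE_A_TAGS)
    b_hits = len(tags & SIDE_B_TAGS)
    if a_hits > b_hits:
        return "side_a"
    if b_hits > a_hits:
        return "side_b"
    return None
```

A rule of this kind can only label tweets that contain a listed hashtag, whereas a learned classifier can also draw on the rest of the tweet text.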

    Tracking public opinion on social media

    The increasing popularity of social media has changed the web from a static repository of information into a dynamic forum with continuously changing information. Social media platforms have given people the ability to express and share their thoughts and opinions on the web in a very simple way. This so-called User Generated Content is a good source of users' opinions, and mining it can be very useful for a wide variety of applications that require understanding public opinion about a concept. For example, enterprises can capture the negative or positive opinions of customers about their services or products and improve their quality accordingly. The dynamic nature of social media, with its constantly changing vocabulary, makes developing tools that can automatically track public opinion a challenge. To help users better understand public opinion towards an entity or a topic, it is important to: a) find the related documents and the sentiment polarity expressed in them; b) identify the important time intervals where there is a change in opinion; c) identify the causes of the opinion change; d) estimate the number of people that hold a certain opinion about the entity; and e) measure the impact of public opinion on the entity. In this thesis we focus on the problem of tracking public opinion on social media, and we propose and develop methods to address the different subproblems. First, we analyse the topical distribution of tweets to determine the number of topics that are discussed in a single tweet. Next, we propose a topic-specific stylistic method to retrieve tweets that are relevant to a topic and also express an opinion about it. Then, we explore the effectiveness of time series methodologies to track and forecast the evolution of sentiment towards a specific topic over time. In addition, we propose the LDA & KL-divergence approach to extract and rank the likely causes of sentiment spikes. 
We create a test collection that can be used to evaluate methodologies for ranking the likely reasons for sentiment spikes. To estimate the number of people that hold a certain opinion about an entity, we propose an approach that uses pre-publication and post-publication features extracted from news posts and users' comments respectively. Finally, we propose an approach that propagates sentiment signals to measure the impact of public opinion on the entity's reputation. We evaluate our proposed methods on standard evaluation collections and provide evidence that they improve on the performance of state-of-the-art approaches to tracking public opinion on social media.
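
As a rough illustration of how KL-divergence can rank candidate causes of a sentiment spike, the sketch below scores each term by its contribution to the divergence between the term distribution inside the spike window and a background distribution. This is a simplified stand-in, not the thesis's method: it omits the LDA step entirely, and the whitespace tokenization, the add-one smoothing (`alpha`), and all function names are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_distribution(docs: List[str], vocab: List[str],
                      alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed unigram distribution over vocab (alpha is an assumed smoothing constant)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def rank_spike_terms(spike_docs: List[str], background_docs: List[str],
                     top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank terms by their pointwise contribution p(t) * log(p(t) / q(t)).

    p is the distribution in the spike window, q the background one.
    The contributions sum to KL(p || q); large positive values mark
    terms that are unusually frequent during the spike.
    """
    vocab = sorted({tok for doc in spike_docs + background_docs
                    for tok in doc.lower().split()})
    p = term_distribution(spike_docs, vocab)
    q = term_distribution(background_docs, vocab)
    scores = [(t, p[t] * math.log(p[t] / q[t])) for t in vocab]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    spike = ["battery recall announced", "battery recall fire"]
    background = ["new phone launch", "phone camera review", "battery life review"]
    for term, score in rank_spike_terms(spike, background, top_k=3):
        print(f"{term}\t{score:.3f}")
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined even for terms that occur in only one of the two collections.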

    Assessing Trust and Veracity of Data in Social Media

    Social media strongly shapes our knowledge and perception of the world. With the tremendous amount of data circulating in social media, initiated by a vast number of users from all over the world, extracting useful information from such data and assessing its veracity has become much more challenging. Data veracity refers to the trustworthiness and certainty of data. The challenges of handling textual data in social media have raised the need for efficient tools to extract, understand, and assess the veracity of information circulating in social media at a given time. In this thesis, we present three research problems that address major challenges of handling textual data in social media. First, huge volumes of short, noisy, and unstructured textual data overwhelm the user and complicate the task of understanding what topics are discussed on micro-blogging websites. Topic models were proposed to automatically learn a set of keywords that describe each topic covered by a large corpus of text documents, enabling fast and effective browsing and exploration of its contents. However, for the results of topic modeling algorithms to be useful, they have to be interpretable. Applying topic models to social media data to get meaningful results is not a trivial task. In this thesis, we study the problem of improving the interpretation of topic models of micro-posts in social media. We propose a new method that incorporates topic modeling, a lexical database, and the set of hashtags available in the corpus of micro-posts to produce a higher-quality representation of each extracted topic. Extensive experiments on two real-life datasets collected from Twitter show that our method outperforms the state-of-the-art model in terms of perplexity, topic coherence, and topic quality. 
Second, the nature and flexibility of social media facilitate the posting of unverified information, especially during the rapid diffusion of breaking news. Efficiently detecting and acting upon unverified breaking news rumors throughout social media is highly important for minimizing their harmful effects. However, detecting them is not a trivial task, because they belong to unseen topics or events that are not covered in the training dataset. In this thesis, we study the problem of assessing the veracity of information contained in micro-posts regarding emerging stories and topics of breaking news. We propose a new approach that jointly learns word embeddings and trains a neural network model with two different objectives to automatically identify unverified micro-posts spreading in social media during breaking news. Extensive experiments on real-life datasets show that our proposed model outperforms the state-of-the-art classifier as well as other baseline classifiers in terms of precision, recall, and F1. Finally, the uncertainty and chaos associated with hot and sensitive breaking news and emergencies facilitate the explosive spread of high-engagement breaking news rumors that might be extremely damaging. In such cases, authorities have to prioritize the rumor verification process and act upon high-engagement breaking news rumors quickly to reduce their damaging consequences. However, this is an extremely challenging task. In this thesis, we study the problem of identifying rumor micro-posts that are most likely to become viral and achieve high engagement rates among recipients in social media during breaking news. We propose a multi-task neural network to jointly learn the two tasks of breaking news rumor detection and breaking news rumor popularity prediction. 
Extensive experiments on real-life datasets show that our joint learning model outperforms other baseline classifiers in terms of precision, recall, and F1, and that it can identify high-engagement breaking news rumors with high accuracy.
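
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how precision, recall, and F1 are computed for a binary classifier. The function name, the label encoding (1 for the positive class, e.g. "rumor"), and the convention of returning 0.0 when a denominator is zero are assumptions made for this example, not details from the thesis.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumed convention: report 0.0 when a metric is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Precision measures how many flagged posts were truly positive, recall measures how many truly positive posts were flagged, and F1 is their harmonic mean.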