    Probabilistic Modeling of Rumour Stance and Popularity in Social Media

    Social media tends to be rife with rumours when new reports are released piecemeal during breaking news events. In such situations one can mine the many reactions expressed by social media users and explore their stance towards rumours, ultimately enabling highly disputed rumours to be flagged as potentially false. Moreover, rumours in social media exhibit complex temporal patterns: some are discussed in an increasing number of tweets per unit of time, whereas others fail to gain ground. This thesis develops probabilistic models of rumours in social media driven by two applications: rumour stance classification and modeling the temporal dynamics of rumours. Rumour stance classification is the task of classifying the stance that an individual tweet expresses towards a rumour. Modeling temporal dynamics is the task of modeling a rumour's prevalence over time. Both applications provide insights into how a rumour attracts attention from the social media community. They can assist journalists in tracking and debunking rumours, and can be used in downstream applications such as rumour veracity classification systems. We motivate Gaussian processes and point processes as appropriate tools and show how features not considered in previous work can be included. We show that transfer learning approaches are successful for both applications, supporting the hypothesis that there is a common underlying signal across different rumours. We furthermore introduce novel machine learning techniques that have the potential to be used in other applications: convolution kernels for streams of text over continuous time, and a sequence classification algorithm based on point processes.
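The abstract names point processes as a tool for modeling how a rumour's popularity evolves over continuous time. The abstract does not say which point process the thesis uses, so the following is a minimal sketch assuming one common choice, a univariate Hawkes process with an exponential kernel, whose intensity is λ(t) = μ + Σ_{t_i < t} αβ exp(−β(t − t_i)). Each tweet temporarily raises the rate of further tweets, which then decays back to the background rate μ. The function names, parameter values and timestamps are hypothetical.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, events: Sequence[float], mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity lambda(t) of a univariate Hawkes process with an
    exponential kernel: mu + sum over events t_i < t of alpha*beta*exp(-beta*(t - t_i)).

    mu: background rate (tweets per hour); alpha: expected number of follow-up
    tweets each tweet triggers; beta: decay rate of that excitation.
    """
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def compensator(T: float, events: Sequence[float], mu: float, alpha: float, beta: float) -> float:
    """Integral of lambda(t) over [0, T], i.e. the expected number of tweets by time T."""
    return mu * T + sum(alpha * (1.0 - math.exp(-beta * (T - ti))) for ti in events if ti < T)


def log_likelihood(events: Sequence[float], T: float, mu: float, alpha: float, beta: float) -> float:
    """Point-process log-likelihood: sum of log lambda(t_i) minus the compensator.
    Maximising this over (mu, alpha, beta) would fit the model to one rumour."""
    events = sorted(events)
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    return log_term - compensator(T, events, mu, alpha, beta)


if __name__ == "__main__":
    # Hypothetical tweet timestamps, in hours since the rumour first appeared.
    tweets = [0.0, 0.2, 0.25, 0.9, 3.0]
    for t in (0.3, 1.0, 5.0):
        print(t, round(hawkes_intensity(t, tweets, mu=0.5, alpha=0.6, beta=2.0), 3))
    print(log_likelihood(tweets, T=6.0, mu=0.5, alpha=0.6, beta=2.0))
```

A rumour that "gains ground" corresponds to bursts where the excitation term dominates μ; one that fails to gain ground stays near the background rate.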

    Analysing social media data using sentiment analysis in relation to public order

    The research aim is to analyse social media data using sentiment analysis in relation to public order. A sentiment can be expressed as a thought, opinion or attitude that is based mainly on emotion rather than reason. Sentiment analysis (SA) studies the opinions, sentiments and emotions expressed at sentence or document level. It identifies and classifies text as opinion or emotion in order to support a decision-making process, and it measures whether the analysed text is positive, negative or neutral towards an entity such as a person, organisation, event, location or topic. As the adoption of ubiquitous technology increases and the social media population keeps growing, users respond quickly with their political, economic or religious views on Twitter or Facebook, and their posts become valuable sources of public opinion: an important commodity for inferring public opinion in social studies or marketing. The research suggests that the police have found it difficult to adapt their existing model to the changing nature of public events and to the rapid shift towards technology and social media. The scale and volume of the data have made it increasingly hard for the police to manage, monitor and use intelligence emerging from social media to maintain the peace. To address this gap, the investigation will evaluate whether SA can enhance the analysis of social media in the context of public (dis)order events. This may help to improve police decision-making and reduce complexity, thereby increasing public safety. SA can support the police in both specific and general ways, but this research may focus on a specific case. To meet the aim, the research proposes to use an SA model together with data mining tools and techniques to analyse relevant data extracted from social media.
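The paragraph above describes SA as labelling text positive, negative or neutral towards an entity. A minimal sketch of the dictionary (lexicon) idea, assuming a tiny hypothetical word list rather than the project's actual keywords, which are not reproduced here: count positive and negative lexicon hits and map the net score to a label.

```python
import re

# Hypothetical toy lexicon for illustration only; it is NOT the keyword list
# built in this research.
POSITIVE = {"calm", "peaceful", "safe", "support", "thanks"}
NEGATIVE = {"riot", "violence", "looting", "angry", "attack", "fear"}


def polarity(text: str) -> str:
    """Label text positive, negative or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(tok in POSITIVE for tok in tokens) - sum(tok in NEGATIVE for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Streets are calm and peaceful tonight",
                  "Looting and violence reported near the station",
                  "Police update at 6pm"]:
        print(polarity(tweet), "|", tweet)
```

A plain count like this ignores negation ("not calm"), sarcasm and domain-specific slang, which is one reason lexicon scores are usually combined with machine learning and checked against manually labelled data.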
    The project will use an adapted social media lifecycle as its methodological approach. Past events involving public order and the police will be evaluated to develop a relevant methodology and to provide appropriate recommendations to the technical community on using SA in future social media applications. The project adopted a hybrid approach consisting of dictionary, machine learning and gold standard approaches. As a result, the combination of machine learning on dictionaries and manual classification produced the strongest results in terms of precision, recall and F1 measure, compared with the combination of machine learning on tweets and manual classification. Change point analysis helped to identify significant points in the event's tweet timeline that correlated with the physical event. However, some of the allocated change points were inaccurate, being deemed insignificant on the basis of news media coverage and the low volume of tweets. Future work is required to understand why these change points were allocated and to explore alternative methods that could extract further insights not covered in this project. The study makes a series of contributions to knowledge. First, it creates a keyword list for public order events, since none was publicly available. Second, it builds towards a model for predicting what may happen in public order events by applying dictionary and machine learning methods and creating a gold standard for sentiment analysis. Third, it makes a technical contribution to the sentiment analysis community, offering recommendations that could enhance its frameworks and identifying areas that require further research. Fourth, it develops a social media lifecycle methodology, which was tested in this project.
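The abstract does not say which change point method was used. As one possible illustration, not the project's method, here is binary segmentation for shifts in the mean of hourly tweet counts: find the single split that most reduces the within-segment sum of squared errors (SSE), keep it if the reduction exceeds a penalty, and recurse on both halves. The penalty and minimum segment size are hypothetical tuning parameters, and the counts are made up.

```python
from typing import List, Optional, Sequence, Tuple


def sse(xs: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_split(counts: Sequence[float], min_size: int = 2) -> Tuple[Optional[int], float]:
    """Index k that most reduces the SSE when counts is cut into counts[:k] and counts[k:]."""
    total = sse(counts)
    best_k, best_gain = None, 0.0
    for k in range(min_size, len(counts) - min_size + 1):
        gain = total - sse(counts[:k]) - sse(counts[k:])
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


def binary_segmentation(counts: Sequence[float], penalty: float,
                        min_size: int = 2, offset: int = 0) -> List[int]:
    """Recursively split wherever the SSE reduction exceeds `penalty`.
    Returns change point indices (start of each new segment) in increasing order."""
    k, gain = best_split(counts, min_size)
    if k is None or gain <= penalty:
        return []
    left = binary_segmentation(counts[:k], penalty, min_size, offset)
    right = binary_segmentation(counts[k:], penalty, min_size, offset + k)
    return left + [offset + k] + right


if __name__ == "__main__":
    # Hypothetical hourly tweet counts around a public order incident.
    hourly = [3, 4, 3, 5, 4, 20, 22, 19, 21, 6, 5, 6, 4]
    print(binary_segmentation(hourly, penalty=20.0))  # -> [5, 9]
```

The penalty trades off detecting genuine shifts against flagging spurious ones; with a low penalty, small fluctuations in low-volume stretches can be reported as change points that do not match any real-world event.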

    Temporal models of streaming social media data

    There are significant temporal dependencies between online behaviour and real-world activities as they occur. In text modelling in particular, these are usually ignored or at best handled in overly simplistic ways, such as assuming smooth variation over time. Social media is a new data source that presents collective behaviour much more richly than traditional sources such as newswire, offering finer time granularity, timely reflection of activities, multiple modalities and large volume. Analysing temporal patterns in these data is important for discovering newly emerging topics, periodic occurrences, and correlation or causality with real-world indicators or human behaviour patterns. These opportunities bring many challenges, both engineering (i.e. data volume and processing) and algorithmic, namely the inconsistency and short length of the messages and the large number of messages irrelevant to our goal. With a better understanding of these complex temporal dependencies, tasks such as classification can be augmented to provide temporally aware responses. In this thesis we model the temporal dynamics of social media data. We first show that temporality is an important characteristic of this type of data. Further comparisons and correlations with real-world indicators show that these data give a timely reflection of real-world events. Our goal is to use these variations to discover emerging or recurring user behaviours. We consider both word usage and user behaviour in social media. With these goals in mind, we adapt existing machine learning techniques and build novel ones. These span a wide range of models: from Markov models to regularised regression models, and from evolutionary spectral clustering, which models smooth temporal variation, to Gaussian Process regression, which can identify more complex temporal patterns.
    We introduce approaches that discover and predict words, topics or behaviours that change over time or occur with some regularity. These are modelled for the first time in the NLP literature using Gaussian Processes. We demonstrate that we can effectively pick out patterns, including periodicities, and achieve state-of-the-art forecasting results. We show that this performance gain carries over to improve tasks that do not take temporal information into account. We further analyse how temporal variation in the text can be used to discover and track new content, and we develop a model that exploits variation in word co-occurrences for clustering over time. Several collection and processing tools, as well as several social media datasets, have been developed and published as open source. The thesis posits that temporal analysis of data, particularly from social media, gives us insights into real-world dynamics. Incorporating this temporal information into other applications can benefit standard tasks in natural language processing and beyond.
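The abstract reports using Gaussian Process regression to pick out periodic patterns in word or topic frequencies and to forecast them. A minimal sketch, assuming a GP with a constant mean (the training average), an exponentiated sine-squared (periodic) kernel and hand-fixed period, lengthscale and noise variance; the thesis's actual kernels and hyperparameter learning are not described here. The posterior mean at a test time x* is ȳ + k*ᵀ(K + σ_n²I)⁻¹(y − ȳ). It is written in pure Python so it runs without dependencies; all function names and the synthetic data are hypothetical.

```python
import math
from typing import List, Sequence


def periodic_kernel(x1: float, x2: float, variance: float = 1.0,
                    lengthscale: float = 1.0, period: float = 7.0) -> float:
    """Exponentiated sine-squared (periodic) covariance between two time points."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return variance * math.exp(-2.0 * s * s / lengthscale ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_forecast(x_train: Sequence[float], y_train: Sequence[float], x_test: Sequence[float],
                noise_var: float = 0.01, **kernel_args: float) -> List[float]:
    """GP posterior mean with constant mean ybar (training average) and a periodic kernel."""
    n = len(x_train)
    y_mean = sum(y_train) / n
    centred = [y - y_mean for y in y_train]
    gram = [[periodic_kernel(x_train[i], x_train[j], **kernel_args) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    weights = cholesky_solve(cholesky(gram), centred)
    return [y_mean + sum(periodic_kernel(xs, xt, **kernel_args) * w
                         for xt, w in zip(x_train, weights))
            for xs in x_test]


if __name__ == "__main__":
    # Hypothetical daily frequency of one word with a weekly cycle (days 0-20).
    days = list(range(21))
    freq = [10 + 5 * math.sin(2 * math.pi * d / 7) for d in days]
    forecast = gp_forecast(days, freq, [21, 22, 23], period=7.0)
    print([round(f, 2) for f in forecast])
```

Fixing the period by hand is the main simplification: in practice one would compare kernels or learn the period and other hyperparameters by maximising the marginal likelihood, typically with a numerical library rather than hand-written linear algebra.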