214 research outputs found
Stochastic Sampling and Machine Learning Techniques for Social Media State Production
The rise in the importance of social media platforms as communication tools has been both a blessing and a curse. For scientists, they offer an unparalleled opportunity to study human social networks. However, these platforms have also been used to propagate misinformation and hate speech with alarming velocity and frequency. The overarching aim of our research is to leverage the data from social media platforms to create and evaluate a high-fidelity, at-scale computational simulation of online social behavior which can provide a deep quantitative understanding of adversaries' use of the global information environment. Our hope is that this type of simulation can be used to predict and understand the spread of misinformation, false narratives, fraudulent financial pump-and-dump schemes, and cybersecurity threats. To do this, our research team has created an agent-based model that can handle a variety of prediction tasks. This dissertation introduces a set of sampling and deep learning techniques that we developed to predict specific aspects of the evolution of online social networks that have proven to be challenging to accurately predict with the agent-based model. First, we compare different strategies for predicting network evolution with sampled historical data based on community features. We demonstrate that our community-based model outperforms the global one at predicting population, user, and content activity, along with network topology over different datasets. Second, we introduce a deep learning model for burst prediction. Bursts may serve as a signal of topics that are of growing real-world interest. Since bursts can be caused by exogenous phenomena and are indicative of burgeoning popularity, leveraging cross-platform social media data is valuable for predicting bursts within a single social media platform. An LSTM model is proposed in order to capture the temporal dependencies and associations based upon activity information.
These volume predictions can also serve as a valuable input for our agent-based model. Finally, we explore Graph Convolutional Networks to investigate the value of weak ties in classifying academic literature. Our experiments look at the results of treating weak ties as if they were strong ties to determine whether that assumption improves performance. We also examine how node removal affects prediction accuracy by selecting nodes according to different centrality measures. These experiments provide insight into which nodes are most important for the performance of targeted graph convolutional networks. Graph Convolutional Networks are important in the social network context because the sociological and anthropological concept of 'homophily' allows the method to use network associations to assist attribute prediction in a social network.
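The node-removal experiment described above can be illustrated with a minimal sketch. It assumes degree centrality as the ranking measure and a hypothetical toy edge list; the abstract does not say which centrality measures or datasets the dissertation actually used, and no graph convolutional network is trained here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        # Centrality is undefined for a single node; report zero.
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def remove_top_nodes(edges, k):
    """Drop the k most central nodes and every edge touching them."""
    scores = degree_centrality(edges)
    # Highest score first; node name breaks ties deterministically.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    removed = set(ranked[:k])
    kept = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return removed, kept


# Hypothetical toy citation graph in which "a" is the hub.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
removed, kept = remove_top_nodes(edges, 1)
print(removed, kept)
```

In an experiment of this kind, the reduced edge list would be passed to the classifier and its accuracy compared against the full graph for each centrality measure.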
Analyzing Granger causality in climate data with time series classification methods
Attribution studies in climate science aim to ascertain scientifically the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
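For readers unfamiliar with the traditional autoregressive approach that the abstract takes as its baseline, the following is a minimal sketch of a single-lag Granger test. It is not the article's classification-based method: the lag order of one, the helper names, and the synthetic series are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(x, y):
    """F statistic for the hypothesis 'x Granger-causes y' with one lag."""
    targets = y[1:]
    # Restricted model: y_t explained by its own past only.
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    # Full model: the past of x is added as a predictor.
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_f = rss(restricted, targets), rss(full, targets)
    dof = len(targets) - 3  # observations minus parameters of the full model
    return (rss_r - rss_f) / (rss_f / dof)


# Synthetic example (an assumption): y follows x with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y)
f_yx = granger_f(y, x)
print(f_xy, f_yx)
```

A large F statistic for the direction x to y and a small one for the reverse direction is the pattern a Granger analysis reads as evidence that x helps predict y. In practice the statistic would be compared against an F distribution to obtain a p-value.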
Context-Aware Message-Level Rumour Detection with Weak Supervision
Beyond serving as a communication medium, social media has become the main source of all sorts of information. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly. It is challenging to track down their origins and stop their propagation. One of the most effective solutions is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to learn limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of the hand-labelling of large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which has received less attention from the ERD research community. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
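Burst detection from a temporal signal, as mentioned above, can be sketched with a simple rolling threshold. This is an illustrative baseline only: the window size, the multiplier `k`, and the message counts are assumptions, and the abstract does not describe the thesis's actual detector.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=5, k=3.0):
    """Return indices whose count exceeds mean + k * std of the preceding window."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Hypothetical messages-per-interval counts with one spike at index 6.
counts = [10, 12, 11, 9, 10, 11, 60, 12, 10]
print(detect_bursts(counts))  # [6]
```

Messages posted during a flagged interval would then be candidates for summarisation and rumour classification, while quiet intervals are filtered out. Note that a perfectly flat history gives a standard deviation of zero, so any increase is flagged; a minimum threshold would be a sensible addition in practice.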
Automated anomaly recognition in real time data streams for oil and gas industry.
There is a growing demand for computer-assisted real-time anomaly detection - from the identification of suspicious activities in cyber security, to the monitoring of engineering data for various applications across the oil and gas, automotive and other engineering industries. To reduce the reliance on field experts' knowledge for identification of these anomalies, this thesis proposes a deep-learning anomaly-detection framework that can help to create an effective real-time condition-monitoring framework. The aim of this research is to develop a real-time and re-trainable generic anomaly-detection framework, which is capable of predicting and identifying anomalies with a high level of accuracy - even when a specific anomalous event has no precedent. Machine-based condition monitoring is preferable in many practical situations where fast data analysis is required, and where there are harsh climates or otherwise life-threatening environments. For example, automated condition-monitoring systems are ideal in deep sea exploration studies, offshore installations and space exploration. This thesis first reviews studies about anomaly detection using machine learning. It then adopts the best practices from those studies in order to propose a multi-tiered framework for anomaly detection with heterogeneous input sources, which can deal with unseen anomalies in a real-time dynamic problem environment. The thesis then applies the developed generic multi-tiered framework to two fields of engineering: data analysis and malicious cyber attack detection. Finally, the framework is further refined based on the outcomes of those case studies and is used to develop a secure cross-platform API, capable of re-training and data classification on a real-time data feed.
Moment-to-moment mood change modelling in mobile mental health network
Human interests and behaviour change over time and are often affected by multiple factors. In particular, human emotions, mood and its constituent processes change and interact over time. Therefore, modelling human behaviour should take into account the changes over time for customization and adaptation of systems to the users’ specific needs. Understanding and assessing the temporal dynamics of mood are critical for modelling human behaviour for both individuals and groups of people who share similar habits, lifestyle and personal circumstances. Thus, in order to construct a personalized recommendation for a given user, it is first necessary to have some knowledge about previous user interests and behaviours. However, the challenge of obtaining large-scale data on human emotions has left the most fundamental questions on emotions less explored: How do emotions vary across individuals, evolve over time, and connect to social ties? We address these questions using a large-scale dataset that contains both users’ interactions with momentary emotions and topical labels. Using this dataset, we identify patterns of human emotions on different levels, starting from the network level, then the group (cluster) level, and moving towards the user level. At the user level, we identify how human emotions are distributed and vary over time. In particular, we model changes in mood using multi-level multimodal features including users’ sentimental status, engagement and linguistic queries. We also utilise language models to model and understand patterns of mood change. We model the changes of users’ mental states based on replies and responses to posts over time and predict future states. We find that future mental states can be predicted with reasonable accuracy given users’ historical posts and current participation features.
Our findings form a step towards better understanding the interplay between user behaviour and mood change exhibited while interacting on a mental health network. They also provide interpretable summaries that health experts and individuals can use in the future, and that can inform work on possible medical interventions together with clinical experts.