Intelligence gathering by capturing the social processes within prisons
We present a prototype system that can be used to capture longitudinal
socialising processes by recording people's encounters in space. We argue that
such a system can usefully be deployed in prisons and other detention
facilities in order to help intelligence analysts assess the behaviour of
terrorist and organised crime groups, and their potential relationships. Here
we present the results of a longitudinal study, carried out with civilians,
which demonstrates the capabilities of our system. Comment: 21 pages, 7 figures, 1 table
On the use of distributed semantics of tweet metadata for user age prediction
Abstract
Social media data represent an important resource for behavioral analysis of the aging population. This paper addresses the problem of age prediction from a Twitter dataset, where prediction is framed as a classification task. For this purpose, an innovative model based on a Convolutional Neural Network (CNN) is devised. The model relies on language-related features and social media specific metadata. More specifically, we introduce two features that have not been previously considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora in order to exploit their semantic information in age prediction. We show that our CNN-based classifier, when compared with baseline models, yields an improvement in the micro-averaged F1 score of up to 12.3% for the Dutch dataset, 9.8% for the English1 dataset, and 6.6% for the English2 dataset
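The micro-averaged F1 score reported above pools true positives, false positives and false negatives over all age classes before computing precision and recall. The following is a minimal sketch of that metric only, not of the authors' classifier; the function name `micro_f1` and the age-group labels are illustrative assumptions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1          # correctly predicted this class
            elif pred == label:
                fp += 1          # predicted this class, truth differs
            elif truth == label:
                fn += 1          # missed an instance of this class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical age-group labels, for illustration only.
truth = ["young", "old", "middle", "old"]
guess = ["young", "middle", "middle", "old"]
score = micro_f1(truth, guess)
```

For single-label multi-class problems such as this one, the micro-averaged F1 coincides with plain accuracy, which is a useful sanity check when reproducing reported figures.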
Inferring demographic data of marginalized users in Twitter with computer vision APIs
Abstract
Inferring demographic intelligence from unlabeled social media data is an actively growing area of research, challenged by low availability of ground truth annotated training corpora. High-accuracy approaches for labeling demographic traits of social media users employ various heuristics that do not scale up and often discount non-English texts and marginalized users. First, we present a framework for inferring the demographic attributes of Twitter users from their profile pictures (avatars) using the Microsoft Azure Face API. Second, we measure the inter-rater agreement between annotations made using our framework against two pre-labeled samples of Twitter users (N1=1163; N2=659) whose age labels were manually annotated. Our results indicate that the strength of the inter-rater agreement (Gwet's AC1=0.89; 0.90) between the gold standard and our approach is "very good" for labelling the age group of users. The paper provides a use case of Computer Vision for enabling the development of large cross-sectional labeled datasets, and further advances novel solutions in the field of demographic inference from short social media texts
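Gwet's AC1, the agreement statistic quoted above, corrects the observed agreement between two raters for agreement expected by chance. The sketch below shows the standard two-rater formula; it is not the authors' code, and it assumes the set of categories can be inferred from the labels that actually occur (the function name and sample labels are hypothetical).

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters assigning one nominal label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Assumption: categories are those observed in either rating list.
    categories = set(rater_a) | set(rater_b)
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    # Observed agreement: share of items given identical labels.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: (1 / (q - 1)) * sum of pi_k * (1 - pi_k), where pi_k
    # is the mean proportion of items placed in category k by the two raters.
    p_e = 0.0
    for k in categories:
        pi_k = (rater_a.count(k) + rater_b.count(k)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1
    return (p_a - p_e) / (1 - p_e)


# Hypothetical labels: manual annotation vs. an automated labeller.
manual = ["young", "young", "old", "old"]
automated = ["young", "young", "old", "young"]
agreement = gwet_ac1(manual, automated)
```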
Catchem: a browser plugin for the Panama papers using approximate string matching
Abstract
The Panama Papers is a collection of 11.5 million leaked records that contain information on more than 214,488 offshore entities. This collection is growing rapidly as more leaked records become available online. In this paper, we present a work in progress on a web browser plugin that detects company names from the Panama Papers and alerts the user by means of unobtrusive visual cues. We matched a random sample of company names from the Public Works and Government Services Canada registry against the Panama Papers using three different string matching techniques. Monge-Elkan is found to provide the best matching results but at increased computational cost. A Levenshtein-based approach is found to provide the best tradeoff between matching quality and computational cost, while a Jaccard index-like approach is found to be less sensitive to slight textual changes
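The three families of string matching named above can be sketched as follows. This is an illustrative sketch rather than the plugin's implementation: the abstract does not say how names are tokenised or which inner similarity Monge-Elkan uses, so whitespace tokens and a normalised Levenshtein similarity are assumptions, and the company names are made up.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lev_similarity(a, b):
    """Levenshtein distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def jaccard(a, b):
    """Jaccard index over lower-cased whitespace tokens (an assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def monge_elkan(a, b, inner=lev_similarity):
    """Mean, over tokens of a, of the best inner similarity to a token of b."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    return sum(max(inner(x, y) for y in tb) for x in ta) / len(ta)


# Made-up company names, for illustration only.
name_a, name_b = "Acme Holdings Ltd", "Acme Holdings Limited"
scores = (lev_similarity(name_a.lower(), name_b.lower()),
          jaccard(name_a, name_b),
          monge_elkan(name_a, name_b))
```

The nested token-by-token comparison in `monge_elkan` is what makes it more expensive than a single Levenshtein pass, which is consistent with the cost tradeoff the abstract describes.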
Meta-terrorism: identifying linguistic patterns in public discourse after an attack
Abstract
When a terror-related event occurs, there is a surge of traffic on social media comprising informative messages, emotional outbursts, helpful safety tips, and rumors. It is important to understand the behavior manifested on social media sites to gain a better understanding of how to govern and manage in a time of crisis. We undertook a detailed study of Twitter during two recent terror-related events: the Manchester attacks and the Las Vegas shooting. We analyze the tweets during these periods using (a) sentiment analysis, (b) topic analysis, and (c) fake news detection. Our analysis demonstrates the spectrum of emotions evinced in reaction and the way those reactions spread over the event timeline. Also, with respect to topic analysis, we find "echo chambers", groups of people interested in similar aspects of the event. Encouraged by our results on these two event datasets, the paper seeks to enable a holistic analysis of social media messages in a time of crisis
A novel edge architecture and solution for detecting concept drift in smart environments
Abstract
The proliferation of the Internet of Things (IoT), artificial intelligence (AI), the adoption of 5G, and progress towards 6G technology have led to the accumulation of massive amounts of real-world data; however, a significant portion of the data generated by smart cities and smart buildings remains unused. A notable problem is the shift of statistical properties in real-world streaming data over time caused by unexpected factors, referred to as concept drift, which results in less efficient predictive models. To address this problem, the latest research leverages the cloud–edge continuum paradigm for the deployment of AI and general smart city applications while utilising the available resources optimally. In this article, we propose a computing architecture for different smart city applications in edge micro data centre (EMDC) settings over a hybrid cloud–edge continuum to support the deployment of AI workloads. We implement a feedback-driven automated concept drift detection and adaptation methodology, combining a long short-term memory (LSTM) base learner with the Page–Hinkley test (PHT), adaptive windowing (ADWIN) and Kolmogorov–Smirnov windowing (KSWIN). Real-world data streams from various environmental sensors installed at the University of Oulu Smart Campus are used for forecasting. The feedback-based concept drift detection and adaptation process is first evaluated using synthetic datasets with known concept drift points and then applied to the real-world data. Subsequently, the implementation is evaluated using the MAE, RMSE, and MAPE error metrics. The results showed a reduction in MAPE from 8.5% to 3.88% when concept drift detection was applied. Additionally, the challenges faced and the effectiveness of the suggested solutions are explored
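Of the drift detectors named above, the Page–Hinkley test is the simplest to illustrate: it accumulates deviations of each observation from the running mean and signals drift when that cumulative sum rises too far above its historical minimum. The sketch below covers only this detector, for upward shifts in the mean; the parameter values (`delta`, `threshold`, `min_samples`), the class layout and the synthetic stream are assumptions, and the LSTM base learner, ADWIN and KSWIN are not shown.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in the stream mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                  # cumulative deviation m_t
        self.min_cum = float("inf")     # running minimum M_t of m_t

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n      # incremental mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.min_cum > self.threshold)


def first_drift_index(stream, **kwargs):
    """Index of the first observation that triggers an alarm, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Synthetic stream with a known drift point at index 100 (illustrative).
stream = [0.0] * 100 + [10.0] * 100
alarm_at = first_drift_index(stream)
```

In a feedback-driven setup like the one described, an alarm would typically trigger retraining or adaptation of the forecasting model and a reset of the detector; how the authors wire this together is not specified in the abstract.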
Covert online ethnography and machine learning for detecting individuals at risk of being drawn into online sex work
Abstract
How can we identify individuals at risk of being drawn into online sex work? The spread of online communication removes transaction costs and enables a greater number of people to be involved in illicit activities, including online sex trade. As a result, social media platforms often work as a springboard for criminal careers, posing a significant risk to the economy, public health and trust. Detecting deviant behaviors online is limited by the poor availability of ground-truth data and machine learning tools. Unlike prior work, which focuses exclusively on either qualitative or quantitative methods, in this paper we combine covert online ethnography with semi-supervised learning methodologies, using data from a popular European adult forum. We obtained risk assessment results for 78 users using covert online ethnography, and set out to build a machine learning model that can predict the risk factor for another 28,832 users. Results show that a combination-based approach in which all features are used yields the most accurate results
On the use of URLs and hashtags in age prediction of Twitter users
Abstract
Social media data represent an important resource for behavioral analysis of the ageing population. This paper addresses the problem of age prediction from a Twitter dataset, where prediction is framed as a classification task. For this purpose, an innovative model based on a Convolutional Neural Network (CNN) is devised. The model relies on language-related features and social media specific metadata. More specifically, we introduce two features that have not been previously considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora in order to exploit their semantic information in age prediction. We show that our CNN-based classifier, when compared with an SVM baseline model, yields an improvement of 12.3% and 6.6% in the micro-averaged F1 score on the Dutch and English datasets, respectively
Correlating refugee border crossings with internet search data
Abstract
Can Internet search data be used as a proxy to predict refugee mobility? The soaring refugee death toll in Europe creates an urgent need for novel tools that monitor and forecast refugee flows. This study investigates the correlation between refugee mobility data and Internet search data from Google Trends. Google Trends is a freely accessible tool that provides access to Internet search data by analyzing a sample of all web queries. In our study, we surveyed refugees in Greece (entry point) and in Finland (destination point) to identify what search queries they had used during their travel. Next, we conducted time series analysis on Google search data to investigate whether interest in user-defined search queries correlated with the levels of refugee arrival data recorded by the United Nations High Commissioner for Refugees (UNHCR). Results indicate that the reuse of internet search data considerably improves the predictive power of the models
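One simple way to test whether search interest leads arrivals is to correlate the two series at a time lag. The abstract does not state which time series method was used, so the Pearson correlation below is an assumption, and the weekly values are synthetic numbers made up for illustration, not Google Trends or UNHCR data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlation(search, arrivals, lag):
    """Correlate search interest at time t with arrivals at time t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(search, arrivals)
    return pearson(search[:-lag], arrivals[lag:])


# Synthetic weekly series: arrivals echo search interest two weeks later.
search_interest = [1, 3, 2, 5, 4, 6, 8, 7]
arrivals = [0, 0] + [2 * v for v in search_interest[:-2]]
by_lag = {lag: lagged_correlation(search_interest, arrivals, lag)
          for lag in range(4)}
```

Scanning several lags and picking the strongest correlation gives a rough estimate of how far ahead search activity might anticipate arrivals, though correlation alone does not establish predictive value.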