110 research outputs found
Finding Street Gang Members on Twitter
Most street gang members use Twitter to intimidate others, to present
outrageous images and statements to the world, and to share recent illegal
activities. Their tweets may thus be useful to law enforcement agencies to
discover clues about recent crimes or to anticipate ones that may occur.
Finding these posts, however, requires a method to discover gang member Twitter
profiles. This is a challenging task since gang members represent a very small
population of the 320 million Twitter users. This paper studies the problem of
automatically finding gang members on Twitter. It outlines a process to curate
one of the largest sets of verifiable gang member profiles that have ever been
studied. A review of these profiles establishes differences in the language,
images, YouTube links, and emojis gang members use compared to the rest of the
Twitter population. Features from this review are used to train a series of
supervised classifiers. Our classifier achieves a promising F1 score with a low
false positive rate.
Comment: 8 pages, 9 figures, 2 tables. Published as a full paper at the 2016
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2016)
A Semantics-Based Measure of Emoji Similarity
Emoji have grown to become one of the most important forms of communication
on the web. With their widespread use, measuring the similarity of emoji has
become an important problem for contemporary text processing since it lies at
the heart of sentiment analysis, search, and interface design tasks. This paper
presents a comprehensive analysis of the semantic similarity of emoji through
embedding models that are learned over machine-readable emoji meanings in the
EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji
sense definitions, and with different training corpora obtained from Twitter
and Google News, we develop and test multiple embedding models to measure emoji
similarity. To evaluate our work, we create a new dataset called EmoSim508,
which assigns human-annotated semantic similarity scores to a set of 508
carefully selected emoji pairs. After validation with EmoSim508, we present a
real-world use-case of our emoji embedding models using a sentiment analysis
task and show that our models outperform the previous best-performing emoji
embedding model on this task. The EmoSim508 dataset and our emoji embedding
models are publicly released with this paper and can be downloaded from
http://emojinet.knoesis.org/.
Comment: This paper is accepted at Web Intelligence 2017 as a full paper. In
2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). Leipzig,
Germany: ACM, 201
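As a rough illustration of how an embedding model can yield an emoji similarity score, the sketch below compares two vectors with cosine similarity. The abstract does not state which similarity function the authors use, so cosine similarity is an assumption here, and the emoji names and 3-dimensional vectors in `toy_embeddings` are hypothetical placeholders rather than values from the released models.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0 by convention
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; learned emoji embeddings would have many more dimensions.
toy_embeddings = {
    "grinning_face": [0.9, 0.1, 0.0],
    "smiling_face": [0.8, 0.2, 0.1],
    "crying_face": [-0.7, 0.3, 0.2],
}


def emoji_similarity(name_a, name_b, embeddings=toy_embeddings):
    """Look up two emoji by name and compare their vectors."""
    return cosine_similarity(embeddings[name_a], embeddings[name_b])


if __name__ == "__main__":
    print(emoji_similarity("grinning_face", "smiling_face"))
    print(emoji_similarity("grinning_face", "crying_face"))
```

With these toy vectors, the two smiling emoji score higher than the grinning/crying pair. Scores of this kind are what a dataset of human-annotated pairs such as EmoSim508 could be used to check.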
Predictive Analysis on Twitter: Techniques and Applications
Predictive analysis of social media data has attracted considerable attention
from the research community as well as the business world because of the
essential and actionable information it can provide. Over the years, extensive
experimentation and analysis for insights have been carried out using Twitter
data in various domains such as healthcare, public health, politics, social
sciences, and demographics. In this chapter, we discuss techniques, approaches
and state-of-the-art applications of predictive analysis of Twitter data.
Specifically, we present fine-grained analysis involving aspects such as
sentiment, emotion, and the use of domain knowledge in the coarse-grained
analysis of Twitter data for making decisions and taking actions, and relate a
few success stories.
Examining UK drill music through sentiment trajectory analysis
This paper presents how techniques from natural language processing can be
used to examine the sentiment trajectories of gang-related drill music in the
United Kingdom (UK). This work is important because key public figures are
loosely making controversial linkages between drill music and recent
escalations in youth violence in London. Thus, this paper examines the dynamic
use of sentiment in gang-related drill music lyrics. The findings suggest two
distinct sentiment use patterns, and statistical analyses reveal that lyrics
with a markedly positive tone attract more views and engagement on YouTube than
negative ones. Our work provides the first empirical insights into the language
use of London drill music, and it can, therefore, be used in future studies and
by policymakers to help understand the alleged drill-gang nexus.
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular standout success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Using Artificial Intelligence to Identify Perpetrators of Technology Facilitated Coercive Control
This study is one of the 21 projects funded by the Home Office for research on perpetrators of domestic abuse. It is concerned with a specific form of domestic abuse known as Technology Facilitated Coercive Control (TFCC) and focuses on the digital communication between (alleged) perpetrators and victim/survivors held on mobile phones. The purpose of this feasibility study was twofold:
i. to test the viability of an Artificial Intelligence (AI) programme to identify perpetrators (including alleged perpetrators) of domestic abuse using digital communications held on mobile phones;
ii. to examine police and victim/survivor attitudes towards using AI in police investigations.
Using digital conversations extracted from court transcriptions where TFCC was identified as a factor in the offending, the research team tested data sets built on different methods and techniques of AI. Natural Language Processing (NLP) tools, a subfield of AI, were also tested for their speed and accuracy in recognising abusive communication and identifying and risk assessing perpetrators of TFCC.
Conscious of national concern about policing practices relating to Violence Against Women and Girls, and aware that any AI programme would be futile without the co-operation of both the police and the public, the research team devised two online surveys to measure opinion. The first sought insight into the attitudes of victim/survivors, viewed as experts in domestic abuse, about using AI in police investigations. The second involved the police and asked for their views on using AI in this way.
Using Artificial Intelligence to Identify Perpetrators of Technology Facilitated Coercive Control.
This study investigated the feasibility of using Artificial Intelligence to identify perpetrators of coercive control through digital data held on mobile phones. The research also sought the views of the police and victim/survivors of domestic abuse on using technology in this way.
Automated Detection of Sockpuppet Accounts in Wikipedia
Wikipedia is a free Internet-based encyclopedia that is built and maintained via the open-source collaboration of a community of volunteers. Wikipedia’s purpose is to benefit readers by acting as a widely accessible and free encyclopedia, a comprehensive written synopsis that contains information on all discovered branches of knowledge. The website has millions of pages that are maintained by thousands of volunteer editors. Unfortunately, given its open-editing format, Wikipedia is highly vulnerable to malicious activity, including vandalism, spam, undisclosed paid editing, etc.
Malicious users often use sockpuppet accounts to circumvent a block or a ban imposed by Wikipedia administrators on the person’s original account. A sockpuppet is an “online identity used for the purpose of deception.” Usually, several sockpuppet accounts are controlled by a single individual (or entity) called a puppetmaster. Currently, suspected sockpuppet accounts are manually verified by Wikipedia administrators, which makes the process slow and inefficient.
The primary objective of this research is to develop an automated system, based on ML and neural networks, that recognizes the patterns of sockpuppet accounts as early as possible and recommends suspension. We treat the problem as a binary classification task and propose a set of new features that capture suspicious behavior by considering user activity and analyzing the contributed content. To this end, we focus on account-based and content-based features. Our solution is split into detecting and categorizing suspicious edits made by the same author from multiple accounts. We hypothesize that “you can hide behind the screen, but your personality can’t hide.” Because the edits are also sequential in nature, we extend our analysis with a Long Short-Term Memory (LSTM) model to track the sequential patterns in users’ writing styles.
Throughout the research, we strive to automate the sockpuppet account detection system and develop tools to help the Wikipedia administration maintain the quality of articles. We tested our system on a dataset we built containing 17K accounts validated as sockpuppets. Experimental results show that our approach achieves an F1 score of 0.82 and outperforms other systems proposed in the literature. We plan to deliver our research to the Wikipedia authorities to integrate it into their existing system.
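For readers unfamiliar with the metric, the F1 score reported for a binary classifier of this kind is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts using the standard definition. The function names and the counts in the usage example are hypothetical illustrations and are not taken from the study's data.

```python
def precision(tp, fp):
    """Fraction of accounts flagged as sockpuppets that really are sockpuppets."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true sockpuppet accounts that the classifier flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
    print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, sockpuppet accounts) is the one of interest.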
AI for social good: social media mining of migration discourse
The number of international migrants has steadily increased over the years, and migration has become one of the pressing issues in today’s globalized world. Our bibliometric review of around 400 articles on the Scopus platform indicates an increased interest in migration-related research in recent times, but the extant research is scattered at best. AI-based opinion mining research has predominantly noted negative sentiments across various social media platforms. Additionally, we note that prior studies have mostly considered social media data in the context of a particular event. These studies offered a nuanced view of the societal opinions regarding that specific event, but this approach might miss the forest for the trees. Hence, this dissertation attempts to go beyond simplistic opinion mining to identify various latent themes of migrant-related social media discourse.
The first essay draws insights from the social psychology literature to investigate two facets of Twitter discourse, i.e., perceptions about migrants and behaviors toward migrants. We identified two prevailing perceptions (i.e., sympathy and antipathy) and two dominant behaviors (i.e., solidarity and animosity) of social media users toward migrants. Additionally, this essay has also fine-tuned the binary hate speech detection task, specifically in the context of migrants, by highlighting the granular differences between the perceptual and behavioral aspects of hate speech.
The second essay investigates the journey of migrants or refugees from their home to the host country. We draw insights from van Gennep's seminal book, Les Rites de Passage, to identify four phases of their journey: Arrival of Refugees, Temporal stay at Asylums, Rehabilitation, and Integration of Refugees into the host nation. We consider multimodal tweets for this essay. We find that our proposed theoretical framework is relevant to the 2022 Ukrainian refugee crisis, which serves as a use case.
Our third essay points out that a limited sample of annotated data does not provide insights regarding the prevailing societal-level opinions. Hence, this essay employs unsupervised approaches on large-scale societal datasets to explore the prevailing societal-level sentiments on the YouTube platform. Specifically, it probes whether negative comments about migrants get endorsed by other users. If so, does the endorsement depend on who the migrants are, especially if they are cultural others? To address these questions, we consider two datasets: YouTube comments before the 2022 Ukrainian refugee crisis, and comments during the crisis. The second dataset confirms the Cultural Us hypothesis, while our findings are inconclusive for the first dataset.
Our fourth and final essay probes the social integration of migrants. The first part of this essay probes the unheard and faint voices of migrants to understand their struggle to settle down in the host economy. The second part explores whether social media platforms are a viable alternative to expensive commercial job portals for vulnerable migrants.
Finally, in our concluding chapter, we elucidate the potential of explainable AI and briefly point out the inherent biases of transformer-based models in the context of migrant-related discourse. To sum up, migration is recognized as one of the essential topics in the United Nations’ Sustainable Development Goals (SDGs). Thus, this dissertation attempts to make an incremental contribution to the AI for Social Good discourse.