32 research outputs found

    Online Sexual Predator Detection

    Online sexual abuse is a serious yet severely overlooked problem in modern society. With more children on the Internet and the growing popularity of web applications such as online chatrooms and multiplayer games, preying on vulnerable users has become easier for predators. In recent years, there has been work on detecting online sexual predators using machine learning and deep learning techniques. Such work has trained on severely imbalanced datasets, handling the imbalance by manually trimming over-represented labels. In this work, we propose an approach that first tackles the imbalance problem and then improves the effectiveness of the underlying classifiers. Our evaluation of the proposed sampling approach on the PAN benchmark dataset shows improvements on several classification metrics compared to prior methods that otherwise require hand-crafted sampling of the data.
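    The abstract does not specify the sampling method, so as a minimal illustration of the general idea (balancing labels automatically rather than manually trimming the over-represented class), here is a sketch of random oversampling in plain Python; the data and label names are invented for the example.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    matches the majority-class count -- a simple automatic alternative
    to manually trimming over-represented labels."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        picks = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picks)
        out_y.extend([y] * target)
    return out_x, out_y

# Toy imbalanced data: 9 benign chats vs 1 predatory chat.
X = [f"chat{i}" for i in range(10)]
y = ["benign"] * 9 + ["predatory"]
Xb, yb = oversample(X, y)
```

    After oversampling, both classes contribute equally many examples to training, at the cost of duplicated minority samples.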

    Characterizing Pedophile Conversations on the Internet using Online Grooming

    Cyber-crime targeting children, such as online pedophile activity, is a major and growing concern for society. A deep understanding of predatory chat conversations on the Internet has implications for designing effective solutions that automatically distinguish malicious conversations from regular ones. We believe that a deeper understanding of pedophile conversations can yield more sophisticated and robust surveillance systems than the majority of current systems, which rely only on shallow processing such as simple word counting or keyword spotting. In this paper, we study pedophile conversations from the perspective of online grooming theory and perform a series of linguistic empirical analyses on several pedophile chat conversations to uncover useful insights and patterns. We manually annotated 75 pedophile chat conversations with six stages of online grooming and tested several hypotheses on them. The results of our experiments reveal that relationship forming is the most dominant online grooming stage, in contrast to the sexual stage. We use a widely used word-counting program (LIWC) to create psycho-linguistic profiles for each of the six online grooming stages and discover textual patterns that improve our understanding of the online pedophile phenomenon. Furthermore, we present empirical results that shed light on various aspects of a pedophile conversation, such as the probability of transitions from one stage to another, the distribution of a pedophile chat conversation across the online grooming stages, and correlations between pre-defined word categories and online grooming stages.
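    The stage-transition probabilities the abstract mentions can be estimated as a simple Markov transition matrix over the annotated stage sequences. The sketch below shows the counting step; the stage labels and toy sequences are illustrative stand-ins, not the paper's actual annotation scheme or data.

```python
from collections import defaultdict

def transition_probs(sequences):
    """Estimate P(next stage | current stage) from per-conversation
    sequences of annotated grooming-stage labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # consecutive stage pairs
            counts[a][b] += 1
    probs = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        probs[a] = {b: n / total for b, n in nexts.items()}
    return probs

# Toy annotated conversations (hypothetical stage names).
seqs = [["friendship", "relationship", "relationship", "sexual"],
        ["friendship", "relationship", "exclusivity", "sexual"]]
P = transition_probs(seqs)
```

    On these toy sequences every "friendship" line is followed by "relationship", so that transition gets probability 1.0.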

    The datafication of hate speech

    Hate speech has been identified as a pressing problem in society, and several automated approaches have been designed to detect and prevent it. This chapter reflects on the operationalizations, transformations, and reductions required by the datafication of hate to build such an automated system. The observations are based on an action research setting during a hate speech monitoring project conducted in a multi-organizational collaboration during the Finnish municipal elections in 2017. The project developed a reasonably well-performing algorithmic solution using supervised machine learning. However, the automated approach requires heavy simplification, such as using rudimentary scales for classifying hate speech and relying on word-based approaches, whereas in reality hate speech is a nuanced linguistic and social phenomenon with various tones and forms. The chapter concludes by suggesting some practical implications for developing hate speech recognition systems.

    Detecting psycho-anomalies on the world-wide web: current tools and challenges

    The rise of social media use and the overall progress of technology have unfortunately opened new ways for criminals such as paedophiles, serial killers and rapists to exploit the power that technology offers in order to lure potential victims. There is a great need to detect extreme criminal behaviours on the World-Wide Web and to take measures that protect the general public from their effects. The aim of this chapter is to examine the data analysis tools and technologies currently used to detect extreme online criminal behaviour, and the challenges associated with their use. Specific emphasis is given to extreme criminal behaviours such as paedophilia and serial killing, as these are considered the most dangerous. A number of conclusions are drawn about the use and challenges of technological means for confronting such criminal behaviours.

    Detecting child grooming behaviour patterns on social media

    Online paedophile activity in social media has become a major concern in society as Internet access becomes easily available to an ever-broader young population. One common form of online child exploitation is child grooming, where adults and minors exchange sexual text and media via social media platforms. Such behaviour involves a number of stages performed by a predator (adult) with the final goal of approaching a victim (minor) in person. This paper presents a study of such online grooming stages from a machine learning perspective. We propose to characterise the stages by a series of features covering sentiment polarity, content, and psycho-linguistic and discourse patterns. Our experiments with online chatroom conversations show good results in automatically classifying chatlines into the various grooming stages. Such deeper understanding and tracking of predatory behaviour is vital for building robust systems that detect grooming conversations and potential predators on social media.
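    As a rough illustration of the kind of per-chatline feature vector the abstract describes (sentiment polarity plus content and discourse cues), here is a minimal sketch; the tiny word lists are invented stand-ins, not the paper's actual lexical resources.

```python
# Stand-in lexicons for the example only.
POSITIVE = {"nice", "great", "love", "cool"}
NEGATIVE = {"bad", "hate", "scared", "angry"}
FAMILY   = {"mom", "dad", "parents", "home"}

def chatline_features(line):
    """Map one chatline to a small dict of numeric features that a
    stage classifier could consume."""
    tokens = line.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        "polarity": pos - neg,                          # crude sentiment
        "family_terms": sum(t in FAMILY for t in tokens),  # content cue
        "questions": line.count("?"),                   # discourse cue
        "length": len(tokens),
    }

f = chatline_features("are your parents home ?")
```

    A line probing whether the minor is alone scores high on the family-terms and question features, which is the sort of signal a stage classifier can pick up.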

    Us and them: identifying cyber hate on Twitter across multiple protected characteristics

    Hateful and antagonistic content published and propagated via the World Wide Web has the potential to cause harm and suffering on an individual basis, and to lead to social tension and disorder beyond cyber space. Despite new legislation aimed at prosecuting those who misuse new forms of communication to post threatening, harassing, or grossly offensive language - or cyber hate - and the fact that large social media companies have committed to protecting their users from harm, it goes largely unpunished due to difficulties in policing online public spaces. To support the automatic detection of cyber hate online, specifically on Twitter, we build multiple individual models to classify cyber hate for a range of protected characteristics including race, disability and sexual orientation. We use text parsing to extract typed dependencies, which represent syntactic and grammatical relationships between words, and show that they capture ‘othering’ language - consistently improving machine classification for different types of cyber hate beyond the use of a Bag of Words and known hateful terms. Furthermore, we build a data-driven blended model of cyber hate to improve classification where more than one protected characteristic may be attacked (e.g. race and sexual orientation), contributing to the nascent study of intersectionality in hate crime.
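    To make the typed-dependency idea concrete: each dependency is a (relation, head, dependent) triple produced by a parser, and the triples become classification features that a plain bag of words cannot express. The sketch below hand-writes the triples to stay self-contained; a real pipeline would obtain them from a dependency parser, and the example sentence and feature naming are illustrative assumptions.

```python
def dependency_features(triples):
    """Turn (relation, head, dependent) triples into a bag of typed
    dependencies, e.g. 'nsubj(send,them)'. Pairings such as a group
    pronoun in object or subject position with an action verb are the
    kind of 'othering' pattern word counts alone miss."""
    return {f"{rel}({head},{dep})": 1 for rel, head, dep in triples}

# Hand-written parse of a hypothetical tweet fragment "send them home".
triples = [("nsubj", "send", "them"),
           ("dobj", "send", "home"),
           ("det", "home", "the")]
feats = dependency_features(triples)
```

    The feature `nsubj(send,them)` encodes who does what to whom, so the same words in a non-antagonistic arrangement produce different features.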

    Text Mining and Cybercrime

    This chapter describes the state of technology for studying Internet crimes against children, specifically sexual predation and cyberbullying. We begin by presenting a survey of relevant research articles related to the study of cybercrime. This survey includes a discussion of our work on the classification of chat logs that contain bullying or predatory behavior. Many commercial enterprises have developed parental control software to monitor these behaviors, and the latest versions of some of these tools provide features that profess to protect children against predators and bullies. The chapter concludes with a discussion of these products and offers suggestions for continued research in this interesting and timely sub-field of text mining.

    Hate speech, machine classification and statistical modelling of information flows on Twitter: interpretation and communication for policy decision making

    In 2013, the murder of Drummer Lee Rigby in Woolwich, UK, led to an extensive public reaction on social media. Given the extreme terrorist motive and the public nature of the actions, it was feasible that the public response could include written expressions of hateful and antagonistic sentiment towards a particular race, ethnicity or religion, which can be interpreted as ‘hate speech’. This provided motivation to study the spread of hate speech on Twitter following such a widespread and emotive event. In this paper we present a supervised machine learning text classifier, trained and tested to distinguish between hateful and/or antagonistic responses with a focus on race, ethnicity or religion, and more general responses. We used human-annotated data collected from Twitter in the immediate aftermath of Lee Rigby’s murder to train and test the classifier. As “Big Data” is a growing topic of study, and its use in policy and decision making is constantly debated at present, we discuss the use of supervised machine learning tools to classify a sample of “Big Data”, and how the results can be interpreted for use in policy and decision making. The classifier performs best using a combination of probabilistic, rule-based and spatial classifiers with a voted ensemble meta-classifier. We achieve an overall F-measure of 0.95 using features derived from the content of each tweet, including syntactic dependencies between terms to recognise “othering” terms, incitement to respond with antagonistic action, and claims of well-founded or justified discrimination against social groups. We then demonstrate how the results of the classifier can be robustly utilized in a statistical model used to forecast the likely spread of hate speech in a sample of Twitter data.
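    The voted ensemble meta-classifier combines the labels from the base classifiers by majority vote. A minimal sketch of that combination step follows; the stand-in base classifiers here are trivial lambdas, not the paper's trained probabilistic, rule-based and spatial models.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over the labels emitted by several base
    classifiers; ties go to the label predicted first."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, tweet):
    return vote([clf(tweet) for clf in classifiers])

# Trivial stand-ins for the three base classifiers (illustrative only).
rule_based    = lambda t: "hateful" if "hate" in t else "benign"
probabilistic = lambda t: "hateful" if len(t.split()) < 4 else "benign"
spatial       = lambda t: "benign"

label = ensemble_predict([rule_based, probabilistic, spatial],
                         "i hate them")
```

    With two of the three stand-in classifiers voting "hateful", the ensemble outputs "hateful"; the voting layer lets individually weak or specialised classifiers correct one another.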