
    Detecting child grooming behaviour patterns on social media

    Online paedophile activity on social media has become a major societal concern as Internet access becomes easily available to a broader and younger population. One common form of online child exploitation is child grooming, where adults and minors exchange sexual text and media via social media platforms. Such behaviour involves a number of stages performed by a predator (adult) with the final goal of approaching a victim (minor) in person. This paper presents a study of these online grooming stages from a machine learning perspective. We propose to characterise the stages by a series of features covering sentiment polarity, content, and psycho-linguistic and discourse patterns. Our experiments with online chatroom conversations show good results in automatically classifying chat lines into the various grooming stages. Such a deeper understanding and tracking of predatory behaviour is vital for building robust systems for detecting grooming conversations and potential predators on social media.

    Detecting psycho-anomalies on the world-wide web: current tools and challenges

    The rise of social media and the overall progress of technology have unfortunately opened new ways for criminals such as paedophiles, serial killers and rapists to exploit the power that technology offers in order to lure potential victims. There is a pressing need to detect extreme criminal behaviours on the World-Wide Web and to take measures to protect the general public from their effects. The aim of this chapter is to examine the current data analysis tools and technologies used to detect extreme online criminal behaviour, and the challenges associated with their use. Specific emphasis is given to extreme criminal behaviours such as paedophilia and serial killing, as these are considered the most dangerous. A number of conclusions are drawn on the use and challenges of technological means for countering such criminal behaviours.

    Exploring high-level features for detecting cyberpedophilia

    In this paper, we suggest a list of high-level features and study their applicability to the detection of cyberpedophiles. We used a corpus of chats downloaded from http://www.perverted-justice.com and two negative datasets of different nature: cybersex logs available online, and the NPS chat corpus. The classification results show that the NPS data and the pedophiles’ conversations can be accurately discriminated from each other with character n-grams, while in the more complicated case of cybersex logs, high-level features are needed to reach good accuracy levels. In this latter setting our results show that features that model behaviour and emotion significantly outperform the low-level ones, and achieve 97% accuracy.
    Acknowledgements: The work of Dasha Bogdanova was partially carried out during an internship at the Universitat Politecnica de Valencia (scholarship of the University of St. Petersburg). Her research was partially supported by a Google Research Award. The collaboration with Thamar Solorio was possible thanks to her one-month research visit at the Universitat Politecnica de Valencia (program PAID-PAID-02-11 award n. 1932). The research work of Paolo Rosso was done in the framework of the European Commission WIQ-EI Web Information Quality Evaluation Initiative (IRSES Grant No. 269180) project within the FP 7 Marie Curie People, the DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-0O2-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Citation: Bogdanova, D.; Rosso, P.; Solorio, T. (2014). Exploring high-level features for detecting cyberpedophilia. Computer Speech and Language. 28(1):108-120. https://doi.org/10.1016/j.csl.2013.04.007

    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment that generates huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications and rigour. This article fills this gap through a systematic mapping study describing online data-mining literature that visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis that can be methodologically examined for deficiencies.

    Statistical models for the analysis of short user-generated documents: author identification for conversational documents

    In recent years short user-generated documents have been gaining popularity on the Internet and attention in the research communities. Documents of this kind are generated by users of various online services: platforms for instant messaging, for real-time status posting, for discussion and for writing reviews. Each of these services allows users to generate written texts with particular properties, which may require specific algorithms for their analysis. This dissertation presents our work on analysing such documents. We conducted qualitative and quantitative studies to identify the properties that characterise them. We compared the properties of these documents with those of standard documents employed in the literature, such as newspaper articles, and defined a set of characteristics that are distinctive of documents generated online. We also observed two classes within online user-generated documents: conversational documents and those involving group discussions. We later focused on the class of conversational documents, which are short and spontaneous. We created a novel collection of real conversational documents retrieved online (e.g. Internet Relay Chat) and distributed it as part of an international competition (PAN @ CLEF'12). The competition was about author characterisation, which is one of the possible studies of authorship attribution documented in the literature. Another field of study is authorship identification, which became our main topic of research. We approached the authorship identification problem in its closed-class variant. For each problem we employed documents from the collection we released and from a collection of Twitter messages, as representative of conversational or short user-generated documents. We demonstrated the unsuitability of standard authorship identification techniques for conversational documents and proposed novel methods capable of reaching better accuracy rates. As opposed to standard methods, which worked well only for a few authors, the proposed technique reached significant results even for hundreds of users.

    Aspects of internet security: identity management and online child protection

    This thesis examines four main subjects: consumer federated Internet Identity Management (IdM), text analysis to detect grooming in Internet chat, a system for using steganographed emoticons as ‘digital fingerprints’ in instant messaging, and a systems analysis of online child protection. The Internet was never designed to support an identity framework. The current username / password model does not scale well, and with an ever-increasing number of sites and services users are suffering from password fatigue and resorting to insecure practices such as using the same password across websites. In addition, users are supplying personal information to a vast number of sites and services with little, if any, control over how that information is used. A new identity metasystem promises to bring federated identity, which has found success in the enterprise, to the consumer, placing the user in control and limiting the disclosure of personal information. This thesis argues that, although technically feasible, no business model exists to support consumer IdM, and without a major change in Internet culture, such as a breakdown in trust and security, a new identity metasystem will not be realised. Is it possible to detect grooming or potential grooming from a statistical examination of Internet chat messages? Using techniques from speaker verification, can grooming relationships be detected? Can this approach improve on the leading text analysis technique, Bayesian trigram analysis? Using a novel feature extraction technique and Gaussian Mixture Models (GMM) to detect potential grooming proved to be unreliable. Even with the benefit of extensive tuning, the author doubts the technique would match or improve upon Bayesian analysis. Around 80% of child grooming is blatant, with the groomer disguising neither their age nor sexual intent. Experiments conducted with Bayesian trigram analysis suggest this could be reliably detected; detecting the subtle, devious remaining 20% is considerably harder, and reliable detection is questionable, especially in systems used by teenagers (the most at-risk group). Observations of the MSN Messenger service and protocol led the author to discover a method by which to leave digitally verifiable files on the computer of anyone who chats with a child, by exploiting the custom emoticon feature. By employing techniques from steganography these custom emoticons can be made to appear innocuous. Finding and removing custom emoticons is a non-trivial matter and they cannot be easily spoofed. Identification is performed by examining the emoticon (file) hashes. If an emoticon is recovered, e.g. in the course of an investigation, it can be hashed and the hash compared against a database of registered users, and used to support non-repudiation and confirm whether an individual has indeed been chatting with a child. Online child protection has been described as a classic systems problem. It covers a broad range of complex and sometimes difficult-to-research issues including technology, sociology, psychology and law, and affects directly or indirectly the majority of the UK population. Yet despite this, the problem and the challenges are poorly understood, thanks in no small part to mawkish attitudes and alarmist media coverage. Here the problem is examined holistically: how children use technology, what the risks are, and how they can best be protected, based not on idealism but on the known behaviours of children. The overall protection message is often confused and unrealistic, leaving parents and children ill-prepared to protect themselves. Technology does have a place in protecting children, but this is secondary to a strong and understanding parent/child relationship and education, both of the child and parent.