325 research outputs found

    Detecting child grooming behaviour patterns on social media

    Online paedophile activity in social media has become a major concern in society as Internet access is easily available to a broader younger population. One common form of online child exploitation is child grooming, where adults and minors exchange sexual text and media via social media platforms. Such behaviour involves a number of stages performed by a predator (adult) with the final goal of approaching a victim (minor) in person. This paper presents a study of such online grooming stages from a machine learning perspective. We propose to characterise such stages by a series of features covering sentiment polarity, content, and psycho-linguistic and discourse patterns. Our experiments with online chatroom conversations show good results in automatically classifying chatlines into various grooming stages. Such a deeper understanding and tracking of predatory behaviour is vital for building robust systems for detecting grooming conversations and potential predators on social media.

    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.

    Overview of the Author Profiling Task at PAN 2013

    This overview presents the framework and results for the Author Profiling task at PAN 2013. We describe in detail the corpus and its characteristics, and the evaluation framework we used to measure the participants' performance in identifying age and gender from anonymous texts. Finally, the approaches of the 21 participants and their results are described. The author profiling task @PAN-2013 was an activity of the WIQ-EI IRSES project (Grant No. 269180) within the FP 7 Marie Curie People Framework of the European Commission. We want to thank the Forensic Lab of the Universitat Pompeu Fabra Barcelona for sponsoring the award for the winning team. The work of the first author was partially funded by Autoritas Consulting SA and by Ministerio de Economía y Competitividad de España under grant ECOPORTUNITY IPT-2012-1220-430000. The work of the second author was in the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. The work of the fifth author was funded in part by the Swiss National Science Foundation (SNF) project "Mining Conversational Content for Topic Modelling and Author Identification (ChatMiner)" under grant number 200021_130208. Rangel, F.; Rosso, P.; Koppel, M.; Stamatatos, E.; Inches, G. (2013). Overview of the Author Profiling Task at PAN 2013. CLEF Conference on Multilingual and Multimodal Information Access Evaluation. 352-365. http://hdl.handle.net/10251/46636

    Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning

    Providing a safe environment for juveniles and children in online social networks is considered a major factor in improving public safety. Due to the prevalence of online conversations, mitigating the undesirable effects of juvenile abuse in cyberspace has become a necessity. Using automatic ways to address this kind of crime is challenging and demands efficient and scalable data mining techniques. The problem can be cast as a combination of textual preprocessing in data/text mining and binary classification in machine learning. This thesis proposes two machine learning approaches to deal with the following two issues in the domain of online predator identification: 1) The first problem is gathering a comprehensive set of negative training samples, which is unrealistic due to the nature of the problem. This problem is addressed by applying an existing method for semi-supervised anomaly detection that allows training based on only one class label. The method was tested on two datasets; 2) The second issue is improving the performance of current binary classification methods in terms of classification accuracy and F1-score. In this regard, we have customized a deep learning approach, the Convolutional Neural Network, for use in this domain. Using this approach, we show that the classification performance (F1-score) is improved by almost 1.7% compared to the Support Vector Machine classification method. Two different datasets were used in the empirical experiments: PAN-2012 and SQ (Sûreté du Québec). The former is a large public dataset that has been used extensively in the literature and the latter is a small dataset collected from the Sûreté du Québec.