5 research outputs found

    A Simple Classifier for Detecting Online Child Grooming Conversation

    The massive proliferation of social media has opened new possibilities for perpetrators of online child grooming. Because of the scale and pervasiveness of the problem, it may only be tamed effectively and efficiently by an automatic grooming conversation detection system. The current study addresses the issue using Support Vector Machine (SVM) and k-nearest neighbors (k-NN) classifiers. In addition, the study proposes a low-computational-cost classification method, which classifies a conversation by the number of grooming conversation characteristics it exhibits. All proposed methods are evaluated using 150 textual conversations, of which 105 are grooming and 45 are non-grooming. We identify 17 characteristic features of grooming conversations. The results suggest that SVM and k-NN can identify grooming conversations with 98.6% and 97.8% accuracy, respectively. Meanwhile, the proposed simple method has 96.8% accuracy. The empirical study also suggests that two of the seventeen characteristics are insignificant for the classification.
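The count-based method described in this abstract can be illustrated with a minimal sketch. It assumes each conversation has already been annotated with 17 boolean flags, one per characteristic. The abstract does not give the decision rule, so the threshold value and the function names below are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# The abstract reports 17 grooming characteristics; what they are is not stated here.
NUM_CHARACTERISTICS = 17


def count_characteristics(flags: Sequence[bool]) -> int:
    """Count how many of the 17 characteristics are present in one conversation."""
    if len(flags) != NUM_CHARACTERISTICS:
        raise ValueError(f"expected {NUM_CHARACTERISTICS} flags, got {len(flags)}")
    return sum(1 for flag in flags if flag)


def classify_by_count(flags: Sequence[bool], threshold: int = 5) -> bool:
    """Flag a conversation when enough characteristics are present.

    The threshold is an assumption for this sketch, not a value from the study.
    """
    return count_characteristics(flags) >= threshold


# Usage: a conversation showing 6 of the 17 characteristics.
example = [True] * 6 + [False] * 11
flagged = classify_by_count(example)
```

A rule of this kind needs no model training, which is presumably what makes it low in computational cost compared with SVM or k-NN.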

    A human-centered systematic literature review of the computational approaches for online sexual risk detection

    In the era of big data and artificial intelligence, online risk detection has become a popular research topic. From detecting online harassment to the sexual predation of youth, the state of the art in computational risk detection has the potential to protect particularly vulnerable populations from online victimization. Yet, this is a high-risk, high-reward endeavor that requires a systematic and human-centered approach to synthesize disparate bodies of research across different application domains, so that we can identify best practices and potential gaps, and set a strategic research agenda for leveraging these approaches in a way that betters society. Therefore, we conducted a comprehensive literature review to analyze 73 peer-reviewed articles on computational approaches utilizing text or meta-data/multimedia for online sexual risk detection. We identified sexual grooming (75%), sex trafficking (12%), and sexual harassment and/or abuse (12%) as the three types of sexual risk detection present in the extant literature. Furthermore, we found that the majority (93%) of this work has focused on identifying sexual predators after the fact, rather than taking more nuanced approaches to identify potential victims and problematic patterns that could be used to prevent victimization before it occurs. Many studies rely on public datasets (82%) and third-party annotators (33%) to establish ground truth and train their algorithms. Finally, the majority of this work (78%) focused on algorithmic performance evaluation of its models and rarely (4%) evaluated these systems with real users. Thus, we urge computational risk detection researchers to integrate more human-centered approaches to both developing and evaluating sexual risk detection algorithms to ensure the broader societal impacts of this important work.

    Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning

    Providing a safe environment for juveniles and children in online social networks is considered a major factor in improving public safety. Due to the prevalence of online conversations, mitigating the undesirable effects of juvenile abuse in cyberspace has become essential. Using automatic ways to address this kind of crime is challenging and demands efficient and scalable data mining techniques. The problem can be cast as a combination of textual preprocessing in data/text mining and binary classification in machine learning. This thesis proposes two machine learning approaches to deal with the following two issues in the domain of online predator identification: 1) The first problem is that gathering a comprehensive set of negative training samples is unrealistic due to the nature of the problem. This problem is addressed by applying an existing method for semi-supervised anomaly detection that allows training based on only one class label. The method was tested on two datasets; 2) The second issue is improving the performance of current binary classification methods in terms of classification accuracy and F1-score. In this regard, we have customized a deep learning approach, the Convolutional Neural Network, for use in this domain. Using this approach, we show that the classification performance (F1-score) is improved by almost 1.7% compared to the Support Vector Machine classification method. Two different datasets were used in the empirical experiments: PAN-2012 and SQ (Sûreté du Québec). The former is a large public dataset that has been used extensively in the literature, and the latter is a small dataset collected from the Sûreté du Québec.
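The idea of training on a single class label can be illustrated with a small sketch. This is not the semi-supervised method used in the thesis, which the abstract does not name. It swaps in a simple centroid-distance rule: fit a centroid on feature vectors from the one available class, then treat anything farther away than a chosen radius as an anomaly. The feature vectors, the `quantile` parameter, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (centroid, radius)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of the training vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def fit_one_class(train: Sequence[Vector], quantile: float = 0.95) -> Model:
    """Fit on samples from a single class only.

    The radius is the distance at the given quantile of the training
    distances; the quantile is a hypothetical tuning parameter.
    """
    center = centroid(train)
    dists = sorted(math.dist(v, center) for v in train)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]


def is_anomaly(vector: Vector, model: Model) -> bool:
    """A point is anomalous when it lies outside the fitted radius."""
    center, radius = model
    return math.dist(vector, center) > radius


# Usage with toy 2-D feature vectors (real inputs would be text features).
model = fit_one_class([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
far_point_flagged = is_anomaly([5.0, 5.0], model)
```

The point of the design is that no samples from the second class are needed at training time, which matches the constraint the abstract describes.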

    Human-Centered Approach to Technology to Combat Human Trafficking

    Human trafficking is a serious crime that continues to plague the United States. With the rise of computing technologies, the internet has become one of the main mediums through which this crime is facilitated. Fortunately, these online activities leave traces which are invaluable to law enforcement agencies trying to stop human trafficking. However, identifying and intervening in these cases is still a challenging task. The sheer volume of online activity makes it difficult for law enforcement to efficiently identify any potential leads. To compound this issue, traffickers are constantly changing their techniques online to evade detection. Thus, there is a need for tools to efficiently sift through all this online data and narrow down the number of potential leads that a law enforcement agency can deal with. While some tools and prior research do exist for this purpose, none of these tools adequately address law enforcement user needs for information visualizations and spatiotemporal analysis. Thus, to address these gaps, this thesis contributes an empirical study of technology and human trafficking. Through in-depth qualitative interviews, systemic literature analysis, and a user-centered design study, this research outlines the challenges and design considerations for developing sociotechnical tools for anti-trafficking efforts. This work further contributes to the greater understanding of the prosecution efforts within the anti-trafficking domain and concludes with the development of a visual analytics prototype that incorporates these design considerations.