    Detecting Online Child Grooming Conversation

    A Simple Classifier for Detecting Online Child Grooming Conversation

    The massive proliferation of social media has opened up opportunities for perpetrators to commit the crime of online child grooming. Because of the scale and pervasiveness of the problem, it can only be tackled effectively and efficiently with an automatic grooming conversation detection system. The current study addresses the issue using Support Vector Machine (SVM) and k-nearest neighbors (k-NN) classifiers. In addition, the study proposes a low-computational-cost classification method that classifies a conversation by the number of grooming characteristics it contains. All proposed methods are evaluated on 150 textual conversations, of which 105 are grooming and 45 are non-grooming. We identify 17 characteristic features of grooming conversations. The results suggest that SVM and k-NN identify grooming conversations with 98.6% and 97.8% accuracy, respectively, while the proposed simple method reaches 96.8% accuracy. The empirical study also suggests that two of the seventeen characteristics are insignificant for the classification.
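
The count-based method described in this abstract can be pictured with a minimal sketch. The abstract does not state the decision rule beyond "the number of grooming characteristics", so the threshold, the function names, and the toy flag vectors below are all assumptions made for illustration, not the study's data or implementation.

```python
from typing import List, Sequence

N_CHARACTERISTICS = 17  # number of grooming characteristics reported in the abstract


def count_characteristics(flags: Sequence[bool]) -> int:
    """Count how many of the characteristics are present in one conversation."""
    return sum(1 for present in flags if present)


def classify_by_count(flags: Sequence[bool], threshold: int) -> bool:
    """Flag a conversation when it shows at least `threshold` characteristics.

    The threshold is a hypothetical parameter; the abstract does not give one.
    """
    return count_characteristics(flags) >= threshold


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of conversations whose prediction matches the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Invented toy data: each row marks which of the 17 characteristics were found.
conversations: List[List[bool]] = [
    [True] * 9 + [False] * 8,
    [True] * 2 + [False] * 15,
    [True] * 12 + [False] * 5,
    [False] * N_CHARACTERISTICS,
]
labels = [True, False, True, False]

THRESHOLD = 5  # assumed value, chosen only for this example
predictions = [classify_by_count(c, THRESHOLD) for c in conversations]
print(predictions, accuracy(predictions, labels))
```

A rule of this kind needs no training beyond choosing the threshold, which is presumably what makes it cheap compared with SVM or k-NN.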

    Online Sexual Predator Detection

    Online sexual abuse is a concerning yet severely overlooked vice of modern society. With more children on the Internet and the continuing rise of web applications such as online chatrooms and multiplayer games, preying on vulnerable users has become easier for predators. In recent years, there has been work on detecting online sexual predators using machine learning and deep learning techniques. Such work has trained on severely imbalanced datasets, with the imbalance handled by manually trimming over-represented labels. In this work, we propose an approach that first tackles the problem of imbalance and then improves the effectiveness of the underlying classifiers. Our evaluation of the proposed sampling approach on the PAN benchmark dataset shows performance improvements on several classification metrics compared to prior methods that require hand-crafted sampling of the data.
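
The abstract does not describe the proposed sampling approach itself. As a point of reference only, the sketch below shows plain random oversampling, a generic baseline for class imbalance that is swapped in here for illustration and is not the paper's method. The function name and the toy labels are assumptions.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[str], List[Hashable]]:
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group samples by their label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_label.values())

    out_samples: List[str] = []
    out_labels: List[Hashable] = []
    for label, group in by_label.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Invented toy data: 8 ordinary chats and 2 flagged chats.
texts = [f"chat_{i}" for i in range(10)]
y = ["normal"] * 8 + ["flagged"] * 2

balanced_texts, balanced_y = random_oversample(texts, y)
print(Counter(balanced_y))
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies into the test set.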

    Adaptive Activation Function Generation Through Fuzzy Inference for Grooming Text Categorisation

    An activation function determines the output of a neural network by mapping the resulting values of its neurons into a specific range. Activation functions often suffer from ‘vanishing gradients’, ‘non-zero-centred outputs’, ‘exploding gradients’, and ‘dead neurons’, which may degrade classification performance. This paper proposes an activation function generation approach that uses Takagi-Sugeno-Kang (TSK) inference to address these challenges. In addition, the proposed method optimises the coefficients of the activation function with a genetic algorithm so that the function can adapt to different applications. The approach has been applied to a digital forensics application, online grooming detection. The evaluations confirm the superiority of the proposed activation function for online grooming detection on an imbalanced data set.
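
To give a rough idea of how TSK inference can yield an activation function, here is a minimal first-order TSK sketch with two rules. The rule base, the Gaussian membership functions, and every coefficient are assumptions chosen for illustration; in the paper the coefficients are tuned by a genetic algorithm, which is not reproduced here.

```python
import math
from typing import List, Tuple

# One rule = (centre, width, slope, intercept):
# "IF x is near `centre` THEN y = slope * x + intercept".
Rule = Tuple[float, float, float, float]


def gaussian(x: float, centre: float, width: float) -> float:
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def tsk_activation(x: float, rules: List[Rule]) -> float:
    """First-order TSK inference: average of the linear rule outputs,
    weighted by each rule's firing strength."""
    strengths = [gaussian(x, c, w) for c, w, _, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        # Far from every centre the strengths underflow to zero;
        # fall back to the consequent of the nearest rule.
        _, _, a, b = min(rules, key=lambda r: abs(x - r[0]))
        return a * x + b
    weighted = sum(s * (a * x + b) for s, (_, _, a, b) in zip(strengths, rules))
    return weighted / total


# Hypothetical rule base: small slope for negative inputs, identity for positive.
RULES: List[Rule] = [
    (-2.0, 1.5, 0.1, 0.0),  # "x is negative" -> y = 0.1 * x
    (2.0, 1.5, 1.0, 0.0),   # "x is positive" -> y = x
]

for value in (-5.0, 0.0, 5.0):
    print(value, tsk_activation(value, RULES))
```

With these assumed rules the result resembles a smooth leaky ReLU: negative inputs keep a small non-zero slope, which is one way to avoid dead neurons.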

    Classification of online grooming on chat logs using two term weighting schemes

    With its growth, the Internet has become not only a medium for obtaining information but also a platform for communication. Social Network Services (SNS) are among the main platforms on which Internet users communicate by distributing and sharing information and knowledge. Chatting has become a popular communication medium through which users can communicate directly and privately with each other. However, because chat rooms and other chat media are private, the content of chat logs is neither monitored nor filtered, which makes it easier for cyber predators to prey on their victims. Cyber groomers are cyber predators who prey on children or minors to satisfy their sexual desire. Personnel involved in intelligence gathering face persistent difficulties as the complexity of crime increases, compounded by human error and time constraints. Hence, it is difficult to prevent undesired content, such as grooming conversations, in chat logs. This study aims to improve content-based classification accuracy on chat logs by comparing two term weighting schemes for classifying grooming content on two datasets. The two schemes, Term Frequency – Inverse Document Frequency – Inverse Class Space Density Frequency (TF.IDF.ICSdF) and Fuzzy Rough Feature Selection (FRFS), are used as the feature selection process in filtering chat logs. The performance of these techniques was examined on the datasets, and the accuracy of their results was measured with a Support Vector Machine (SVM). TF.IDF.ICSdF and FRFS are judged on accuracy, precision, recall and F-score.
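
The abstract names TF.IDF.ICSdF without giving its formula. The sketch below assumes one formulation from the class-indexing term-weighting literature: weight = tf × (1 + log(D / df)) × (1 + log(C / CSd)), where CSd sums, over the classes, the fraction of each class's documents that contain the term. The study's exact variant may differ, and the function name and toy chat tokens are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf_icsdf(
    docs: Sequence[Sequence[str]], labels: Sequence[str]
) -> List[Dict[str, float]]:
    """Weight each term of each document by tf * idf * icsdf
    (assumed formulation, see the note above)."""
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)

    # Document frequency overall and per class.
    doc_freq: Counter = Counter()
    class_doc_freq = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[label][term] += 1

    weights: List[Dict[str, float]] = []
    for tokens in docs:
        term_freq = Counter(tokens)
        doc_weights: Dict[str, float] = {}
        for term, tf in term_freq.items():
            idf = 1.0 + math.log(n_docs / doc_freq[term])
            # Class space density: summed share of documents per class
            # that contain the term.
            density = sum(class_doc_freq[c][term] / class_sizes[c] for c in classes)
            icsdf = 1.0 + math.log(len(classes) / density)
            doc_weights[term] = tf * idf * icsdf
        weights.append(doc_weights)
    return weights


# Invented toy corpus of tokenised chat lines.
docs = [
    ["hello", "secret", "meet"],
    ["hello", "homework"],
    ["secret", "meet", "alone"],
    ["hello", "game"],
]
labels = ["grooming", "normal", "grooming", "normal"]

weights = tf_idf_icsdf(docs, labels)
print(weights[0])
```

Under this formulation a term concentrated in one class ("secret") receives a higher weight than a term spread across both classes ("hello"), which is the behaviour a class-aware weighting scheme is meant to provide. The weighted vectors would then be passed to a classifier such as an SVM.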

    Robust adaptive genetic K-Means algorithm using greedy selection for clustering

    Evaluation of Hypernym and Synonym Combinations for Non-Functional Requirement Classification Based on ISO/IEC 25010

    Non-functional requirements are considered capable of supporting the success of software development. However, they are often ignored during the software development process, because the quality aspects of non-functional requirements are often mixed with functional requirements. In addition, the variety of quality standards causes confusion in determining quality aspects. The existing approach uses ISO/IEC 9126 as a reference for measuring quality aspects. ISO/IEC 9126 is an old standard released in 2001. Previous researchers revealed ambiguity in six sub-attributes of the hierarchical structure of ISO/IEC 9126, which raises serious doubts about the validity of the standard as a whole. Therefore, the quality standard used as a reference for measuring quality aspects in this study is ISO/IEC 25010. This study also proposes a system for identifying the quality aspects of non-functional requirements using 1 hypernym level and 20 synonyms, called scenario 1. This scenario is compared with 2 hypernym levels and 9 synonyms for each synonym, called scenario 2. The two scenarios produce two different training data sets, which are compared using two testing models, one based on expert ground truth and one based on the system, using the KNN and SVM classification methods. The test results show that scenario 1 gives better values than scenario 2 in both testing models, with precision values of 49.3%, 81.0%, and 74.6% for expert ground truth, KNN, and SVM, respectively.