8 research outputs found

    Online Sexual Predator Detection

    Online sexual abuse is a concerning yet severely overlooked vice of modern society. With more children on the Internet and the ever-increasing proliferation of web applications such as online chatrooms and multiplayer games, preying on vulnerable users has become easier for predators. In recent years, there has been work on detecting online sexual predators using machine learning and deep learning techniques. Such work has been trained on severely imbalanced datasets, with the imbalance handled by manually trimming over-represented labels. In this work, we propose an approach that first tackles the problem of imbalance and then improves the effectiveness of the underlying classifiers. Our evaluation of the proposed sampling approach on the PAN benchmark dataset shows performance improvements on several classification metrics, compared to prior methods that require hand-crafted sampling of the data.

    Detección automática de ciber acoso en redes sociales (Automatic detection of cyberbullying in social networks)

    Objectives and study method: The general objective of this research is to contribute to the development of an approach that advances the automatic detection of cyberbullying in a social network, using machine learning techniques, sentiment analysis, and data mining, tools that form part of information technology. In particular, to develop this approach, a search is carried out for comments flagged as aggressive. In addition, the parties involved in a cyberbullying case are identified, as well as the frequency with which aggressive comments are sent; these are the components considered in order to detect cyberbullying in a social network. Contributions and conclusions: The main contribution is a methodology that supports the detection of cyberbullying cases in a social network. The search process begins with the collection of comments and the automatic assignment of an aggressiveness level to each of them. This aggressiveness level helps identify the components considered in a cyberbullying case: the frequency with which text messages considered aggressive are sent and the parties involved in sending them. With these data, cyberbullying cases in a social network can be detected.

    Detecting deceptive behaviour in the wild: text mining for online child protection in the presence of noisy and adversarial social media communications

    A real-life application of text mining research “in the wild”, i.e. in online social media, differs from more general applications in that its defining characteristics are both domain and process dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied in the wild typically has no control over the dataset size. Hence, the system has to be robust towards limited data availability, a variable number of samples across users and a highly skewed dataset. Additionally, the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to a certain degree of linguistic noise. Finally, it has to be robust towards deceptive behaviour or adversaries. This thesis examines the viability of a text mining approach for supporting cybercrime investigations pertaining to online child protection. The main contributions of this dissertation are as follows. A systematic study of different aspects of the methodological design of a state-of-the-art text mining approach is presented to assess its scalability towards a large, imbalanced and linguistically noisy social media dataset. In this framework, three key automatic text categorisation tasks are examined, namely the feasibility of (i) identifying a social network user’s age group and gender based on textual information found in a single message; (ii) aggregating predictions on the message level to the user level without neglecting potential clues of deception, and detecting false user profiles on social networks; and (iii) identifying child sexual abuse media among thousands of other, legal media, including adult pornography, based on their filenames. Finally, a novel approach is presented that combines age group predictions with advanced text clustering techniques and unsupervised learning to identify online child sex offenders’ grooming behaviour.
    The methodology presented in this thesis was extensively discussed with law enforcement to assess its forensic readiness. Additionally, each component was evaluated on actual child sex offender data. Despite the challenging characteristics of these text types, the results show high degrees of accuracy for false profile detection, grooming behaviour identification and child sexual abuse media identification.