Search CORE

18,136 research outputs found

Towards the design of a platform for abuse detection in OSNs using multimedial data analysis

Author: De Turck Filip
Leroux Philip
Vanhove Thomas
Wauters Tim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

Online social networks (OSNs) are becoming increasingly popular every day. The vast amount of data created by users and their actions yields interesting opportunities, both socially and economically. Unfortunately, these online communities are prone to abuse and inappropriate behaviour such as cyber bullying. For victims, this kind of behaviour can lead to depression and other severe problems. However, due to the huge amount of users and data it is impossible to manually check all content posted on the social network. We propose a pluggable architecture with reusable components, able to quickly detect harmful content. The platform uses text-, image-, audio- and video-based analysis modules to detect inappropriate content or high risk behaviour. Domain services aggregate this data and flag user profiles if necessary. Social network moderators need only check the validity of the flagged profiles. This paper reports upon key requirements of the platform, the architectural components and important challenges

Ghent University Academic Bibliography

Characterizing Pedophile Conversations on the Internet using Online Grooming

Author: Gupta Aditi
Kumaraguru Ponnurangam
Sureka Ashish
Publication venue
Publication date: 01/01/2012
Field of study

Cyber-crime targeting children such as online pedophile activity are a major and a growing concern to society. A deep understanding of predatory chat conversations on the Internet has implications in designing effective solutions to automatically identify malicious conversations from regular conversations. We believe that a deeper understanding of the pedophile conversation can result in more sophisticated and robust surveillance systems than majority of the current systems relying only on shallow processing such as simple word-counting or key-word spotting. In this paper, we study pedophile conversations from the perspective of online grooming theory and perform a series of linguistic-based empirical analysis on several pedophile chat conversations to gain useful insights and patterns. We manually annotated 75 pedophile chat conversations with six stages of online grooming and test several hypothesis on it. The results of our experiments reveal that relationship forming is the most dominant online grooming stage in contrast to the sexual stage. We use a widely used word-counting program (LIWC) to create psycho-linguistic profiles for each of the six online grooming stages to discover interesting textual patterns useful to improve our understanding of the online pedophile phenomenon. Furthermore, we present empirical results that throw light on various aspects of a pedophile conversation such as probability of state transitions from one stage to another, distribution of a pedophile chat conversation across various online grooming stages and correlations between pre-defined word categories and online grooming stages

arXiv.org e-Print Archive

CiteSeerX

Graph-based Features for Automatic Online Abuse Detection

Author: F Harary
F Pedregosa
G Csardi
J Kleinberg
K Balci
K Dinakar
LC Freeman
MEJ Newman
PF Bonacich
S Brin
SB Seidman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/08/2017
Field of study

While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually. Automating this task is an important step in reducing the financial cost associated with moderation, but the majority of automated approaches strictly based on message content are highly vulnerable to intentional obfuscation. In this paper, we discuss methods for extracting conversational networks based on raw multi-participant chat logs, and we study the contribution of graph features to a classification system that aims to determine if a given message is abusive. The conversational graph-based system yields unexpectedly high performance , with results comparable to those previously obtained with a content-based approach

arXiv.org e-Print Archive

Crossref

Impact Of Content Features For Automatic Online Abuse Detection

Author: Dufour Richard
Labatut Vincent
Linares Georges
Papegnies Etienne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/04/2017
Field of study

Online communities have gained considerable importance in recent years due to the increasing number of people connected to the Internet. Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great financial interest for community maintainers. Often, the industry uses basic approaches such as bad words filtering and regular expression matching to assist the moderators. In this article, we consider the task of automatically determining if a message is abusive. This task is complex since messages are written in a non-standardized way, including spelling errors, abbreviations, community-specific codes... First, we evaluate the system that we propose using standard features of online messages. Then, we evaluate the impact of the addition of pre-processing strategies, as well as original specific features developed for the community of an online in-browser strategy game. We finally propose to analyze the usefulness of this wide range of features using feature selection. This work can lead to two possible applications: 1) automatically flag potentially abusive messages to draw the moderator's attention on a narrow subset of messages ; and 2) fully automate the moderation process by deciding whether a message is abusive without any human intervention

arXiv.org e-Print Archive

Abusive Language Detection in Online Conversations by Combining Content-and Graph-based Features

Author: Cecillon Noé
Dufour Richard
Labatut Vincent
Linarès Georges
Publication venue: 'Frontiers Media SA'
Publication date: 20/05/2019
Field of study

In recent years, online social networks have allowed worldwide users to meet and discuss. As guarantors of these communities, the administrators of these platforms must prevent users from adopting inappropriate behaviors. This verification task, mainly done by humans, is more and more difficult due to the ever growing amount of messages to check. Methods have been proposed to automatize this moderation process, mainly by providing approaches based on the textual content of the exchanged messages. Recent work has also shown that characteristics derived from the structure of conversations, in the form of conversational graphs, can help detecting these abusive messages. In this paper, we propose to take advantage of both sources of information by proposing fusion methods integrating content-and graph-based features. Our experiments on raw chat logs show that the content of the messages, but also of their dynamics within a conversation contain partially complementary information, allowing performance improvements on an abusive message classification task with a final F-measure of 93.26%

arXiv.org e-Print Archive

Hal-Diderot

Social media mining for identification and exploration of health-related information from pregnant women

Author: Chandrashekar Pramod Bharadwaj
Magge Arjun
Sarker Abeed
Gonzalez Graciela
Publication venue
Publication date: 01/01/1989
Field of study

Widespread use of social media has led to the generation of substantial amounts of information about individuals, including health-related information. Social media provides the opportunity to study health-related information about selected population groups who may be of interest for a particular study. In this paper, we explore the possibility of utilizing social media to perform targeted data collection and analysis from a particular population group -- pregnant women. We hypothesize that we can use social media to identify cohorts of pregnant women and follow them over time to analyze crucial health-related information. To identify potentially pregnant women, we employ simple rule-based searches that attempt to detect pregnancy announcements with moderate precision. To further filter out false positives and noise, we employ a supervised classifier using a small number of hand-annotated data. We then collect their posts over time to create longitudinal health timelines and attempt to divide the timelines into different pregnancy trimesters. Finally, we assess the usefulness of the timelines by performing a preliminary analysis to estimate drug intake patterns of our cohort at different trimesters. Our rule-based cohort identification technique collected 53,820 users over thirty months from Twitter. Our pregnancy announcement classification technique achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user timelines. Analysis of the timelines revealed that pertinent health-related information, such as drug-intake and adverse reactions can be mined from the data. Our approach to using user timelines in this fashion has produced very encouraging results and can be employed for other important tasks where cohorts, for which health-related information may not be available from other sources, are required to be followed over time to derive population-based estimates.Comment: 9 page

arXiv.org e-Print Archive

Wageningen University & Research Publications