84 research outputs found

    An experimental study on feature engineering and learning approaches for aggression detection in social media

    Get PDF
    With the widespread of modern technologies and social media networks, a new form of bullying occurring anytime and anywhere has emerged. This new phenomenon, known as cyberaggression or cyberbullying, refers to aggressive and intentional acts aiming at repeatedly causing harm to other person involving rude, insulting, offensive, teasing or demoralising comments through online social media. As these aggressions represent a threatening experience to Internet users, especially kids and teens who are still shaping their identities, social relations and well-being, it is crucial to understand how cyberbullying occurs to prevent it from escalating. Considering the massive information on the Web, the developing of intelligent techniques for automatically detecting harmful content is gaining importance, allowing the monitoring of large-scale social media and the early detection of unwanted and aggressive situations. Even though several approaches have been developed over the last few years based both on traditional and deep learning techniques, several concerns arise over the duplication of research and the difficulty of comparing results. Moreover, there is no agreement regarding neither which type of technique is better suited for the task, nor the type of features in which learning should be based. The goal of this work is to shed some light on the effects of learning paradigms and feature engineering approaches for detecting aggressions in social media texts. In this context, this work provides an evaluation of diverse traditional and deep learning techniques based on diverse sets of features, across multiple social media sites.

    A FRAMEWORK FOR ARABIC SENTIMENT ANALYSIS USING MACHINE LEARNING CLASSIFIERS

    Get PDF
    International audienceIn recent years, the use of Internet and online comments, expressed in natural language text, have increased significantly. However, it is difficult for humans to read all these comments and classify them appropriately. Consequently, an automatic approach is required to classify the unstructured data. In this paper, we propose a framework for Arabic language comprising of three steps: pre-processing, feature extraction and machine learning classification. The main aim of the proposed framework is to exploit the combination of different Arabic linguistic features. We evaluate the framework using two benchmark Arabic tweets datasets (ASTD, ATA), which enable sentiment polarity detection in general Arabic and Jordanian dialects. Comparative simulation results show that machine learning classifiers such as Support Vector Machine (SVM), Naive Bayes, MultiLayer Perceptron (MLP) and Logistic Regression-based produce the best performance by using a combination of n-gram features from Arabic tweets datasets. Finally, we evaluate the performance of our proposed framework using an Ensemble classifier approach, with promising results
    • …
    corecore