2 research outputs found

    CAMsterdam at SemEval-2019 task 6: Neural and graph-based feature extraction for the identification of offensive tweets

    Get PDF
    We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offen-sive language identification in Twitter data.Our proposed model learns to extract tex-tual features using a multi-layer recurrent net-work, and then performs text classification us-ing gradient-boosted decision trees (GBDT). A self-attention architecture enables the model to focus on the most relevant areas in the text.We additionally learn globally optimised em-beddings for hashtags using node2vec, which are given as additional tweet features to the GBDT classifier.Our best model obtains78.79% macro F1-score on detecting offensive language (subtask A), 66.32% on categorising offence types (targeted/untargeted; subtask B),and 55.36% on identifying the target of of-fence (subtask C)

    Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques

    Full text link
    The rise in web and social media interactions has resulted in the efortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user‐generated content has made it challenging to iden-tify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several ad-vantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real world cyberbullying datasets. In addition, this paper also examines the effect of feature extraction and word‐embedding‐techniques‐based natural language processing on algorithmic per-formance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency‐Inverse Document Frequency (TF‐IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi‐GRU and Bi‐LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state‐of‐the‐art approaches for cyberbullying detection, with accuracy and F1‐scores as high as ~95% and ~98%, respectively
    corecore