Abstract
This thesis addresses the detection of cyberbullying by employing state-of-the-art machine learning approaches to distinguish offensive language from hate speech. The escalating prevalence of cyberbullying in digital communication necessitates robust detection systems for online safety. The work focuses on classifying textual content into three categories — hate speech, offensive language, and non-hate speech — using machine learning models including Random Forest, AdaBoost, Decision Trees, Long Short-Term Memory (LSTM) networks, Bidirectional Encoder Representations from Transformers (BERT), and RoBERTa (Robustly Optimized BERT Approach). Each algorithm offers distinct strengths for language processing. Decision Trees provide simple, interpretable classification rules, while ensemble methods such as Random Forest and AdaBoost improve accuracy through combined decision-making. LSTM networks excel at sequential data analysis, capturing contextual nuances. BERT and RoBERTa offer advanced deep learning capabilities, with RoBERTa building on BERT's architecture to achieve improved performance through optimized training and larger pretraining corpora; bidirectional text analysis enables both models to grasp context. The research methodology comprises data preprocessing, feature extraction, and model training. Model performance is assessed using accuracy, precision, recall, and F1-score metrics. The results demonstrate the effectiveness of these models in differentiating between the categories, with BERT and RoBERTa showing notable proficiency due to their advanced contextual analysis. By highlighting the potential of diverse machine learning techniques for tackling difficult problems in online communication, this work advances cyberbullying research and facilitates the creation of safer digital interaction technologies.
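As a concrete illustration of the evaluation metrics named above, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score for a hypothetical three-class cyberbullying classifier. The label encoding (0 = hate speech, 1 = offensive language, 2 = non-hate speech) and the toy predictions are assumptions for illustration only, not results from the thesis.

```python
# Hypothetical toy evaluation for a three-class classifier:
# 0 = hate speech, 1 = offensive language, 2 = non-hate speech.
# Labels and predictions below are illustrative, not real results.

def per_class_prf(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, treated one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision/recall/F1 over all classes."""
    classes = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro average weights every class equally, so a rare class such as
    # hate speech counts as much as the majority class.
    prfs = [per_class_prf(y_true, y_pred, c) for c in classes]
    macro_p = sum(p for p, _, _ in prfs) / len(classes)
    macro_r = sum(r for _, r, _ in prfs) / len(classes)
    macro_f1 = sum(f for _, _, f in prfs) / len(classes)
    return accuracy, macro_p, macro_r, macro_f1

y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]
acc, macro_p, macro_r, macro_f1 = evaluate(y_true, y_pred)
# acc = 0.7; macro precision, recall, and F1 ≈ 0.656 on this toy data
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would typically be used instead; the hand-rolled version above only makes the metric definitions explicit.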