471 research outputs found
Deep Learning for User Comment Moderation
Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism further improves the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation.
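A word-list baseline of the kind the comparison mentions can be sketched in a few lines. This is a minimal illustration, not the paper's actual setup: the blocked-word list, the tokenization, and the flagging rule are all hypothetical.

```python
# Hypothetical word-list baseline for comment moderation:
# flag a comment if any of its tokens appears in a blocked-word list.
# The list and the matching rule are illustrative assumptions.

BLOCKED_WORDS = {"idiot", "stupid", "trash"}  # assumed toy list

def wordlist_flag(comment: str) -> bool:
    """Return True if the comment should be routed to a moderator."""
    tokens = comment.lower().split()
    return any(tok.strip(".,!?") in BLOCKED_WORDS for tok in tokens)

print(wordlist_flag("You are an idiot!"))     # → True
print(wordlist_flag("Great article, thanks")) # → False
```

In semi-automatic moderation, such a rule would only pre-filter comments for human review rather than reject them outright.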
Toxic Comment Classification using Deep Learning
Online conversation media serves as a means for individuals to engage, cooperate, and exchange ideas; however, it is also a platform that facilitates the spread of hateful and offensive comments, which can significantly impact one's emotional and mental health. The rapid growth of online communication makes it impractical to manually identify and filter out hateful tweets. Consequently, there is a pressing need for methods to eliminate toxic and abusive comments and keep social media platforms safe and clean. This toxicity analysis utilizes an LSTM, a character-level CNN, a word-level CNN, and a hybrid model (LSTM + CNN) to classify comments and identify the different toxicity classes through a comparative analysis of the models. The neural network models take in comments extracted from online platforms, including both toxic and non-toxic comments. The results of this study can contribute towards the development of a web interface that identifies toxic and hateful comments within a given sentence or phrase and categorizes them into their respective toxicity classes.
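The character-level and word-level models named above differ first of all in the input view they consume. A minimal sketch of the two featurizations (the neural models themselves would sit on top of these representations; the function names and n-gram size are assumptions for illustration):

```python
# Two input views for toxicity models: word-level tokens
# (word-level CNN / LSTM) vs. overlapping character n-grams
# (character-level CNN). Feature extraction only, no model.

def word_features(text: str) -> list[str]:
    """Word-level view: one token per whitespace-separated word."""
    return text.lower().split()

def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character-level view: overlapping character n-grams."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

print(word_features("Toxic comment"))  # → ['toxic', 'comment']
print(char_ngrams("toxic", 3))         # → ['tox', 'oxi', 'xic']
```

Character n-grams are more robust to obfuscated spellings, which is one common motivation for combining the two views in a hybrid model.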
Detection of Hate-Speech Tweets Based on Deep Learning: A Review
Cybercrime, cyberbullying, and hate speech have all increased alongside the use of the internet and social media. Hate speech knows no organizational or individual boundaries. It affects many people in diverse ways and can be harsh, offensive, or discriminatory depending on the target's gender, race, political opinions, religion, nationality, skin color, disability, ethnicity, sexual orientation, or immigration status. Authorities and academics are investigating new methods for identifying hate speech on social media platforms like Facebook and Twitter. This study adds to the ongoing discussion about creating safer digital spaces while balancing limiting hate speech against protecting freedom of speech. Partnerships between researchers, platform developers, and communities are crucial for creating efficient and ethical content moderation systems on Twitter and other social media sites. To this end, multiple methodologies, models, and algorithms are employed. This study presents a thorough analysis of hate speech detection across numerous research publications. Each article has been thoroughly examined, including the algorithms or methodologies used, the databases, the classification techniques, and the findings achieved. In addition, all the examined papers are discussed comprehensively, with an explicit focus on the use of deep learning techniques to detect hate speech.
Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity
The detection of online cyberbullying has seen an increase in societal
importance, popularity in research, and available open data. Nevertheless,
while computational power and affordability of resources continue to increase,
the access restrictions on high-quality data limit the applicability of
state-of-the-art techniques. Consequently, much of the recent research uses
small, heterogeneous datasets, without a thorough evaluation of applicability.
In this paper, we further illustrate these issues, as we (i) evaluate many
publicly available resources for this task and demonstrate difficulties with
data collection. These predominantly yield small datasets that fail to capture
the required complex social dynamics and impede direct comparison of progress.
We (ii) conduct an extensive set of experiments that indicate a general lack of
cross-domain generalization of classifiers trained on these sources, and openly
provide this framework to replicate and extend our evaluation criteria.
Finally, we (iii) present an effective crowdsourcing method: simulating
real-life bullying scenarios in a lab setting generates plausible data that can
be effectively used to enrich real data. This largely circumvents the
restrictions on data that can be collected, and increases classifier
performance. We believe these contributions can aid in improving the empirical
practices of future research in the field.
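The cross-domain experiments in (ii) follow a simple pattern: train on one dataset (domain), then measure performance both in-domain and on a different dataset. A toy sketch of that harness, with an illustrative word-list "classifier" and made-up two-example datasets standing in for the paper's actual models and corpora:

```python
# Minimal cross-domain evaluation sketch: fit on domain A, score on
# both domains. The classifier and data are toy assumptions; real
# experiments would use trained models and the surveyed corpora.

def train_wordlist(examples):
    """'Train' by keeping words that occur only in bullying examples."""
    pos = {w for text, label in examples if label for w in text.split()}
    neg = {w for text, label in examples if not label for w in text.split()}
    return pos - neg

def accuracy(wordlist, examples):
    correct = sum(
        any(w in wordlist for w in text.split()) == label
        for text, label in examples
    )
    return correct / len(examples)

domain_a = [("you are a loser", True), ("nice photo", False)]
domain_b = [("everyone hates you", True), ("see you tomorrow", False)]

model = train_wordlist(domain_a)
print(accuracy(model, domain_a))  # in-domain → 1.0
print(accuracy(model, domain_b))  # cross-domain → 0.5
```

Even this toy setup reproduces the qualitative finding: features that separate classes in one domain (here, the word "you") transfer poorly to another.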
A review on deep-learning-based cyberbullying detection
Bullying is described as undesirable behavior by others that harms an individual physically, mentally, or socially. Cyberbullying is a virtual form (e.g., textual or image) of bullying or harassment, also known as online bullying. Cyberbullying detection is a pressing need in today’s world, as the prevalence of cyberbullying is continually growing, resulting in mental health issues. Conventional machine learning models were previously used to identify cyberbullying. However, current research demonstrates that deep learning surpasses traditional machine learning algorithms in identifying cyberbullying for several reasons, including handling extensive data, efficiently classifying text and images, and extracting features automatically through hidden layers, among others. This paper reviews the existing surveys and identifies the gaps in those studies. We also present a deep-learning-based defense ecosystem for cyberbullying detection, including data representation techniques and different deep-learning-based models and frameworks. We critically analyze the existing DL-based cyberbullying detection techniques and identify their significant contributions and the future research directions they present. We also summarize the datasets used, the DL architectures applied to each, and the tasks accomplished. Finally, several challenges faced by existing researchers and the open issues to be addressed in the future are presented.
ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese
Social media processing is a fundamental task in natural language processing
with numerous applications. As Vietnamese social media and information science
have grown rapidly, the necessity of information-based mining on Vietnamese
social media has become crucial. However, state-of-the-art research faces
several significant drawbacks, including imbalanced data and noisy data on
social media platforms. Imbalance and noise are two essential issues that need
to be addressed in Vietnamese social media texts. Graph Convolutional Networks
can address these problems in text classification on social media by taking
advantage of the graph structure of the data. This study presents a novel
approach based on a contextualized language model (PhoBERT) and a graph-based
method (Graph Convolutional Networks). In particular, the proposed approach,
ViCGCN, jointly trains contextualized embeddings with a Graph Convolutional
Network (GCN) to capture more syntactic and semantic dependencies and address
those drawbacks. Extensive experiments on
various Vietnamese benchmark datasets were conducted to verify our approach.
The observation shows that applying GCN to BERTology models as the final layer
significantly improves performance. Moreover, the experiments demonstrate that
ViCGCN outperforms 13 powerful baseline models, including BERTology models,
fused BERTology-GCN models, other baselines, and the state of the art, on three
benchmark social media datasets. ViCGCN demonstrates a significant improvement
of up to 6.21%, 4.61%, and 2.63% over the best contextualized language models,
both multilingual and monolingual, on the UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC
benchmarks, respectively. Additionally, ViCGCN achieves the best performance
among the models that integrate BERTology with a GCN.
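The graph operation at the heart of a GCN layer is H' = ReLU(D^(-1/2)(A + I)D^(-1/2) H W): neighbor features are averaged under symmetric normalization and then linearly transformed. A plain-Python sketch of one such layer (the tiny graph, features, and identity weight matrix are illustrative assumptions; ViCGCN itself would apply this to PhoBERT-derived node features with learned weights in a tensor library):

```python
# One GCN layer, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python.
# adj: adjacency matrix, feats: node features H, weight: layer weights W.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, feats, weight):
    n = len(adj)
    # add self-loops: A_hat = A + I
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    # symmetric normalization: D^-1/2 A_hat D^-1/2
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]
    # propagate features along edges, then apply ReLU
    h = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

adj = [[0, 1], [1, 0]]            # two connected nodes
feats = [[1.0, 0.0], [0.0, 1.0]]  # one-hot node features
weight = [[1.0, 0.0], [0.0, 1.0]] # identity weights for clarity
print(gcn_layer(adj, feats, weight))  # → [[0.5, 0.5], [0.5, 0.5]]
```

The output shows the characteristic smoothing: each node's features become a degree-normalized mix of its own and its neighbors', which is how the graph structure regularizes noisy, imbalanced inputs.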