Search CORE

5 research outputs found

Abusive Text Detection Using Neural Networks

Author: Chen Hao
Delany Sarah Jane
McKeever Susan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2017

Neural network models have become increasingly popular for text classification in recent years. In particular, the emergence of word embeddings within deep learning architectures has recently attracted a high level of attention amongst researchers. In this paper, we first focus on how neural network models have been applied in text classification. Secondly, we extend our previous work [4, 3] using a neural network strategy for the task of abusive text detection. We compare word embedding features to traditional feature representations such as n-grams and handcrafted features. In addition, we use an off-the-shelf neural network classifier, FastText [16]. Based on our results, the conclusions are: (1) Extracting selected manual features can improve abusive content detection over using basic n-grams; (2) Although averaging pre-trained word embeddings is a naive method, the distributed feature representation performs better than n-grams on most of our datasets; (3) While the FastText classifier is efficient and fast, the results are not remarkable, as it is a shallow neural network with only one hidden layer; (4) Using pre-trained word embeddings does not guarantee better performance in the FastText classifier.
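The two feature representations contrasted in this abstract, word n-grams and averaged word embeddings, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the embedding table `TOY_EMBEDDINGS` and the helper names `word_ngrams` and `average_embedding` are hypothetical, and real pre-trained vectors would have hundreds of dimensions rather than two.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy embedding table (assumption): stands in for pre-trained vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.1, 0.3],
    "are": [0.0, 0.2],
    "awful": [0.9, -0.4],
}


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of length n (the traditional feature)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def average_embedding(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of all in-vocabulary tokens (the 'naive' distributed feature).

    Out-of-vocabulary tokens are skipped; a comment with no known tokens
    maps to the zero vector.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


tokens = "you are awful xyz".split()
bigrams = word_ngrams(tokens, 2)
doc_vector = average_embedding(tokens, TOY_EMBEDDINGS, dim=2)
```

Either output could then be passed to a standard classifier. The averaged vector has a fixed length regardless of comment length, whereas the n-gram list would still need to be mapped to a sparse count vector.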

Arrow@TUDublin

Abusive Text Detection Using Neural Networks

Author: Chen Hao
Delany Sarah Jane
McKeever Susan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2017

Neural network models have become increasingly popular for text classification in recent years. In particular, the emergence of word embeddings within deep learning architectures has recently attracted a high level of attention amongst researchers.

Arrow@TUDublin

Detection of Offensive YouTube Comments, a Performance Comparison of Deep Learning Approaches

Author: Bansal Priyam
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2019

Social media data is open, free and available in massive quantities. However, there is a significant limitation in making sense of this data because of its high volume, variety, uncertain veracity, velocity, value and variability. This work provides a comprehensive framework of text processing and analysis performed on YouTube comments with offensive and non-offensive content. YouTube is a platform where people of every age group log in and find the type of content that most appeals to them. Alongside this, a massive increase in the use of offensive language has become apparent. Because of the massive volume of new comments, offensive comments cannot each be removed manually, and disabling the comment section would be bad for business for YouTubers, as they would no longer receive any feedback.

Arrow@TUDublin

The Use of Deep Learning Distributed Representations in the Identification of Abusive Text

Author: Chen Hao
Delany Sarah Jane
McKeever Susan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2019

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models using 10 labelled datasets gathered from different social media platforms. Our results show that multi-task sentence embedding models perform best, with consistently higher classification results than the other embedding models. We hope our work can be a guideline for practitioners in selecting appropriate features in text classification tasks, particularly in the domain of abuse detection.

Arrow@TUDublin

Association for the Advancement of Artificial Intelligence: AAAI Publications