Classifiers and text mining: application to a specific context
[Abstract]: The constant growth of social networks has not only brought us new ways of interacting
with each other, but has also given way to a severe increase in negative behaviors: hate
speech, racism, gender harassment, cyberbullying, etc. Manually trying to detect these kinds of
behaviours in millions of daily social media posts is out of the question. The solution lies in
developing intelligent systems to automate such detection tasks.
As the nature of these texts is completely subjective, this problem falls under the field
of sentiment analysis, which aims to systematically identify and study affective states and
subjective information in textual data using natural language processing techniques.
In particular, this project is focused on the research of different machine learning techniques
related to natural language processing, in order to automate and perform a reliable
detection and classification of sexist-related behaviours in social media texts. We will tackle
the task of adequately processing the extracted data from social media, as well as researching
various text classification techniques and models that we will use to develop and evaluate a
variety of classifiers.
Bachelor's thesis (Traballo fin de grao), UDC.FIC. Enxeñaría Informática. Academic year 2021/202
Classifying Misogynistic Tweets Using a Blended Model: the AMI Shared Task in IBEREVAL 2018
This article describes a possible solution for the Automatic Misogyny Identification (AMI) Shared Task at IBEREVAL-2018. The proposed technique combines several simpler classifiers into one more complex blended model, which classifies the data using the class-membership probabilities calculated by the simpler models. We used the Logistic Regression, Naive Bayes, and SVM classifiers. The experimental results show that the blended model works better than the simpler models for all three types of classification, both binomial (Misogyny Identification, Target Classification) and multinomial (Misogynistic Behavior)
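As a rough illustration of the blending idea described in this abstract, the sketch below fits a small logistic-regression meta-model on class probabilities produced by three base classifiers. It is not the authors' implementation: the probability values, the labels, and the names `fit_blender` and `blend_predict` are hypothetical, and the base models (Logistic Regression, Naive Bayes, SVM) are assumed to have been trained elsewhere.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_blender(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-model with batch gradient descent.

    base_probs: one row per text, one column per base model, each value
                being that model's probability of the positive class.
    labels:     gold labels (1 = misogynistic, 0 = not).
    """
    n_features = len(base_probs[0])
    n = len(labels)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def blend_predict(weights, bias, x):
    """Return (blended probability, predicted label) for one text."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p, int(p >= 0.5)


# Hypothetical held-out probabilities from three base models
# (columns: Logistic Regression, Naive Bayes, SVM). Illustrative values only.
held_out_probs = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.3, 0.1, 0.4],
    [0.1, 0.2, 0.2],
]
held_out_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_blender(held_out_probs, held_out_labels)
predictions = [blend_predict(weights, bias, x)[1] for x in held_out_probs]
print(predictions)
```

In practice the meta-model would be fitted on probabilities from data the base models did not see during training, so that it does not simply learn to trust overfitted base predictions.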
Deep Learning Based Misogynistic Bangla Text Identification from Social Media
Misogyny is characterized by hostility, hatred, aversion, intimidation, and violence against women. With the rise of social media, it has become one of the most convenient platforms for expressing woman-hating speech. As a result, misogyny is gaining appeal and societal standards are being violated. With millions of Bangladeshi Facebook users, misogyny is growing increasingly prevalent in Bangla as well. In this paper, we propose automatically identifying misogynistic content in Bangla on social media platforms in order to evaluate the problem's challenges. As there is no existing Bangla dataset for analyzing misogynistic text, we generated our own. We applied various deep-learning algorithms to improve the classification of misogynistic text categories. LSTM and RNN models are used to design the model architecture. Models are evaluated using the confusion matrix, accuracy, and F1-scores. The results indicate that LSTM outperforms RNN in terms of accuracy by 67%
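The evaluation measures named in this abstract (confusion matrix, accuracy, F1-score) can be computed as in the minimal sketch below. The labels and predictions are made-up illustrative values, not results from the paper, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Compute the confusion matrix, accuracy, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only (1 = misogynistic, 0 = not misogynistic).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, since a model that always predicts the majority class can still reach a high accuracy.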
Multimodal and Explainable Internet Meme Classification
Warning: this paper contains content that may be offensive or upsetting. In
the current context where online platforms have been effectively weaponized in
a variety of geo-political events and social issues, Internet memes make fair
content moderation at scale even more difficult. Existing work on meme
classification and tracking has focused on black-box methods that do not
explicitly consider the semantics of the memes or the context of their
creation. In this paper, we pursue a modular and explainable architecture for
Internet meme understanding. We design and implement multimodal classification
methods that perform example- and prototype-based reasoning over training
cases, while leveraging both textual and visual SOTA models to represent the
individual cases. We study the relevance of our modular and explainable models
in detecting harmful memes on two existing tasks: Hate Speech Detection and
Misogyny Classification. We compare the performance between example- and
prototype-based methods, and between text, vision, and multimodal models,
across different categories of harmfulness (e.g., stereotype and
objectification). We devise a user-friendly interface that facilitates the
comparative analysis of examples retrieved by all of our models for any given
meme, informing the community about the strengths and limitations of these
explainable methods
Misogyny Detection in Social Media on the Twitter Platform
The thesis is devoted to the problem of misogyny detection in social media. In this work we analyse the difference between offensive language in general and misogynistic language in social media, and review the best existing approaches to detecting offensive and misogynistic language, which are based on classical machine learning and neural networks. We also review recent shared tasks aimed at detecting misogyny in social media, several of which we have participated in. We propose an approach to the detection and classification of misogyny in texts, based on an ensemble of classical machine learning models: Logistic Regression, Naive Bayes, and Support Vector Machines. At the preprocessing stage we also used some linguistic features and novel approaches that allow us to improve the quality of classification. We tested the model on real datasets, both English and multilingual corpora. The results we achieved with our model are highly competitive in this area and demonstrate the capability for future improvement
Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models
The popularity of social media has created problems such as hate speech and sexism. The identification and classification of sexism in social media are very relevant tasks, as they would allow building a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system that uses multilingual and monolingual BERT, data point translation, and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks 2021 (EXIST 2021) shared task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameter analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) an ensemble model considering all individual models and the best standardized values obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
Magnossao De Paula, AF.; Fray Da Silva, R.; Baris Schlicht, I. (2021). Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models. CEUR Workshop. 356-373. http://hdl.handle.net/10251/19056135637
Automatic Identification of Misogyny in English and Italian Tweets at EVALITA 2018 with a Multilingual Hate Lexicon
In this paper we describe our submission to the shared task of Automatic Misogyny Identification in English and Italian Tweets (AMI) organized at EVALITA 2018. Our approach is based on SVM classifiers enhanced with stylistic and lexical features. Additionally, we analyze the use of the novel HurtLex multilingual linguistic resource, a computational and multilingual extension of the Italian lexicon of hurtful words ("parole per ferire") proposed by the linguist Tullio De Mauro, in order to investigate its impact on this task