Humor and offense speech classification and scoring using natural language processing
Identifying humor and offense may prove to be an arduous task even for humans. It is,
however, even more challenging to translate it into a logical process that a machine can
understand.
This work aims to develop machine learning models for this task. The study is based on
the SemEval 2021 workshop, where participants were challenged to identify and score both
humorous and offensive texts, as well as to detect controversial sentences (SemEval 2021 -
Task 7 - Detecting and Rating Humor and Offense), encouraging the use of current
state-of-the-art algorithmic techniques in Natural Language Processing.
The objective is to identify and propose the best setup to achieve the highest
performance on Humor Detection and related tasks using a common dataset of eight
thousand sentences, each labeled with a binary humor indicator and a humor rating,
along with a binary controversy indicator and an offense rating.
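The record layout described above can be sketched as follows. This is only an illustration: the class name, the field names, and the assumption that the humor rating and controversy label may be missing for some sentences are hypothetical and are not taken from the dataset itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabeledSentence:
    """One annotated sentence. All field names are hypothetical."""
    text: str
    is_humor: bool                     # binary humor indicator
    humor_rating: Optional[float]      # humor rating (assumed optional)
    is_controversial: Optional[bool]   # binary controversy indicator (assumed optional)
    offense_rating: float              # offense rating value


def task_targets(row: LabeledSentence) -> dict:
    """Map one record to the prediction target of each of the four tasks."""
    controversy = None if row.is_controversial is None else int(row.is_controversial)
    return {
        "humor_detection": int(row.is_humor),      # binary classification
        "humor_rating": row.humor_rating,          # regression
        "controversy_detection": controversy,      # binary classification
        "offense_rating": row.offense_rating,      # regression
    }


# Invented placeholder record, not a row from the real dataset.
example = LabeledSentence("placeholder sentence", True, 2.5, False, 0.0)
targets = task_targets(example)
```

Keeping the two classification targets and the two regression targets in one record makes it straightforward to train or evaluate a separate model per task on the same sentences.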
This document presents a solution for these tasks based on BERT (Bidirectional
Encoder Representations from Transformers), which uses Transformers to interpret
sentences in both directions (bidirectionally), giving the model a much richer perception
of context. It compares the performance of three BERT variants (BERT-base, DistilBERT,
and RoBERTa), each designed to better fit different tasks in industry and academia.
The study concludes that DistilBERT produced the most accurate results in the Humor
Detection and Humor Rating tasks, while RoBERTa performed best in the controversy
detection task. Finally, BERT-base performed best in the
Offense Rating task.
Identifying humor and offense can prove to be an arduous task even for humans.
It is, however, even more challenging to translate it into a logical process that a
machine can understand.
This work aims to develop machine learning models to carry out this task. The study is
based on the SemEval 2021 workshop, where participants were challenged to detect and
rate sentences for humor and offensiveness, as well as to detect controversial sentences
(SemEval 2021 - Task 7 - Detecting and Rating Humor and Offense), encouraging the use of
state-of-the-art algorithmic strategies focused on the computational processing of language.
The objective is to identify and propose the best configuration to achieve the best
performance on Humor Detection and related tasks, using a common dataset of eight
thousand sentences labeled with their binary humor indicators and humor ratings, along
with binary controversy indicators and offense ratings.
This document presents a solution for these tasks based on BERT (Bidirectional Encoder
Representations from Transformers), which makes use of Transformers, a neural network
architecture that interprets sentences in both directions (bidirectionally), providing
better context perception than other architectures. The study compares the performance
of three BERT variants (BERT-base, DistilBERT, and RoBERTa), each designed to better
suit the different tasks used by industry and academia. It concludes that DistilBERT
achieved the best performance in the Humor Detection and Humor Rating tasks, while
RoBERTa was more accurate in the controversial sentence detection task. Finally,
BERT-base obtained the best performance in the Offense Rating task.
Multi-view informed attention-based model for Irony and Satire detection in Spanish variants
[EN] Making machines understand language and reason about it has been one of the most challenging problems addressed by Artificial Intelligence researchers. This challenge increases when figurative language is used to communicate complex meanings, intentions, emotions and attitudes in creative and funny ways. In fact, sentiment analysis approaches struggle when facing irony, satire and other forms of figurative language, particularly those where the explanation of a prediction might arguably be as necessary as the prediction itself. This paper describes MvAttLSTM, a new deep learning model for irony and satire detection in tweets written in distinct Spanish variants. The proposed model is based on an attentive LSTM informed with three additional views learned from distinct perspectives. We investigate two strategies to pass these views into MvAttLSTM. We perform an extensive evaluation on three corpora, one for irony detection and two for satire detection. Moreover, to study the robustness of the proposed model, we investigate its performance on humor recognition. Experiments confirm that the proposed views help our model improve its performance. They also show that affective information helps our model detect irony and satire. In particular, a first analysis of the results highlights the discriminating power of emotional features obtained from SenticNet and the SEL lexicon. Overall, our system achieves state-of-the-art performance in irony and satire detection in Spanish variants and competitive results in humor recognition.
The work of the first two authors was carried out in the framework of the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), funded by the Spanish Ministry of Science and Innovation, and DeepPattern (PROMETEO/2019/121), funded by the Generalitat Valenciana, Spain.
Ortega-Bueno, R.; Rosso, P.; Medina-Pagola, JE. (2022). Multi-view informed attention-based model for Irony and Satire detection in Spanish variants. Knowledge-Based Systems. 235:1-24. https://doi.org/10.1016/j.knosys.2021.107597