
    Detecting Aggressiveness in Tweets: A Hybrid Model for Detecting Cyberbullying in the Spanish Language

    In recent years, the use of social networks has increased exponentially, which has led to a significant increase in cyberbullying. In Computer Science, research has been conducted on how to detect aggressiveness in texts, which is a prelude to detecting cyberbullying. Most work in this field has targeted English-language texts, mainly using Machine Learning (ML) approaches, Lexicon approaches to a lesser extent, and very few hybrid approaches. Hybrid approaches combine Lexicons and ML algorithms; for example, the number of bad words in a sentence, counted with a Lexicon of bad words, serves as an input feature for classification algorithms. This research aims to contribute to detecting aggressiveness in Spanish-language texts by creating different models that combine the Lexicon and ML approaches. Twenty-two models that combine techniques and algorithms from both approaches are proposed; for each, hyperparameters are tuned on the training sets of the corpora to obtain the best results on the test sets. Three Spanish-language corpora are used in the evaluation: Chilean, Mexican, and Chilean-Mexican. The results indicate that hybrid models obtain the best results on all three corpora, outperforming the implemented models that do not use Lexicons. This shows that mixing approaches improves aggressiveness detection. Finally, a web application is developed that makes each model usable by classifying tweets, allowing users to evaluate the models' performance on external corpora and to give feedback on each model's predictions for future research. In addition, an API is available that can be integrated into technological tools for parental control, online plugins for writing analysis in social networks, and educational tools, among others.
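    The hybrid idea described above, a Lexicon-derived bad-word count used as an input feature next to ordinary text features for an ML classifier, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not one of the twenty-two models from the paper: the tiny lexicon `BAD_WORDS`, the helper names, the toy tweets, and the from-scratch logistic regression trained with stochastic gradient descent are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon of Spanish bad words (NOT the authors' lexicon).
BAD_WORDS = {"idiota", "estúpido", "imbécil", "basura"}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word characters; \w also matches accented letters.
    return re.findall(r"\w+", text.lower())


def hybrid_features(text: str, vocab: list[str]) -> list[float]:
    """Bag-of-words counts (ML side) plus one Lexicon feature: the bad-word count."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    bow = [float(counts[w]) for w in vocab]
    lexicon_count = float(sum(1 for t in tokens if t in BAD_WORDS))
    return bow + [lexicon_count]


def train_logreg(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - yi  # derivative of the log loss with respect to z
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def predict(w, b, x) -> int:
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0  # 1 = aggressive, 0 = not aggressive


texts = ["eres un idiota", "qué día tan bonito", "eres basura", "me gusta tu foto"]
labels = [1, 0, 1, 0]
vocab = sorted({t for text in texts for t in tokenize(text)})
X = [hybrid_features(t, vocab) for t in texts]
w, b = train_logreg(X, labels)
print([predict(w, b, x) for x in X])  # [1, 0, 1, 0]
```

    The Lexicon feature is simply appended to the bag-of-words vector, so any classifier that accepts numeric features could replace the logistic regression used here.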

    Overview of HOMO-MEX at IberLEF 2023: Hate Speech Detection in Online Messages Directed Towards the Mexican Spanish-Speaking LGBTQ+ Population

    The detection of hate speech and stereotypes on online platforms has gained significant attention in the field of Natural Language Processing (NLP). Among the various forms of discrimination, LGBTQ+ phobia is prevalent on social media, particularly on platforms like Twitter. The objective of the HOMO-MEX task is to encourage the development of NLP systems that can detect and classify LGBTQ+ phobic content in Spanish tweets, regardless of whether it is expressed aggressively or subtly. The task aims to address the lack of dedicated resources for LGBTQ+ phobia detection on Spanish-language Twitter and encourages participants to employ multi-label classification approaches.
    This paper has been supported by PAPIIT projects IT100822, TA101722, and CONAHCYT CF-2023-G-64. We also thank Alejandro Ojeda Trueba for creating the HOMO-MEX presentation image. GBE is supported by a grant from the Ministry of Universities of the Government of Spain, financed by the European Union, NextGeneration EU (María Zambrano program).
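    Because the task encourages multi-label classification, a single tweet may carry several labels at once. One common way to handle this is binary relevance, where an independent yes/no decision is made per label, with systems often compared by macro-averaged F1. The sketch below assumes both; the label names in `LABELS`, the 0.5 threshold, and the choice to score a label with no gold and no predicted instances as 0.0 are illustrative assumptions, not the official HOMO-MEX schema or evaluation script.

```python
# Illustrative label set (an assumption, not the official task schema).
LABELS = ["lesbophobia", "gayphobia", "biphobia", "transphobia", "other"]


def decide(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Binary relevance: each label is accepted or rejected independently,
    so a tweet can receive zero, one, or several labels."""
    return {label for label in LABELS if scores.get(label, 0.0) >= threshold}


def macro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Per-label F1 = 2TP / (2TP + FP + FN), averaged over LABELS."""
    f1s = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # empty label scored as 0.0
    return sum(f1s) / len(f1s)


gold = [{"gayphobia"}, {"transphobia", "other"}]
pred = [decide({"gayphobia": 0.9}), decide({"transphobia": 0.7, "other": 0.3})]
print(pred, macro_f1(gold, pred))
```

    Scoring absent labels as 0.0 pulls the macro average down (to 0.4 in this toy case, although every predicted label is correct); excluding such labels from the average is an equally defensible convention.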

    A comparison of classification models to detect cyberbullying in the Peruvian Spanish language on Twitter

    Cyberbullying is a social problem in which bullies' actions can be more harmful than in traditional forms of bullying, because social media gives bullies the power to repeatedly humiliate the victim in front of an entire community. Nowadays, multiple works aim to detect acts of cyberbullying by analyzing the texts of social media publications written in one or more languages; however, few studies target cyberbullying detection in Spanish. In this work, we compare the performance of four traditional supervised machine learning methods in detecting cyberbullying by identifying four cyberbullying-related categories in Twitter posts written in Peruvian Spanish. Specifically, we trained and tested Naive Bayes, Multinomial Logistic Regression, Support Vector Machine, and Random Forest classifiers on a dataset manually annotated with the help of human participants. The results indicate that the best-performing classifier for the cyberbullying detection task was the Support Vector Machine.
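    Of the four compared classifiers, Naive Bayes is the simplest to write out in full. The following is a minimal multinomial Naive Bayes with Laplace smoothing, written from scratch as an illustration; it is not the authors' implementation, and the toy examples and the category names (`insult`, `threat`, `none`) are hypothetical, since the abstract does not name its four categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # drop unseen words
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


texts = ["eres un tonto", "eres tonto y feo", "te voy a golpear",
         "te voy a buscar", "buen partido hoy", "qué buen día"]
labels = ["insult", "insult", "threat", "threat", "none", "none"]
nb = MultinomialNB().fit(texts, labels)
print(nb.predict("tonto"))  # insult
```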

    nlpBDpatriots at BLP-2023 Task 1: A Two-Step Classification for Violence Inciting Text Detection in Bangla

    In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD), organized as part of the first workshop on Bangla Language Processing (BLP) co-located with EMNLP. The aim of this task is to identify and classify violent threats that provoke further unlawful violent acts. Our best-performing approach is a two-step classification using back-translation and multilinguality, which ranked 6th out of 27 teams with a macro F1 score of 0.74.
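    The abstract does not spell out the two steps. One plausible reading, sketched here purely as an assumption, is a cascade: a first classifier separates violent from non-violent text, and a second one assigns a finer violence label only to texts flagged as violent. The label names and the keyword-based stand-in classifiers below are hypothetical placeholders for trained models; they are not the nlpBDpatriots system.

```python
from typing import Callable

# A classifier maps a text to a label; a real system would wrap trained models.
Classifier = Callable[[str], str]


def two_step_classify(text: str, step1: Classifier, step2: Classifier) -> str:
    """Step 1 decides violent vs. non-violent; step 2 runs only on violent texts."""
    coarse = step1(text)
    if coarse == "non-violence":
        return coarse
    return step2(text)  # fine-grained label for texts flagged as violent


# Hypothetical keyword stand-ins (English placeholders, not Bangla models).
def toy_step1(text: str) -> str:
    words = set(text.lower().split())
    return "violence" if words & {"attack", "burn", "hate"} else "non-violence"


def toy_step2(text: str) -> str:
    words = set(text.lower().split())
    return "direct-violence" if words & {"attack", "burn"} else "passive-violence"


for t in ["let us attack them", "we hate them", "nice weather today"]:
    print(t, "->", two_step_classify(t, toy_step1, toy_step2))
```

    A cascade lets the second model concentrate on the finer distinction among violent texts, at the cost of propagating any step-1 error to the final label.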

    Killing me Softly: Creative and Cognitive Aspects of Implicitness in Abusive Language Online

    Abusive language is becoming a problematic issue for our society. The spread of messages that reinforce social and cultural intolerance could have dangerous effects on victims' lives. State-of-the-art technologies are often effective at detecting explicit forms of abuse, but leave undetected utterances that use very weak offensive language yet have a strong hurtful effect. Scholars have advanced theoretical and qualitative observations on specific indirect forms of abusive language that make it hard to recognize automatically. In this work, we propose a battery of statistical and computational analyses able to support these considerations, with a focus on creative and cognitive aspects of implicitness, in texts coming from different sources such as social media and news. We experiment with transformers, multi-task learning, and a set of linguistic features to reveal the elements involved in implicit and explicit manifestations of abuse, providing a solid basis for computational applications.
    Frenda, S.; Patti, V.; Rosso, P. (2022). Killing me Softly: Creative and Cognitive Aspects of Implicitness in Abusive Language Online. Natural Language Engineering. 1-22. https://doi.org/10.1017/S1351324922000316
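    Multi-task learning, mentioned above, is commonly implemented with hard parameter sharing: one shared encoder feeds several task-specific heads, and training minimizes a weighted sum of the per-task losses. The forward-pass sketch below assumes a main task (abusive vs. not abusive) and an auxiliary task (implicit vs. explicit); this task pairing, the single shared ReLU layer standing in for a transformer encoder, and the weight `aux_weight` are illustrative assumptions, not the configuration reported in the paper. Gradients and training are omitted.

```python
import math


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cross_entropy(probs: list[float], gold: int) -> float:
    return -math.log(probs[gold])


def multitask_loss(x, shared_W, head_main, head_aux, y_main, y_aux, aux_weight=0.5):
    """Joint loss = L_main + aux_weight * L_aux over a shared representation."""
    h = [max(0.0, v) for v in linear(shared_W, x)]  # shared encoder (one ReLU layer)
    loss_main = cross_entropy(softmax(linear(head_main, h)), y_main)
    loss_aux = cross_entropy(softmax(linear(head_aux, h)), y_aux)
    return loss_main + aux_weight * loss_aux


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights both heads predict uniformly, so each task loss is ln 2.
print(multitask_loss([1.0, 2.0], zeros, zeros, zeros, y_main=0, y_aux=1))
```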

    Overview of Rest-Mex at IberLEF 2022: Recommendation System, Sentiment Analysis and Covid Semaphore Prediction for Mexican Tourist Texts

    This paper presents the framework and results of the Rest-Mex task at IberLEF 2022. The task comprised three tracks: Recommendation System, Sentiment Analysis, and Covid Semaphore Prediction, using texts about Mexican tourist places. The Recommendation System track consists of predicting the degree of satisfaction that a tourist may have when recommending a destination in Nayarit, Mexico, based on the places the tourist has visited and their opinions. The Sentiment Analysis track predicts the polarity of an opinion issued by a tourist who traveled to the most representative places in Mexico, as well as the attraction it refers to. We built corpora for these two tracks from Spanish-language opinions on the TripAdvisor website. As a novelty, the Covid Semaphore Prediction track aims to predict the color of the Mexican Covid semaphore for each state from the Covid news in that state, using data from the Mexican Ministry of Health. This paper compares and discusses the participants' results for all three tracks.
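    For the Recommendation System track, a natural point of reference is a naive baseline that predicts a tourist's satisfaction from their own past ratings. The sketch assumes satisfaction is a numeric score and evaluates with mean absolute error (MAE); the data layout, the toy records, the fallback to the global mean for unseen tourists, and the metric are illustrative assumptions, not the official Rest-Mex setup.

```python
from statistics import mean

# Each record: (tourist_id, place, satisfaction score). Toy, hypothetical data.
History = list[tuple[str, str, float]]


def fit_user_means(history: History) -> tuple[dict[str, float], float]:
    """Mean score per tourist, plus a global mean for tourists with no history."""
    by_user: dict[str, list[float]] = {}
    for user, _place, score in history:
        by_user.setdefault(user, []).append(score)
    global_mean = mean(score for _, _, score in history)
    return {u: mean(s) for u, s in by_user.items()}, global_mean


def predict(user_means: dict[str, float], global_mean: float, user: str) -> float:
    return user_means.get(user, global_mean)


def mae(gold: list[float], pred: list[float]) -> float:
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


history = [("t1", "Sayulita", 5.0), ("t1", "Tepic", 3.0), ("t2", "San Blas", 2.0)]
means, global_mean = fit_user_means(history)
preds = [predict(means, global_mean, u) for u in ("t1", "t2", "t3")]
print(preds, mae([4.0, 2.0, 3.0], preds))
```

    A submitted system would have to beat a baseline like this one to show that it exploits the opinions and visited places rather than just each tourist's rating habits.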

    Multi-view informed attention-based model for Irony and Satire detection in Spanish variants

    Making machines understand language and reason about it has been one of the most challenging problems addressed by Artificial Intelligence researchers. This challenge increases when figurative language is used to communicate complex meanings, intentions, emotions, and attitudes in creative and funny ways. In fact, sentiment analysis approaches struggle when facing irony, satire, and other forms of figurative language, particularly in cases where the explanation of a prediction might arguably be as necessary as the prediction itself. This paper describes MvAttLSTM, a new deep learning model for irony and satire detection in tweets written in distinct Spanish variants. The proposed model is based on an attentive LSTM informed with three additional views learned from distinct perspectives. We investigate two strategies for passing these views into MvAttLSTM. We perform an extensive evaluation on three corpora, one for irony detection and two for satire detection. Moreover, to study the robustness of the proposed model, we investigate its performance on humor recognition. Experiments confirm that the proposed views help our model improve its performance. They also show that affective information helps our model detect irony and satire. In particular, a first analysis of the results highlights the discriminating power of emotional features obtained from SenticNet and the SEL lexicon. Overall, our system achieves state-of-the-art performance in irony and satire detection in Spanish variants and competitive results in humor recognition.
    The work of the first two authors was carried out in the framework of the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), funded by the Spanish Ministry of Science and Innovation, and DeepPattern (PROMETEO/2019/121), funded by the Generalitat Valenciana, Spain.
    Ortega-Bueno, R.; Rosso, P.; Medina-Pagola, JE. (2022). Multi-view informed attention-based model for Irony and Satire detection in Spanish variants. Knowledge-Based Systems. 235:1-24. https://doi.org/10.1016/j.knosys.2021.107597
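    The core of an attentive LSTM classifier is an attention layer that pools the LSTM's hidden states into a single context vector, to which extra views can then be attached. The sketch below shows dot-product attention pooling followed by plain concatenation of view vectors (for example, an affective view built from lexicons such as SenticNet or SEL). It assumes the hidden states are already computed, uses a hypothetical learned `query` vector, and implements only one possible fusion strategy; the abstract does not say which two strategies MvAttLSTM uses, so this is not the authors' architecture.

```python
import math


def attention_pool(hidden: list[list[float]], query: list[float]):
    """Dot-product attention: softmax over per-token scores, then a weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


def fuse_views(context: list[float], views: list[list[float]]) -> list[float]:
    """Concatenation fusion: append each extra view vector to the context vector."""
    fused = list(context)
    for view in views:
        fused.extend(view)
    return fused


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy hidden states, one per token
context, weights = attention_pool(hidden, query=[2.0, 0.0])
features = fuse_views(context, views=[[0.8], [0.1, 0.3]])  # hypothetical extra views
print(weights, features)
```

    The returned attention weights also offer a rough view of which tokens drove a prediction, which relates to the abstract's point about explanations.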