
    Do Linguistic Features Help Deep Learning? The Case of Aggressiveness in Mexican Tweets

    In recent years, controlling online user-generated content has become a priority because of the rise in online aggressiveness and in legal cases involving hate speech. Given the complexity and importance of this issue, this paper presents an approach that combines a deep learning framework with linguistic features to recognize aggressiveness in Mexican tweets. The approach was evaluated on a collection of tweets released by the organizers of the aggressiveness detection shared task in the IberEval 2018 evaluation campaign. Using a benchmark corpus makes it possible to compare the results with those obtained by the IberEval 2018 participant systems. However, judging by the results achieved, linguistic features do not seem to help deep learning classification for this task. The work of Simona Frenda and Paolo Rosso was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P). Frenda, S.; Banerjee, S.; Rosso, P.; Patti, V. (2020). Do Linguistic Features Help Deep Learning? The Case of Aggressiveness in Mexican Tweets. Computación y Sistemas. 24(2):633-643. https://doi.org/10.13053/CyS-24-2-3398
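
    The abstract describes combining a deep learning model with linguistic features. One common way to do this is late fusion: the encoder's sentence vector is concatenated with a vector of hand-crafted features before the final classification layer. The sketch below assumes that design; the feature set (uppercase-token ratio, exclamation count), the names `linguistic_features`, `fuse` and `logistic_layer`, and the stand-in encoder output are hypothetical and do not reproduce the paper's actual architecture.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linguistic_features(tweet: str) -> List[float]:
    """Hypothetical surface features; not the paper's real feature set."""
    tokens = tweet.split()
    n = max(len(tokens), 1)
    # Share of tokens written fully in uppercase (one-letter tokens ignored).
    upper_ratio = sum(1 for t in tokens if len(t) > 1 and t.isupper()) / n
    # Raw count of exclamation marks, a rough intensity cue.
    exclamations = float(tweet.count("!"))
    return [upper_ratio, exclamations]


def fuse(neural_vec: Sequence[float], ling_vec: Sequence[float]) -> List[float]:
    """Late fusion: concatenate the encoder output with the linguistic features."""
    return list(neural_vec) + list(ling_vec)


def logistic_layer(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Final binary layer: probability that the tweet is aggressive."""
    if len(x) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return _sigmoid(z)


if __name__ == "__main__":
    encoder_output = [0.2, -0.1, 0.4]  # stand-in for a trained network's sentence vector
    x = fuse(encoder_output, linguistic_features("CALLATE ya!!"))
    print(len(x), x[3:])  # prints: 5 [0.5, 2.0]
```

    In practice the hand-crafted features are usually scaled before fusion, since raw counts and encoder activations can differ widely in range.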

    The role of sarcasm in hate speech. A multilingual perspective


    Detecting Aggressiveness in Tweets: A Hybrid Model for Detecting Cyberbullying in the Spanish Language

    In recent years, the use of social networks has increased exponentially, leading to a significant rise in cyberbullying. In computer science, research has been conducted on how to detect aggressiveness in texts, which is a prelude to detecting cyberbullying. Most of this work has addressed English-language texts, mainly using Machine Learning (ML) approaches, Lexicon-based approaches to a lesser extent, and hybrid approaches only rarely. Hybrid approaches use Lexicons together with ML algorithms; for example, the number of bad words in a sentence, counted with a Lexicon of bad words, serves as an input feature for classification algorithms. This research aims to contribute to aggressiveness detection in Spanish-language texts by creating different models that combine the Lexicon and ML approaches. Twenty-two models combining techniques and algorithms from both approaches are proposed; for each, certain hyperparameters are tuned on the training sets of the corpora to obtain the best results on the test sets. Three Spanish-language corpora are used in the evaluation: a Chilean, a Mexican, and a combined Chilean-Mexican corpus. The results indicate that the hybrid models obtain the best results on all three corpora, outperforming the implemented models that do not use Lexicons. This shows that combining the two approaches improves aggressiveness detection. Finally, a web application was developed that applies each model to classify tweets, making it possible to evaluate the models on external corpora and to collect feedback on each prediction for future research. In addition, an API is available that can be integrated into technological tools such as parental control software, online plugins for analyzing writing on social networks, and educational tools, among others.
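
    The hybrid idea above, where a Lexicon-based count of bad words becomes one more input feature for an ML classifier, can be sketched in a few lines. Everything here is an assumption chosen for brevity: the tiny lexicon, the toy training examples, the bag-of-words features and the perceptron learner are illustrative only, and none of the paper's twenty-two models is reproduced.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical bad-word lexicon; real systems use curated Spanish lexicons.
BAD_WORDS: Set[str] = {"idiota", "estupido", "basura"}

# Hypothetical toy training data: (text, label), 1 = aggressive, 0 = not.
TOY_TRAIN: List[Tuple[str, int]] = [
    ("eres un idiota", 1),
    ("que lindo dia", 0),
    ("basura total", 1),
    ("buenos dias amigos", 0),
]


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization with basic punctuation stripping."""
    return [t.strip(".,!?¡¿").lower() for t in text.split()]


def features(text: str, lexicon: Set[str]) -> Dict[str, float]:
    """Bag-of-words counts plus one lexicon feature (the hybrid part)."""
    tokens = tokenize(text)
    feats = {f"w={t}": float(c) for t, c in Counter(tokens).items()}
    # Lexicon feature: how many tokens appear in the bad-word lexicon.
    feats["lex_bad_count"] = float(sum(1 for t in tokens if t in lexicon))
    return feats


def score(feats: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    return bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_perceptron(
    data: List[Tuple[str, int]], lexicon: Set[str], epochs: int = 10
) -> Tuple[Dict[str, float], float]:
    """Classic perceptron: update weights only on misclassified examples."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            f = features(text, lexicon)
            pred = 1 if score(f, weights, bias) > 0 else 0
            if pred != label:
                step = 1.0 if label == 1 else -1.0
                for k, v in f.items():
                    weights[k] = weights.get(k, 0.0) + step * v
                bias += step
    return weights, bias


def predict(text: str, weights: Dict[str, float], bias: float, lexicon: Set[str]) -> int:
    return 1 if score(features(text, lexicon), weights, bias) > 0 else 0


if __name__ == "__main__":
    w, b = train_perceptron(TOY_TRAIN, BAD_WORDS)
    print(predict("estupido", w, b, BAD_WORDS))
```

    With this toy data, `predict("estupido", ...)` returns 1 even though "estupido" never occurs in the training examples: the weight learned for `lex_bad_count` carries over to any word in the lexicon, which is the kind of benefit a hybrid model is meant to provide.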

    Overview of the DA-VINCIS Task at IberLEF 2022: Detection of Violent Incidents in Social Networks in Spanish

    This paper presents an overview of the DA-VINCIS 2022 task, organized at IberLEF 2022 and co-located with the 38th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2022). DA-VINCIS challenged participants to develop automated solutions for detecting violent events mentioned in social networks. We released a novel corpus collected from Twitter and manually labeled with 4 categories of violent incidents (plus a no-incident label). The shared task focused on the Mexican variant of Spanish and was divided into two tracks: (1) a binary classification task in which participants had to determine whether tweets were associated with a violent incident; and (2) a multi-label classification task in which the category of the violent incident had to be identified. More than 40 teams registered for the task, and 12 participants submitted predictions for the final phase. Very competitive results were reported in both subtasks, and transformer-based models for Spanish obtained the best results. Corpora and results are available at the shared task website at https://codalab.lisn.upsaclay.fr/competitions/2638. This work was supported by CONACyT under grant CB-S-26314, Integración de Lenguaje y Visión mediante Representaciones Multimodales Aprendidas para Clasificación y Recuperación de Imágenes. We would also like to thank CONACyT for partially supporting this work under grant CB-2015-01-257383.
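
    The two DA-VINCIS tracks can be related through label encoding. The sketch below assumes that each tweet's violent-incident categories are stored as integer ids 0 to 3 (standing in for the four category names) and that the no-incident label corresponds to an empty set; it builds a multi-hot target for track (2) and derives the binary target for track (1) from it. The helper names are hypothetical and are not part of the released corpus or its tooling.

```python
from typing import List, Sequence

# The task defines 4 violent-incident categories; ids 0-3 stand in for their names.
NUM_CATEGORIES = 4


def to_multi_hot(category_ids: Sequence[int], num_categories: int = NUM_CATEGORIES) -> List[int]:
    """Track (2) target: one slot per category, 1 if the tweet has that category."""
    vec = [0] * num_categories
    for c in category_ids:
        if not 0 <= c < num_categories:
            raise ValueError(f"unknown category id: {c}")
        vec[c] = 1
    return vec


def to_binary(multi_hot: Sequence[int]) -> int:
    """Track (1) target: 1 if any violent-incident category is present.

    Assumption: the no-incident label is represented by an all-zero vector.
    """
    return int(any(multi_hot))


if __name__ == "__main__":
    for cats in ([], [1], [0, 3]):
        mh = to_multi_hot(cats)
        print(cats, mh, to_binary(mh))
    # prints:
    # [] [0, 0, 0, 0] 0
    # [1] [0, 1, 0, 0] 1
    # [0, 3] [1, 0, 0, 1] 1
```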

    Overview of HOMO-MEX at IberLEF 2023: Detection of Hate Speech in Online Messages Directed at the Mexican Spanish-Speaking LGBTQ+ Population

    The detection of hate speech and stereotypes on online platforms has gained significant attention in the field of Natural Language Processing (NLP). Among the various forms of discrimination, LGBTQ+ phobia is prevalent on social media, particularly on platforms like Twitter. The objective of the HOMO-MEX task is to encourage the development of NLP systems that can detect and classify LGBTQ+ phobic content in Spanish tweets, regardless of whether it is expressed aggressively or subtly. The task aims to address the lack of dedicated resources for LGBTQ+ phobia detection on Spanish-language Twitter and encourages participants to employ multi-label classification approaches. This paper has been supported by PAPIIT projects IT100822 and TA101722 and by CONAHCYT CF-2023-G-64. We also thank Alejandro Ojeda Trueba for creating the HOMO-MEX presentation image. GBE is supported by a grant from the Ministry of Universities of the Government of Spain, financed by the European Union, NextGeneration EU (María Zambrano program).
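
    Because HOMO-MEX encourages multi-label approaches, it may help to see how multi-label decoding differs from ordinary single-label (multi-class) decoding. In this sketch each label gets an independent sigmoid and threshold, so a tweet can receive several labels or none, whereas argmax always picks exactly one. The label names, scores and the 0.5 threshold are hypothetical and do not reflect the HOMO-MEX label set.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_multilabel(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Independent per-label decisions: zero, one, or several labels may be active."""
    return sorted(label for label, z in logits.items() if sigmoid(z) >= threshold)


def decode_multiclass(logits: Dict[str, float]) -> str:
    """Single-label decision: exactly one label, the highest-scoring one."""
    return max(logits, key=lambda label: logits[label])


if __name__ == "__main__":
    # Hypothetical per-label scores produced by some classifier for one tweet.
    scores = {"label_a": 2.0, "label_b": 0.5, "label_c": -1.0}
    print(decode_multilabel(scores))  # ['label_a', 'label_b']
    print(decode_multiclass(scores))  # label_a
```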