18 research outputs found

    A study of Hate Speech in Social Media during the COVID-19 outbreak

    In pandemic situations, hate speech propagates in social media, new forms of stigmatization arise, and new groups are targeted with this kind of speech. In this short article, we present work in progress on the study of hate speech in Spanish tweets related to newspaper articles about the COVID-19 pandemic. We cover two main aspects: the construction of a new corpus annotated for hate speech in Spanish tweets, and the analysis of the collected data in order to answer questions from the social field, aided by modern computational tools. Definitions and progress are presented for both aspects. For the corpus, we introduce the data collection process, the annotation schema and criteria, and the data statement. For the analysis, we present our goals and their associated questions. We also describe the definition and training of a hate speech classifier, and present preliminary results using it.
    Affiliations: Cotik, Viviana (Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Argentina); Debandi, Natalia (Universidad Nacional de Río Negro, Argentina); Luque, Franco (Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía, Física y Computación, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina); Miguel, Paula (Universidad de Buenos Aires, Argentina); Moro, Agustín (Universidad de Buenos Aires, Argentina; Universidad Nacional del Centro, Argentina); Pérez, Juan Manuel (Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Argentina); Serrati, Pablo (Universidad de Buenos Aires, Argentina); Zajac, Joaquín (Universidad de Buenos Aires, Argentina); Zayat, Demián (Universidad de Buenos Aires, Argentina).

    UO_4to@TAG-it 2020: Ensemble of Machine Learning Methods

    This paper describes the proposal submitted to sub-task 1 of the TAG-it author profiling task at EVALITA 2020. The main objective is to predict the gender and age of blog users from their posts, as well as the topic they wrote about. Our proposal uses an ensemble of machine learning algorithms that combines three of the most widely used classifiers with a character n-gram language model represented as a Bag of Words. To address this task, we presented two different strategies aimed at finding the best possible results.
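
A minimal sketch of the two ingredients named in this abstract, character n-grams counted as a bag and an ensemble that combines several classifiers' outputs. The abstract does not name the three classifiers or the combination rule, so the majority vote below and the function names `char_ngrams` and `majority_vote` are assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count every character n-gram in `text` (a bag-of-n-grams representation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def majority_vote(predictions):
    """Return the label chosen by most classifiers.

    Assumption: ties are broken in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy inputs: one short text and three hypothetical classifier outputs.
bag = char_ngrams("banana", n=2)
vote = majority_vote(["female", "male", "female"])
```

In a real system the n-gram counts would feed each classifier, and the vote would be taken over their predictions for the same author.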

    Resumen de la tarea Rest-Mex en IberLEF 2022: Sistema de Recomendación, Análisis de Sentimiento y Predicción de Semáforo Covid para Textos Turísticos Mexicanos

    This paper presents the framework and results of the Rest-Mex task at IberLEF 2022. The task comprised three tracks: Recommendation System, Sentiment Analysis, and Covid Semaphore Prediction, all using texts about Mexican tourist sites. The Recommendation System track consists of predicting the degree of satisfaction a tourist may have when recommending a destination in Nayarit, Mexico, based on the places the tourist has visited and their opinions. The Sentiment Analysis track predicts the polarity of an opinion issued by a tourist who traveled to the most representative places in Mexico, as well as the attraction it concerns. We built corpora for both tracks from Spanish opinions on the TripAdvisor website. As a novelty, the Covid Semaphore Prediction track aims to predict the color of the Mexican Semaphore for each state according to the Covid news in that state, using data from the Mexican Ministry of Health. This paper compares and discusses the participants' results for all three tracks.

    Resumen de la tarea Rest-Mex en IberLEF 2023: Investigación sobre Análisis de Sentimiento para Textos Turísticos Mexicanos

    This paper presents the framework and results of the Rest-Mex task at IberLEF 2023, focusing on sentiment analysis and text clustering of tourist texts. The study primarily focuses on texts related to tourist destinations in Mexico, although this edition included data from Cuba and Colombia for the first time. The sentiment analysis task aims to predict the polarity of opinions expressed by tourists, as well as the type of place visited (tourist attraction, hotel, or restaurant) and the country in which it is located. The text clustering task, in turn, aims to classify news articles related to tourism in Mexico. For both tasks, corpora were built using Spanish opinions extracted from TripAdvisor and news articles from Mexican media. This article compares and discusses the results obtained by the participants in both sub-tasks. Additionally, a method is proposed to measure the easiness of a multi-class text classification corpus, along with an approach for system selection in a possible late fusion scheme.
    The authors thank the Mexican Academy of Tourism Research (AMIT) for their support of the project "Creation of a labeled database related to tourist destinations for training artificial intelligence models for classifying relevant topics" through the call "I Research Projects 2022", which originated this work.

    Assessing the impact of contextual information in hate speech detection

    In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the great amount of content generated by users, great effort has been made in the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources are performed on data without context; that is, isolated messages without any type of conversational context or the topic being discussed. This restricts the available information to define if a post on a social network is hateful or not. In this work, we provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic. Classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance for two proposed tasks (binary and multi-label prediction). We make our code, models, and corpus available for further research
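
One simple way to picture "adding contextual information" is to pair each user reply with the news post it responds to before passing it to a classifier. The abstract does not say how the context is encoded, so the concatenation scheme, the `[SEP]` separator, and the function name `build_input` below are all assumptions for illustration rather than the authors' method.

```python
def build_input(comment, news_text=None, sep=" [SEP] "):
    """Build the text a classifier would see for one reply.

    Without context, the reply is used alone. With context, the news post
    is prepended, separated by `sep` (an assumed marker, not from the paper).
    """
    if news_text is None:
        return comment
    return news_text + sep + comment


# Hypothetical example texts.
isolated = build_input("reply text")
contextualized = build_input("reply text", news_text="news headline")
```

Comparing a classifier trained on `isolated` inputs with one trained on `contextualized` inputs mirrors the kind of experiment the abstract describes.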

    Resources and benchmark corpora for hate speech detection: a systematic review

    Hate speech in social media is a complex phenomenon whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica also play an important role in the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.

    nlpBDpatriots at BLP-2023 Task 1: A Two-Step Classification for Violence Inciting Text Detection in Bangla

    In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD), organized as part of the first workshop on Bangla Language Processing (BLP), co-located with EMNLP. The aim of this task is to identify and classify violent threats that provoke further unlawful violent acts. Our best-performing approach for the task is a two-step classification using back translation and multilinguality, which ranked 6th out of 27 teams with a macro F1 score of 0.74.
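
For readers unfamiliar with the metric reported here, macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size. The sketch below is a plain-Python illustration with hypothetical labels; it is not the shared task's official scorer.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Hypothetical two-class example.
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Here class "a" has F1 = 2/3 and class "b" has F1 = 4/5, so the macro average is 11/15.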

    Análisis y detección de odio en mensajes de Twitter

    Nowadays, the Web is a medium where users around the world interact with each other, carrying out important activities such as digital commerce, information search, and decision making. Thus, sites such as social networks have captured the interest of both users and analysts. This phenomenon may represent an advantage for the development of communications and the acquisition of information. However, some negative behaviours that may affect different groups of people have also been detected in this context. Hate speech is an example of such negative behaviour and is frequently published on popular social networks such as Twitter. It expresses hatred towards certain groups of people based on some specific aspect of their identity, such as their ethnicity, nationality, or religion. Such messages are generally characterized by their virality and by the anonymity of their authors. Specialists have found that they incite hatred against the people targeted, and that on many occasions they can provoke violent actions against them. Because of the impact this can have on many people, various efforts to address it have begun. In particular, several evaluation tasks related to the detection of hate speech have been organized in recent years. In this work we analyze a set of these tasks focused on messages published on Twitter. We analyze the approaches proposed by different teams in general, and our own proposals in particular. We perform a set of experiments studying the different factors involved in the tasks, and use them to compare the strategies used with other ideas that we propose. As a result, we provide a summary of important aspects that can serve as a guide in the design of an approach to hate speech detection, or as a starting point for future studies.
    De La Peña Sarracén, GL. (2019). Análisis y detección de odio en mensajes de Twitter. http://hdl.handle.net/10251/129782 TFG

    Detecting Aggressiveness in Tweets: A Hybrid Model for Detecting Cyberbullying in the Spanish Language

    In recent years, the use of social networks has increased exponentially, which has led to a significant increase in cyberbullying. In Computer Science, research has been conducted on how to detect aggressiveness in texts, which is a prelude to detecting cyberbullying. Most of this work addresses English-language texts, mainly using Machine Learning (ML) approaches, Lexicon approaches to a lesser extent, and hybrid approaches only rarely. Hybrid approaches combine Lexicons and ML algorithms, for example by counting the number of bad words in a sentence using a Lexicon of bad words and using that count as an input feature for classification algorithms. This research aims to contribute to detecting aggressiveness in Spanish-language texts by creating different models that combine the Lexicon and ML approaches. Twenty-two models that combine techniques and algorithms from both approaches are proposed, and certain hyperparameters are tuned on the training datasets of the corpora to obtain the best results on the test datasets. Three Spanish-language corpora are used in the evaluation: Chilean, Mexican, and Chilean-Mexican. The results indicate that hybrid models obtain the best results on all three corpora, outperforming the implemented models that do not use Lexicons. This shows that mixing approaches improves aggressiveness detection. Finally, a web application is developed that puts each model to use by classifying tweets, making it possible to evaluate the models on external corpora and to receive feedback on each model's predictions for future research. In addition, an API is available that can be integrated into technological tools for parental control, online plugins for writing analysis in social networks, and educational tools, among others.
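
The hybrid idea described in this abstract, counting lexicon hits and passing the count to a classifier as a feature, can be sketched as follows. The toy lexicon, the tokenization rule, and the names `count_lexicon_hits` and `build_features` are assumptions for illustration; the paper's actual lexicons and its twenty-two models are not reproduced here.

```python
import re

# Hypothetical toy lexicon; a real system would load a curated word list.
BAD_WORDS = {"tonto", "bobo"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_lexicon_hits(text, lexicon):
    """Count how many tokens of `text` appear in `lexicon`."""
    return sum(token in lexicon for token in tokenize(text))


def build_features(text, lexicon):
    """Return simple numeric features a classifier could consume."""
    return {
        "n_tokens": len(tokenize(text)),
        "n_bad_words": count_lexicon_hits(text, lexicon),
    }


features = build_features("Eres un tonto, tonto total", BAD_WORDS)
```

In a hybrid model, `n_bad_words` would be appended to the other text features before training the ML classifier.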