13 research outputs found

    VivesDebate: A New Annotated Multilingual Corpus of Argumentation in a Debate Tournament

    The application of the latest Natural Language Processing breakthroughs in computational argumentation has shown promising results, which have raised the interest in this area of research. However, the available corpora with argumentative annotations are often limited to a very specific purpose or are not of adequate size to take advantage of state-of-the-art deep learning techniques (e.g., deep neural networks). In this paper, we present VivesDebate, a large, richly annotated and versatile professional debate corpus for computational argumentation research. The corpus has been created from 29 transcripts of a debate tournament in Catalan and has been machine-translated into Spanish and English. The annotation contains argumentative propositions, argumentative relations, debate interactions and professional evaluations of the arguments and argumentation. The presented corpus can be useful for research on a heterogeneous set of computational argumentation underlying tasks such as Argument Mining, Argument Analysis, Argument Evaluation or Argument Generation, among others. All this makes VivesDebate a valuable resource for computational argumentation research within the context of massive corpora aimed at Natural Language Processing tasks

    Focus of negation: Its identification in Spanish

    This article describes the criteria for identifying the focus of negation in Spanish. This work involved an in-depth linguistic analysis of the focus of negation through which we identified some 10 different types of criteria that account for a wide variety of constructions containing negation. These criteria account for all the cases that appear in the NewsCom corpus and were assessed in the annotation of this corpus. The NewsCom corpus consists of 2955 comments posted in response to 18 different news articles from online newspapers. The NewsCom corpus contains 2965 negative structures with their corresponding negation marker, scope, and focus. This is the first corpus annotated with focus in Spanish and it is freely available. It is a valuable resource that can be used both for the training and evaluation of systems that aim to automatically detect the scope and focus of negation and for the linguistic analysis of negation grounded in real data

    Overview of the DETESTS task at IberLEF 2022: DETEction and classification of racial STereotypes in Spanish

    This paper presents an overview of the DETESTS shared task, held as part of the IberLEF 2022 Workshop (Iberian Languages Evaluation Forum) within the framework of the SEPLN 2022 conference. We proposed two hierarchical subtasks. For subtask 1, participants had to determine the presence of racial stereotypes in sentences. For subtask 2, participants had to classify the sentences labeled with stereotypes into one or more of ten categories. The DETESTS dataset contains 5,629 sentences from comments posted in response to newspaper articles on immigration, in Spanish. 51 teams signed up to participate, of which 39 sent runs and 5 sent their working notes. In this paper, we provide information about the training and test datasets, the systems used by the participants, the evaluation metrics and the results. This work is supported by the following projects: ‘STERHEOTYPES: STudying European Racial Hoaxes and sterEOTYPES’, funded by Fondazione Compagnia di San Paolo, and grant ‘XAIDisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics’ (PLEC2021-007681), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by the “European Union NextGenerationEU/PRTR”. The work of Paolo Rosso was carried out within the framework of the research project PROMETEO/2019/121 (DeepPattern) by the Generalitat Valenciana.

    NoNiRes: A Catalan corpus annotated with negation

    In this article we present the criteria applied for the annotation of negation and focus of negation in NoNiRes, a corpus of Catalan. The corpus is composed of 20,600 sentences drawn from existing datasets (5,000 sentences), an Internet forum (10,000 sentences), and a digital newspaper (5,600 sentences). Complex aspects such as the focus and the gradation of negation have been dealt with. Comprehensive statistical data on the annotated structures are provided. This work was funded by CLiC, Centre de Llenguatge i Computació, a consolidated research group of the Generalitat de Catalunya (2021 SGR 00313), and by the Departament de la Vicepresidència i de Polítiques Digitals i Territori of the Generalitat de Catalunya, within the framework of Projecte AINA.

    Negation in Spanish: analysis and typology of negation patterns

    In this paper we present the criteria applied for the annotation of the SFU ReviewSP-NEG corpus with negation and the corresponding linguistic typology. This typology has the advantage that it is easy to express in terms of a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they offer wide coverage (i.e., they resolved all the cases occurring in the corpus). The corpus consists of 400 reviews and 198,551 words. Currently, 75% of the corpus is annotated and, of a total of 6,331 annotated sentences, 2,953 contain at least one negation. Funded by FEDER funds and by the projects TIN2015-65136-C2-1-R and TIN2015-71147-C2-2 (MINECO) and FPU014/00983 (MECD).

    VivesDebate: A new annotated multilingual corpus of argumentation in a debate tournament

    The application of the latest Natural Language Processing breakthroughs in computational argumentation has shown promising results, which have raised the interest in this area of research. However, the available corpora with argumentative annotations are often limited to a very specific purpose or are not of adequate size to take advantage of state-of-the-art deep learning techniques (e.g., deep neural networks). In this paper, we present VivesDebate, a large, richly annotated and versatile professional debate corpus for computational argumentation research. The corpus has been created from 29 transcripts of a debate tournament in Catalan and has been machine-translated into Spanish and English. The annotation contains argumentative propositions, argumentative relations, debate interactions and professional evaluations of the arguments and argumentation. The presented corpus can be useful for research on a heterogeneous set of computational argumentation underlying tasks such as Argument Mining, Argument Analysis, Argument Evaluation or Argument Generation, among others. All this makes VivesDebate a valuable resource for computational argumentation research within the context of massive corpora aimed at Natural Language Processing tasks

    Negation in Spanish: analysis and typology of negation patterns

    In this article we present the criteria applied for annotating the SFU ReviewSP-NEG corpus with negation, together with the corresponding linguistic typology. This typology has the advantage of being easy to express as a tagset for corpus annotation, of having clearly delimited types, which avoids ambiguity in the annotation process, and of offering wide coverage, that is, it has served to resolve all the cases that have appeared. The corpus contains 400 reviews and 198,551 words. It is currently 75% annotated and, out of a total of 6,331 reviewed sentences, 2,953 negation structures have been identified. Keywords: negation, corpus annotation, types of negation, opinion analysis, polarity annotation

    Data Visualization for Supporting Linguists in the Analysis of Toxic Messages

    The goal of this research is to provide linguists with visualisations for analysing the results of their hate speech annotation. These visualisations consist of a set of interactive graphs for analysing the global distribution of annotated messages, finding relationships between features, and detecting inconsistencies in the annotation. We used a corpus that includes 1,262 comments posted in response to different Spanish online news articles. The comments were annotated with features such as sarcasm, mockery, insult, improper language, constructivity and argumentation, as well as with level of toxicity ('not-toxic', 'mildly toxic', 'toxic' or 'very toxic'). We evaluated the selected visualisations with users to assess the graphs' comprehensibility, interpretability and attractiveness. One of the lessons learned from the study is the usefulness of mixed visualisations that combine simple graphs (Bar, Heat map), which facilitate familiarisation with the results of the annotated corpus, with more complex ones (Sankey, Spider or Chord), which help to explore and identify relationships between features and to find inconsistencies.

    Overview of DETOXIS at IberLEF 2021: DEtection of TOXicity in comments In Spanish

    In this paper we present the DETOXIS task, DEtection of TOxicity in comments In Spanish, which took place as part of the IberLEF 2021 Workshop on Iberian Languages Evaluation Forum at the SEPLN 2021 Conference. We describe the NewsCom-TOX dataset used for training and testing the systems, the metrics applied for their evaluation and the results obtained by the submitted approaches. We also provide an error analysis of the results of these systems. The work has been carried out in the framework of the following projects: MISMIS project (PGC2018-096212-B), funded by Ministerio de Ciencia, Innovación y Universidades (Spain); CLiC SGR (2017SGR341), funded by AGAUR (Generalitat de Catalunya); and STERHEOTYPES project (Challenges for Europe), funded by Fondazione Compagnia di San Paolo.