17 research outputs found

    Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

    Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.
    Comment: To appear at CS4OA workshop (INLG-SIGDial

    Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

    In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find that a key element for successful 'out of target' experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e. a target that shares some commonalities with the test target that can be defined a priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
    Comment: To appear in "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL): Findings"

    Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering

    Fighting online hate speech is a challenge that is usually addressed using Natural Language Processing via automatic detection and removal of hate content. Besides this approach, counter narratives have emerged as an effective tool employed by NGOs to respond to online hate on social media platforms. For this reason, Natural Language Generation is currently being studied as a way to automate counter narrative writing. However, the existing resources necessary to train NLG models are limited to 2-turn interactions (a hate speech and a counter narrative as response), while in real life, interactions can consist of multiple turns. In this paper, we present a hybrid approach for dialogical data collection, which combines the intervention of human expert annotators over machine generated dialogues obtained using 19 different configurations. The result of this work is DIALOCONAN, the first dataset comprising over 3000 fictitious multi-turn dialogues between a hater and an NGO operator, covering 6 targets of hate.
    Comment: To appear in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (long paper)

    Fidedignidade de rótulos de alimentos comercializados no município de São Paulo, SP (Reliability of food labels from products marketed in the city of São Paulo, Southeastern Brazil)

    OBJECTIVE: To assess the reliability of nutritional information stated on the labels of marketed foods. METHODS: A total of 153 industrialized foods, usually consumed by children and adolescents and marketed in the city of São Paulo, Southeastern Brazil, between 2001 and 2005, were analyzed. Nutrient contents stated on labels were compared with the results obtained from official (physical-chemical) analytical methods, considering the 20% variability tolerated by the current legislation to approve or reject samples. Means, standard deviations, and 95% confidence intervals for the nutrients analyzed, as well as the percentage frequency distribution of rejected samples, were calculated. RESULTS: All salty products analyzed showed non-compliance in dietary fiber, sodium, or saturated fat content. Sweet products showed rejection rates between zero and 36% due to their dietary fiber content. More than half (52%) of filled cookies were rejected due to their saturated fat content. Nutrients associated with obesity and its health complications were those showing the highest proportions of non-compliance. The lack of reliability of label information in the samples analyzed violates the provisions of ANVISA Collegiate Board Resolution RDC 360/03 and the rights guaranteed by the Food and Nutritional Security Law and the Consumer Protection Code. CONCLUSIONS: High rates of non-compliance of nutritional data were found on labels of foods aimed at children and adolescents, indicating the urgent need for surveillance actions and other nutritional labeling measures.

    Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

    Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in reaching high quality and/or high quantity. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including diverse dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.

    Mining Novel Candidate Imprinted Genes Using Genome-Wide Methylation Screening and Literature Review

    Large-scale transcriptome and methylome data analyses obtained by high-throughput technologies have been enabling the identification of novel imprinted genes. We investigated genome-wide DNA methylation patterns in multiple human tissues, using a high-resolution microarray to uncover hemimethylated CpGs located in promoters overlapping CpG islands, aiming to identify novel candidate imprinted genes. Using our approach, we recovered ~30% of the known human imprinted genes, and a further 168 candidates were identified, 61 of which had at least three hemimethylated CpGs shared by more than two tissue types. Thirty-four of these candidate genes are members of the protocadherin cluster on 5q31.3; in mice, protocadherin genes have non-imprinted random monoallelic expression, which might also be the case in humans. Among the remaining 27 genes, ZNF331 was recently validated as an imprinted gene, and six of them have been reported as candidates, supporting our prediction. Five candidates (CCDC166, ARC, PLEC, TONSL, and VPS28) map to 8q24.3, and might constitute a novel imprinted cluster. Additionally, we performed a comprehensive compilation of known human and mouse imprinted genes from literature and databases, and a comparison among high-throughput imprinting studies in humans. The screening for hemimethylated CpGs shared by multiple human tissues, together with the extensive review, appears to be a useful approach to reveal candidate imprinted genes.