    Cross-Platform Evaluation for Italian Hate Speech Detection

    Despite the number of approaches recently proposed in NLP for detecting abusive language on social networks, developing hate speech detection systems that are robust across different platforms is still an unsolved problem. In this paper we perform a comparative evaluation on datasets for hate speech detection in Italian, extracted from four different social media platforms, i.e. Facebook, Twitter, Instagram and WhatsApp. We show that combining such platform-dependent datasets to take advantage of training data developed for other platforms is beneficial, although the impact varies depending on the social network under consideration.

    Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA)

    IronITA is a new shared task in the EVALITA 2018 evaluation campaign, focused on the automatic classification of irony in Italian texts from Twitter. It includes two tasks: 1) irony detection and 2) detection of different types of irony, with a special focus on sarcasm identification. We received 17 submissions for the first task and 7 submissions for the second task from 7 teams.

    COVID-19 Outbreak through Tweeters’ Words: Monitoring Italian Social Media Communication about COVID-19 with Text Mining and Word Embeddings

    In this paper we analyze Italian social media communication about COVID-19 through a Twitter dataset collected over two months. The text corpus was studied in terms of its sensitivity to the social changes that are affecting people's lives in this crisis. In addition, the results of a sentiment analysis performed with two lexicons were compared, and word embedding vectors were created from the available plain texts. We then tested the informative effectiveness of the word embeddings and compared them to a bag-of-words approach in terms of text classification accuracy. First results showed that these textual data have some potential for describing the different phases of the outbreak. However, a different strategy is needed for more reliable sentiment labeling, as the results produced by the two lexicons were discordant. Finally, although they gave interesting results in terms of semantic similarity, the word embeddings did not show a higher predictive ability than the term frequency vectors.
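
    As an illustration of the two text representations compared in this abstract, the sketch below builds a term frequency (bag-of-words) vector and an embedding-based vector for one tokenized tweet. It is only a minimal sketch: the abstract does not say how word vectors were combined into a document vector, so averaging is an assumption here, and the function names, vocabulary and toy embedding values are hypothetical.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Term frequency vector: one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Document vector as the average of the known word vectors.

    Averaging is an assumption of this sketch, not a detail from the abstract.
    Tokens missing from `embeddings` are skipped; if none are known, a zero
    vector of length `dim` is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy, hypothetical data for illustration only.
vocabulary = ["casa", "virus", "lockdown"]
toy_embeddings = {"virus": [1.0, 0.0], "lockdown": [0.0, 1.0]}

tweet = ["virus", "lockdown", "virus"]
bow_vector = bag_of_words(tweet, vocabulary)
emb_vector = mean_embedding(["virus", "lockdown"], toy_embeddings, dim=2)
```

    Either vector can then be passed to the same classifier, which makes a comparison in terms of classification accuracy possible.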

    ItaliaNLP @ TAG-IT: UmBERTo for Author Profiling at TAG-it 2020

    In this paper we describe the systems we used to participate in the TAG-it task of EVALITA 2020. The first system we developed uses a linear Support Vector Machine as its learning algorithm. The other two systems are based on the pretrained Italian language model UmBERTo: one was developed following the Multi-Task Learning approach, the other following the Single-Task Learning approach. These systems were evaluated on the official TAG-it test sets and ranked first in all the TAG-it subtasks, demonstrating the validity of the approaches we followed.

    Svandiela @ HaSpeeDe: Detecting Hate Speech in Italian Twitter Data with BERT

    This paper explains the system developed for the Hate Speech Detection (HaSpeeDe) shared task within the 7th evaluation campaign EVALITA 2020 (Basile et al. 2020). The task solution proposed in this work is based on a fine-tuned BERT model. In cross-corpus evaluation, our model reached an F1 score of 77.56% on the tweets test set, and 60.31% on the news headlines test set.

    DH-FBK @ HaSpeeDe2: Italian Hate Speech Detection via Self-Training and Oversampling

    In this paper we describe the system submitted by the DH-FBK team to the HaSpeeDe evaluation task, addressing Italian hate speech detection (Task A). While we adopt a standard approach for fine-tuning AlBERTo, the Italian BERT model trained on tweets, we propose to improve the final classification performance with two additional steps, i.e. self-training and oversampling. Specifically, we extend the initial training data with additional silver data, carefully sampled from domain-specific tweets and labeled by a first version of our system trained only on the task training data. Then, we re-train the classifier on the merged silver and task training data, oversampling the latter so that the resulting model is more robust to possible inconsistencies in the silver data. With this configuration, we obtain a macro-averaged F1 of 0.753 on tweets, and 0.702 on news headlines.
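
    The self-training and oversampling steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the DH-FBK implementation: a toy word-count classifier stands in for the fine-tuned AlBERTo model, and the confidence threshold used to select silver data and the number of gold copies are hypothetical parameters, since the abstract does not specify how the silver data were sampled or how much the task data were oversampled.

```python
from collections import Counter


def train(examples):
    """Toy stand-in for fine-tuning: count words per label (0 or 1)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict_proba(model, text):
    """Smoothed estimate of P(label == 1) from the per-label word counts."""
    tokens = text.lower().split()
    pos = sum(model[1][t] for t in tokens) + 1
    neg = sum(model[0][t] for t in tokens) + 1
    return pos / (pos + neg)


def self_train_with_oversampling(gold, unlabeled, threshold=0.7, gold_copies=3):
    """Return (final_model, silver) after one round of self-training.

    `threshold` and `gold_copies` are assumptions made for this sketch.
    """
    # Step 1: train a first model on the gold (task) data only.
    base_model = train(gold)

    # Step 2: label the unlabeled texts, keeping only confident predictions
    # as silver data; uncertain texts are discarded.
    silver = []
    for text in unlabeled:
        p = predict_proba(base_model, text)
        if p >= threshold:
            silver.append((text, 1))
        elif p <= 1 - threshold:
            silver.append((text, 0))

    # Step 3: retrain on silver plus gold, repeating the gold data so that
    # noisy silver labels weigh less in the merged training set.
    merged = silver + gold * gold_copies
    return train(merged), silver


# Toy, hypothetical data for illustration only.
gold = [("you are awful people", 1), ("have a nice day", 0)]
unlabeled = ["awful awful people", "nice nice day", "hello there"]
final_model, silver = self_train_with_oversampling(gold, unlabeled)
```

    Repeating the gold examples is the simplest form of oversampling; a weighted loss would be an alternative design with a similar effect.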

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).