134 research outputs found

    Is EVALITA Done? On the Impact of Prompting on the Italian NLP Evaluation Campaign

    TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony

    Given the difficulties that still affect the correct identification of irony in Sentiment Analysis tasks, in this paper we describe the main issues that emerged during the development of a novel resource for Italian annotated for irony. The project mainly consists in applying to the Twitter corpus TWITTIRÒ a multi-layered scheme for the fine-grained annotation of irony, as proposed in a multilingual setting and previously applied to French and English datasets (Karoui et al. 2017). In applying the annotation to this corpus, we outline and discuss the issues and peculiarities that emerged in using the semantic scheme for Twitter messages in Italian, thus shedding some light on the future directions that can be followed from a multilingual and cross-language perspective too. We present, in particular, an analysis of the annotation process and of the distribution of the labels of each layer involved in the scheme. This is supported by a discussion of the outcome of the annotation carried out by native Italian speakers in the development of the corpus. In particular, an in-depth discussion of the inter-annotator agreement and of the sources of disagreement is included. The result is a novel gold standard corpus for irony detection in Italian, which enriches the set of multilingual datasets available for this challenging task and is ready to be used as a benchmark in automatic irony detection experiments and evaluation campaigns.

    Lessons Learned from EVALITA 2020 and Thirteen Years of Evaluation of Italian Language Technology

    This paper provides a summary of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), which was held online on December 17th due to the 2020 COVID-19 pandemic. The 2020 edition of EVALITA included 14 different tasks belonging to five research areas, namely: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, (v) Time and Diachrony. This paper provides a description of the tasks and the key findings from the analysis of participant outcomes. Moreover, it provides a detailed analysis of the participants and task organizers, which demonstrates the growing interest in this campaign. Finally, a detailed analysis of the evaluation of tasks across the past seven editions is provided; this makes it possible to assess how the research carried out by the Italian Computational Linguistics community has evolved in terms of popular tasks and paradigms during the last 13 years.

    Long-term Social Media Data Collection at the University of Turin

    We report on the collection of social media messages in the Italian language, from Twitter in particular, that has been ongoing continuously since 2012 at the University of Turin. A number of smaller datasets have been extracted from the main collection and enriched with different kinds of annotations for linguistic purposes. Moreover, a few extra datasets have been collected independently and are now in the process of being merged with the main collection. We aim to make the resource available to the community as fully as possible, in accordance with the Terms of Service of the platforms from which the data have been gathered.

    Lexical Opposition in Discourse Contrast

    We investigate the connection between lexical opposition and discourse relations, with a focus on the relation of contrast, in order to evaluate whether opposition participates in discourse relations. Through a corpus-based analysis of Italian documents, we show that the relation between opposition and contrast is not crucial, although it is not insignificant in the case of implicit contrast. The correlation is even weaker when other discourse relations are taken into account.

    Auxiliary selection in Italian intransitive verbs: a computational investigation based on annotated corpora

    This paper analyzes auxiliary selection in Italian intransitive verbs. The methodology consists in comparing linguistic theory with data extracted from two annotated corpora: UD-IT and PoSTWITA-UD. The analyzed verbs have been classified into semantic categories derived from the linguistic theory. The results confirm the theoretical assumptions and could serve as a starting point for many applied tasks, such as Natural Language Generation.