17 research outputs found

    L'annotazione dell'aspetto verbale per il trattamento automatico della lingua italiana: esperimenti e valutazioni

    Recognizing and annotating verbal aspect and actionality are complex tasks in which several levels of linguistic processing interact at once. Because problems in the temporal-aspectual domain are so hard to solve, the investigation carried out in this work proceeded through hierarchically ordered levels of increasing difficulty. Bearing in mind that the frame of reference is the creation of annotated resources for Natural Language Processing (NLP) systems, we first sought to answer three questions about the state of the art: 1. Are there annotation schemes that provide for labelling verbal actionality and aspect? 2. Which annotated resources, if any, currently carry these types of information? 3. Which and how many automatic systems are available that are used to create linguistic resources of this kind, or that use these data for their own training? Particular attention was paid to the TimeML annotation scheme (Pustejovsky et al., 2003), for the annotation of events, temporal expressions and their relations, in order to investigate whether and to what extent it proposes methodologies for annotating aspectuality and actionality, and whether these are sufficiently thorough and exhaustive. The work carried out in this thesis is motivated precisely by the results of this analysis. It became clear that aspectual annotation is not widespread in corpora, particularly as regards the marking of habitual aspect. This gap is often justified by the great difficulty of distinguishing the various types of verbal aspect. Testing this claim is the starting point for the experiment carried out in this thesis and for the idea of using the crowdsourcing platform CrowdFlower to test whether linguistically untrained users can identify one particular aspectual type, namely the aspect belonging to the class of gnomic imperfectivity (Bertinetto and Lenci, 2011). The decision to mark gnomic imperfectivity (habitual, attitudinal, potential and generic aspects, and Individual Level predicates) was driven by an interest in identifying, within texts, sentences that express a generalization of some kind, or a property that characterizes a subject for a whole period of its life or for its entire existence. We therefore sought to understand whether it is useful and feasible to mark this particular aspectual class with a view to the automatic extraction of common-sense information (Singh, 2002) from written texts

    EVALITA 2009: Description and Results of the Local Entity Detection and Recognition (LEDR) task.

    In this paper, we describe the motivations and features of the LEDR (Local Entity Detection and Recognition) task at EVALITA 2009. Our work refers to the task of the same name within the Automatic Content Extraction (ACE) program. We adopted the ACE annotation scheme, adapting it to the specific morpho-syntactic features of Italian, in order to create training and test data for the evaluation of Information Extraction systems for Italian. This report presents the annotated data and the evaluation measures. Moreover, the results obtained by the participating system are shown

    Evaluation of Natural Language Tools for Italian: EVALITA 2007

    EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants' systems had the possibility to be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind, in fact, are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages

    AN ONTOLOGY FOR NARRATIVES

    One of the main problems of current Digital Libraries (DLs) is the limited informative services offered to users who want to discover the resources of a DL through natural-language queries. Indeed, all DLs provide simple search functionalities that return a ranked list of their resources. Usually, no semantic relation among the returned objects is reported that could help the user gain a more complete knowledge of the subject of the search. The introduction of the Semantic Web, and in particular of Linked Data, has the potential to improve the search functionalities of DLs. In this context, the long-term aim of this thesis has been to introduce the narrative as a new first-class search functionality. As output of a query, the envisaged search functionality should not only return a list of objects but also present one or more narratives, composed of events that are linked to the objects of existing libraries (e.g. Europeana) and are endowed with a set of semantic relations connecting these events into a meaningful semantic network. As a necessary step in this direction, the thesis presents an ontology for representing narratives, along with a tool for constructing narratives based on the ontology. Moreover, the tool has been used to evaluate the ontology in an experiment centred on the biography of Dante Alighieri, the major Italian poet of the late Middle Ages. More specifically:
    - An overview is given of related work in the Semantic Web field and in Narratology, especially in its sub-branch Computational Narratology. The basic principles of Narratology and Computational Narratology have been reviewed, along with the Artificial Intelligence literature, especially the Event Calculus theory, in order to identify the formal components of narratives.
    - A conceptualization of narratives has been developed, based on notions derived from Narratology and Artificial Intelligence. According to this conceptualization, a narrative consists of a fabula, i.e. the events of a story in chronological order, and several narrations of this fabula (plots), linked to the fabula by an event association relation. A mathematical expression of the conceptualization has been given, in order to characterize it as clearly and precisely as possible, and to serve as a basis for the subsequent development of an ontology of narratives, encoded in OWL. The proposed conceptualization has been validated by expressing it in an existing ontology, the CIDOC CRM, and by endowing it with provenance knowledge, also expressed in a derivation of the CRM named CRMinf. This expression has been used in the validation experiment, which consisted of modelling a narrative of the biography of Dante Alighieri, provided by a biographer who has scientifically supported this research.
    - The created ontology has been populated by means of a semi-automatic approach, implemented by a tool for constructing narratives that conform to the ontology. This tool retrieves and assigns URIs to the instances of the classes of the ontology using Wikidata as an external resource, and also facilitates the construction and contextualization of events and their linking to form the fabulae of narratives.
    - Finally, a qualitative validation of the developed ontology has been carried out. This validation covered: (i) the representational adequacy of the ontology, assessed by an expert on Dante Alighieri; (ii) the effectiveness of the narrative building tool; (iii) the satisfaction of the users' requirements defined at the beginning of the study. To prove the last point, the initial requirements of this work have been shown to be satisfied by demonstrating that a SPARQL query can always be built to extract the requested information from the knowledge base embodying the narrative
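
    The conceptualization summarized in this abstract (a fabula of chronologically ordered events, with narrations linked to it by an event association relation) can be pictured with a small data structure. The following is an illustrative Python sketch only, not the thesis's OWL ontology or its CIDOC CRM mapping: the names `Event`, `Narrative`, `fabula` and `narrate`, and the use of a bare integer year as the temporal model, are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One event of the story (hypothetical class; `year` is a stand-in
    for a full temporal model)."""
    label: str
    year: int


@dataclass
class Narrative:
    """A set of events plus one narration, linked by an event association."""
    events: list[Event] = field(default_factory=list)
    # Maps a narration segment (free text) to the event it recounts.
    association: dict[str, Event] = field(default_factory=dict)

    def fabula(self) -> list[Event]:
        """Return the events in chronological order."""
        return sorted(self.events, key=lambda e: e.year)

    def narrate(self, segment: str, event: Event) -> None:
        """Associate a narration segment with an event of the fabula."""
        if event not in self.events:
            raise ValueError("segment must refer to an event of the fabula")
        self.association[segment] = event


# Usage: events may be entered in any order; the fabula orders them.
exile = Event("Exile from Florence", 1302)
birth = Event("Birth in Florence", 1265)
death = Event("Death in Ravenna", 1321)

dante = Narrative(events=[exile, birth, death])
dante.narrate("Banished, the poet never saw his city again.", exile)

print([e.label for e in dante.fabula()])
# → ['Birth in Florence', 'Exile from Florence', 'Death in Ravenna']
```

    Keeping the narration separate from the event list mirrors the distinction the abstract draws between the fabula and its plots: a narration may present events in any order, while the fabula stays chronological.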

    CAT: the CELCT Annotation Tool

    This paper presents CAT (CELCT Annotation Tool), a new general-purpose web-based tool for text annotation developed by CELCT (Center for the Evaluation of Language and Communication Technologies). The aim of CAT is to make text annotation an intuitive, easy and fast process. In particular, CAT was created to support human annotators in performing linguistic and semantic text annotation, and was designed to improve productivity and reduce the time spent on this task. Manual text annotation is, in fact, a time-consuming activity, and conflicts may arise with the strict deadlines to which annotation projects are frequently subject. Thanks to its adaptability and user-friendly interface, CAT can contribute to improving time management in annotation projects. Further, the tool has a number of features that make it easy to use for many types of annotation. Although the first prototype of CAT has been used to perform temporal and event annotation following the It-TimeML specifications, the tool is general enough to be used for annotating a broad range of linguistic and semantic phenomena. CAT is freely available for research purposes

    Evalita 2007: Description and results of the TERN task

    In this paper, we describe the motivations and features of the TERN (Temporal Expression Recognition and Normalization) task at EVALITA 2007. We also present the training and test data used in this task, the evaluation measures and the participants' results