    Adapting the TANL tool suite to Universal Dependencies

    TANL is a suite of tools for text analytics based on the software architecture paradigm of data-driven pipelines. The strategies for upgrading TANL to use Universal Dependencies range from a minimalistic approach, which introduces pre- and post-processing steps into the native pipeline, to a revision of the whole pipeline. We explore the issue in the context of the Italian Treebank, considering the effort involved, how to avoid losing linguistically relevant information, and the loss of accuracy in the process.

    Bootstrapping enhanced universal dependencies for Italian

    The paper presents an extension of the Italian Universal Dependencies Treebank with an "enhanced" representation level (e-IUDT), aimed at simplifying the information extraction process. The modules developed to semi-automatically build e-IUDT were delexicalized to perform cross-language enhancements; preliminary experiments in this direction led to promising results.

    Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies

    Stanford Dependencies (SD) are now a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, or MIDT+), whose output was automatically converted to SD, with the results of the same parser trained directly on ISDT. Experiments carried out to test the reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser trained directly on ISDT. A non-negligible advantage of the first strategy for generating SD-annotated texts is that semi-automatic extensions of the training resource are more easily and consistently carried out with a reduced dependency tag set. Preliminary experiments carried out for generating the collapsed and propagated SD representation are also reported.

    Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

    The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging to the same dependency-based family, the Italian Stanford Dependency Treebank (ISDT), and an Italian localization of the Stanford Dependency scheme.

    Harmonization and Merging of two Italian Dependency Treebanks

    The paper describes the methodology which is currently being defined for the construction of a "Merged Italian Dependency Treebank" (MIDT) starting from already existing resources. In particular, it reports the results of a case study carried out on two available dependency treebanks, i.e. TUT and ISST-TANL. The issues raised during the comparison of the annotation schemes underlying the two treebanks are discussed and investigated with a particular emphasis on the definition of a set of linguistic categories to be used as a "bridge" between the specific schemes. As an encoding format, the CoNLL de facto standard is used.

    Evolution of Italian Treebank and Dependency Parsing towards Universal Dependencies

    We highlight the main changes recently undergone by the Italian Dependency Treebank in the transition to an extended and revised edition, compliant with the annotation schema of Universal Dependencies. We explore how these changes affect the accuracy of dependency parsers, performing comparative tests on various versions of the treebank. Despite significant changes in the annotation style, statistical parsers seem to cope well and mostly improve in accuracy.

    Using Embeddings for Both Entity Recognition and Linking in Tweets

    The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a technique of Named Entity tagging that exploits both character-level and word-level embeddings. Character-based embeddings allow learning the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger allows recognizing a wider range of entities than those well known by their presence in a Knowledge Base or gazetteer. Our submissions achieved the first, second and fourth top official scores.

    The Evalita 2014 Dependency Parsing task

    The Parsing Task is among the "historical" tasks of Evalita, and in all editions its main objective has been to define and improve state-of-the-art technologies for parsing Italian. The 2014 edition of the shared task features several novelties that mainly concern the data set and the subtasks. The paper therefore focuses on these two strictly interrelated aspects and presents an overview of the participants' systems and results.

    Becoming JILDA

    The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.

    Educational ecosystems for Information Science: the case of the University of Pisa

    Interdisciplinarity is becoming increasingly important in education. With the rapidly evolving job market, an interdisciplinary education can prepare students for the flexibility and broad knowledge base required to adapt. At the University of Pisa, we recognized the value of an interdisciplinary educational environment during our participation in the European project EINFOSE, where we harmonized the entry requirements for master programs in Information Science. Prior to this project, we had been building study programs in Digital Humanities and Data Science, whose intersection organically nurtured a diverse learning space. Through this lens, we will reflect on the obstacles constituted by disciplinary barriers and stress the importance of a flexible and open ‘ecosystem’ for education. These conclusions will be supported by data analysis on the careers of our students over the last eight years