69 research outputs found

    Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain

    In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parliamentary speeches. The corpus is freely available and includes several annotation layers, i.e. key-concepts, lemmas, PoS tags, person names and geo-referenced places, representing a high-quality ‘silver’ annotation. We believe that this resource can foster research in historical corpus analysis, stylometry and computational social science, among others.

    Tint, the Swiss-Army Tool for Natural Language Processing in Italian

    In this paper we present the latest version of Tint, an open-source, fast and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. The new release includes a set of text processing components for fine-grained linguistic analysis, from tokenization to relation extraction, including part-of-speech tagging, morphological analysis, lemmatization, multi-word expression recognition, dependency parsing, named-entity recognition, keyword extraction, and much more. Tint is written in Java and freely distributed under the GPL license. Although some modules do not perform at a state-of-the-art level, Tint reaches very good accuracy in all modules, and can be easily used out-of-the-box.

    Tint 2.0: an All-inclusive Suite for NLP in Italian

    In this paper we present Tint 2.0, an open-source, fast and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. The new release includes some improvements to the existing NLP modules, and a set of new text processing components for fine-grained linguistic analysis that were not available so far, including multi-word expression recognition, affix analysis, readability and classification of complex verb tenses.

    VenPro: A Morphological Analyzer for Venetan

    This document reports the process of extending MorphoPro for Venetan, a lesser-used language spoken in the North-Eastern part of Italy. MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesize Venetan words. This task was challenging for several reasons, which are common to a number of lesser-used languages: although Venetan is widely used as an oral language in everyday life, its written usage is very limited; efforts for defining a standard orthography and grammar are very recent and not well established; despite recent attempts to propose a unified orthography, no Venetan standard is widely used. Besides, there are different geographical varieties, and the language is strongly influenced by Italian.

    LiMoSiNe pipeline: Multilingual UIMA-based NLP platform

    We present a robust and efficient parallelizable multilingual UIMA-based platform for automatically annotating textual inputs with different layers of linguistic description, ranging from surface-level phenomena all the way down to deep discourse-level information. In particular, given an input text, the pipeline extracts: sentences and tokens; entity mentions; syntactic information; opinionated expressions; relations between entity mentions; co-reference chains and wikified entities. The system is available in two versions: a standalone distribution enables design and optimization of user-specific sub-modules, whereas a server-client distribution allows for straightforward high-performance NLP processing, reducing the engineering cost for higher-level tasks.

    Aligning an Italian WordNet with a lexicographic dictionary: Coping with limited data

    This work describes the evaluation of two approaches, Lexical Matching and Sense Similarity, for word sense alignment between MultiWordNet and a lexicographic dictionary, Senso Comune De Mauro, when few sense descriptions are available (MultiWordNet) and there is no structure over senses (Senso Comune De Mauro). The results obtained by merging the two approaches are satisfactory, with F1 values of 0.47 for verbs and 0.64 for nouns.

    An Editor for Assisted Translation of Italian Sign Language


    In Memory of Emanuele Pianta’s Contribution to Computational Linguistics

    Almost eight years after his untimely death, the scientific contribution of Emanuele Pianta still appears significant to us, in particular for the variety of topics he dealt with and for his capacity to move across disciplines between different areas of computational linguistics. Today, retracing the steps of Emanuele’s scientific career means rediscovering an important part of the scientific challenges that the Italian research community has faced over a period of more than twenty years. In recognition of the role he played, the Italian Association of Computational Linguistics named after Emanuele Pianta the annual award for the best master’s degree thesis in Computational Linguistics defended at an Italian university.