30 research outputs found

    A distributional semantic study on German event nominalizations

    We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, 'the evaluation') and nominal infinitives (e.g., das Evaluieren, 'the evaluating'). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and semantically challenging. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study aims at a better understanding of the differences in their semantic import. The key notion of our comparison is semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first, cosine, detects nominalizations which are semantically similar to their bases; the second, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
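
The contrast between the two transparency measures can be illustrated with a minimal sketch. The abstract does not say how either measure is computed, so everything below is an assumption for illustration: the sparse dictionary representation, the function names `cosine` and `inclusion`, the simple weighted-overlap definition of inclusion, and the toy context counts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion(narrow, broad):
    """Share of `narrow`'s context weight lying on contexts also seen with `broad`.

    Hypothetical weighted-overlap score: 1.0 means every context of `narrow`
    also occurs with `broad`. Unlike cosine, the score is asymmetric.
    """
    total = sum(narrow.values())
    if total == 0:
        return 0.0
    shared = sum(w for c, w in narrow.items() if broad.get(c, 0.0) > 0)
    return shared / total


# Toy co-occurrence counts (invented for illustration only).
verb = {"project": 4.0, "result": 3.0, "student": 2.0, "carefully": 1.0}
noun = {"project": 2.0, "result": 1.0}

sim = cosine(noun, verb)              # symmetric similarity
noun_in_verb = inclusion(noun, verb)  # noun contexts covered by the verb
verb_in_noun = inclusion(verb, noun)  # verb contexts covered by the noun
```

In this toy case the nominalization's contexts are fully contained in the verb's contexts while the reverse does not hold, which is the kind of asymmetry a symmetric measure such as cosine cannot express.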

    A Diachronic Italian Corpus based on “L’Unità”

    In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We present corpus statistics that take the temporal dimension into account and describe some examples of how the time series can be used.

    DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task

    This paper describes the first edition of the “Diachronic Lexical Semantics” (DIACR-Ita) task at the EVALITA2020 campaign. The task challenges participants to develop systems that can automatically detect whether a given word has changed its meaning over time, given contextual information from corpora. The task, at its first edition, attracted 9 participant teams and collected a total of 36 submission runs.