21 research outputs found

    Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian

    The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications

    Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian

    The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
    Authors: Francesca Masini, M. Silvia Micheli, Andrea Zaninello, Sara Castagnoli, Malvina Nissim

    Creation of lexical resources for a characterisation of multiword expressions in Italian

    The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs: an electronic lexicon, a series of example corpora, and a database of MWEs organised around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus of Italian that spans registers and domains. We can thus test expressions coded by lexicographers in a dictionary, discarding unattested expressions, revisiting the lexicographers' choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in a flexible electronic knowledge base containing structured annotation that can be exploited for multiple purposes. We also suggest directions for further work towards characterising MWEs by analysing the data in our database through the lexico-semantic information available in WordNet or MultiWordNet-like resources, also with a view to expanding the set of MWEs by extracting other similar compact expressions.
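
    The attestation step described in this abstract (testing dictionary expressions against a corpus, discarding unattested ones, recording frequency, and collecting examples for each entry) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `attest_expressions`, the toy lexicon, and the toy sentences are hypothetical, and plain case-insensitive substring matching stands in for the lemma-based and morphosyntactic matching a real system would need.

```python
def attest_expressions(lexicon, sentences, max_examples=3):
    """Keep only expressions attested in the corpus.

    Assumption: an expression counts as attested if its surface form
    occurs as a substring of at least one sentence (case-insensitive).
    "frequency" is the number of sentences containing the expression,
    not the number of individual occurrences.
    """
    attested = {}
    for expression in lexicon:
        needle = expression.lower()
        hits = [s for s in sentences if needle in s.lower()]
        if hits:  # unattested expressions are discarded
            attested[expression] = {
                "frequency": len(hits),
                "examples": hits[:max_examples],  # example sub-corpus
            }
    return attested


# Hypothetical toy data, for illustration only.
toy_lexicon = ["carta di credito", "punto di vista", "gatta ci cova"]
toy_sentences = [
    "Ho pagato con la carta di credito.",
    "Dal mio punto di vista, la carta di credito è comoda.",
    "Cambia punto di vista.",
]

attested = attest_expressions(toy_lexicon, toy_sentences)
for expression, info in attested.items():
    print(expression, info["frequency"], info["examples"])
```

    In this toy run "gatta ci cova" is dropped because no sentence contains it, while the other two entries keep their sentence counts and example sentences.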

    Tracing metaphors in time through self-distance in vector spaces

    From a diachronic corpus of Italian, we build consecutive vector spaces in time and use them to compare a term's cosine similarity to itself in different time spans. We assume that a drop in similarity might be related to the emergence of a metaphorical sense at a given time. We check these similarity-based observations against the year in which a figurative meaning was documented in a reference dictionary and against manual inspection of the term's occurrences in the corpus.
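
    The self-similarity comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy vectors, and the 0.5 threshold are hypothetical, and the sketch assumes that the vectors for a term in different time spans already live in comparable spaces (how the consecutive spaces are built or aligned is not specified here).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_similarity_series(vectors_by_span):
    """Compare a term's vector in each time span with its vector in the next.

    Assumption: span labels sort chronologically and all vectors are
    comparable across spans.
    """
    spans = sorted(vectors_by_span)
    return [
        (prev, curr,
         cosine_similarity(vectors_by_span[prev], vectors_by_span[curr]))
        for prev, curr in zip(spans, spans[1:])
    ]


def find_drops(series, threshold):
    """Return the span pairs whose self-similarity falls below the threshold."""
    return [(prev, curr, sim) for prev, curr, sim in series if sim < threshold]


# Hypothetical toy vectors for a single term, for illustration only.
toy_vectors = {
    "1950s": [1.0, 0.0, 0.0],
    "1960s": [0.9, 0.1, 0.0],
    "1970s": [0.1, 0.9, 0.2],
}

series = self_similarity_series(toy_vectors)
drops = find_drops(series, threshold=0.5)  # threshold chosen arbitrarily
for prev, curr, sim in drops:
    print(f"possible sense change between {prev} and {curr}: {sim:.3f}")
```

    A flagged pair of spans is only a candidate: as the abstract notes, such observations still have to be checked against a reference dictionary and the corpus occurrences themselves.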
