Italian Event Detection Goes Deep Learning
This paper reports on a set of experiments with different word embeddings to
initialize a state-of-the-art Bi-LSTM-CRF network for event detection and
classification in Italian, following the EVENTI evaluation exercise. The network
obtains a new state-of-the-art result, improving the F1 score by 1.3 points for
detection and by 6.5 points for classification, using a single-step approach.
The results also provide further evidence that embeddings have a major impact
on the performance of such architectures.
Comment: to appear at CLiC-it 201
Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources
In this work, we present an effective method for semantic specialization of
word vector representations. To this end, we use traditional word embeddings
and apply specialization methods to better capture semantic relations between
words. In our approach, we leverage external knowledge from rich lexical
resources such as BabelNet. We also show that our proposed post-specialization
method, based on an adversarial neural network with the Wasserstein distance,
yields improvements over state-of-the-art methods on two tasks: word
similarity and dialog state tracking.
Comment: Accepted to ACL 2020 SR
Auxiliary selection in Italian intransitive verbs: a computational investigation based on annotated corpora
This paper analyzes auxiliary selection in Italian intransitive verbs. The methodology consists of comparing linguistic theory with data extracted from two annotated corpora: UD-IT and PoSTWITA-UD. The analyzed verbs were classified into semantic categories drawn from the theoretical literature. The results confirm the theoretical assumptions to a good approximation and could serve as a starting point for applied tasks such as Natural Language Generation.
Long-term Social Media Data Collection at the University of Turin
We report on the collection of Italian-language social media messages, from Twitter in particular, that has been ongoing continuously at the University of Turin since 2012. A number of smaller datasets have been extracted from the main collection and enriched with different kinds of annotations for linguistic purposes. Moreover, a few extra datasets have been collected independently and are now in the process of being merged with the main collection. We aim to make the resource available to the community to the fullest extent possible, in accordance with the Terms of Service of the platforms from which the data were gathered.