71 research outputs found

    Predicting movie-elicited emotions from dialogue in screenplay text: A study on “Forrest Gump”

    We present a new dataset of sentences extracted from the movie Forrest Gump, annotated with the emotions perceived by a group of subjects while watching the movie. We run experiments to predict these emotions using two classifiers, one based on a Support Vector Machine with linguistic and lexical features, the other based on BERT. The experiments show that contextual embeddings are effective in predicting human-perceived emotions.
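
    A minimal sketch of the contrast the abstract describes: a feature-based SVM alongside a classifier built on BERT representations. The sentences, the emotion labels, the choice of bert-base-uncased, and the use of frozen [CLS] vectors with a logistic-regression head are illustrative assumptions, not the authors' pipeline.

    # Sketch: lexical-feature SVM vs. a classifier over frozen BERT [CLS] embeddings.
    # Data, labels and model choices below are illustrative assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    sentences = ["Run, Forrest, run!",
                 "Mama always said life was like a box of chocolates."]
    labels = ["fear", "happiness"]  # hypothetical perceived-emotion labels

    # 1) SVM with simple lexical features (word n-grams).
    svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC()).fit(sentences, labels)

    # 2) Classifier over frozen BERT [CLS] sentence embeddings.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        enc = tok(sentences, padding=True, truncation=True, return_tensors="pt")
        cls = bert(**enc).last_hidden_state[:, 0, :].numpy()  # [CLS] vector per sentence
    bert_clf = LogisticRegression(max_iter=1000).fit(cls, labels)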

    How Do BERT Embeddings Organize Linguistic Knowledge?

    Several studies have investigated the linguistic information implicitly encoded in Neural Language Models. Most of these works focused on quantifying the amount and type of information available within their internal representations and across their layers. In line with this scenario, we proposed a different study, based on Lasso regression, aimed at understanding how the information encoded in BERT sentence-level representations is arranged within its hidden units. Using a suite of several probing tasks, we showed the existence of a relationship between the implicit knowledge learned by the model and the number of individual units involved in the encoding of this competence. Moreover, we found that it is possible to identify groups of hidden units more relevant for specific linguistic properties.
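
    A rough sketch of the kind of Lasso probe the abstract describes: regress a linguistic property on a BERT sentence-level representation and count the hidden units that receive non-zero weight. The probing target (sentence length), mean pooling, bert-base-uncased, and the regularisation strength are assumptions made for illustration.

    # Sketch: Lasso probe over BERT sentence representations (assumed setup).
    import numpy as np
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.linear_model import Lasso

    sentences = ["The cat sat on the mat.",
                 "Colorless green ideas sleep furiously.",
                 "She read the report that the committee had quietly shelved."]
    target = np.array([len(s.split()) for s in sentences], dtype=float)  # toy property: length

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        enc = tok(sentences, padding=True, return_tensors="pt")
        hidden = bert(**enc).last_hidden_state            # (batch, tokens, 768)
        mask = enc["attention_mask"].unsqueeze(-1)        # ignore padding when pooling
        sent_repr = (hidden * mask).sum(1) / mask.sum(1)  # mean-pooled sentence vectors

    probe = Lasso(alpha=0.1).fit(sent_repr.numpy(), target)
    active_units = np.flatnonzero(probe.coef_)            # hidden units involved in this property
    print(f"{active_units.size} of {probe.coef_.size} hidden units used")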

    How About Time? Probing a Multilingual Language Model for Temporal Relations

    This paper presents a comprehensive set of probing experiments using a multilingual language model, XLM-R, for temporal relation classification between events in four languages. Results show an advantage of contextualized embeddings over static ones and a detrimental role of sentence-level embeddings. While obtaining competitive results against state-of-the-art systems, our probes indicate a lack of suitable encoded information to properly address this task.
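
    A minimal sketch of the contextualized-vs-static contrast for temporal relation classification: encode the two event mentions either with XLM-R's contextualized token vectors or with its static input-embedding lookup, concatenate them, and train a linear probe. The event token positions, the concatenation-based pair encoding, the BEFORE/AFTER label set, and the logistic-regression probe are assumptions, not the paper's exact protocol.

    # Sketch: contextualized vs. static event embeddings from XLM-R for a
    # temporal-relation probe. Positions, labels and pair encoding are illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.linear_model import LogisticRegression

    tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModel.from_pretrained("xlm-roberta-base")

    # (sentence, approximate sub-token index of event 1 and event 2, relation label)
    examples = [("She left after the meeting ended.", 2, 5, "AFTER"),
                ("He called before the train arrived.", 2, 5, "BEFORE")]

    def event_pair_vector(sentence, i, j, contextual=True):
        enc = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            if contextual:
                reps = model(**enc).last_hidden_state[0]                    # contextualized vectors
            else:
                reps = model.get_input_embeddings()(enc["input_ids"])[0]    # static lookup
        return torch.cat([reps[i], reps[j]]).numpy()

    X = [event_pair_vector(s, i, j, contextual=True) for s, i, j, _ in examples]
    y = [lab for *_, lab in examples]
    probe = LogisticRegression(max_iter=1000).fit(X, y)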

    Sentence Complexity in Context

    We study the influence of context on how humans evaluate the complexity of a sentence in English. We collect a new dataset of sentences, where each sentence is rated for perceived complexity within different contextual windows. We carry out an in-depth analysis to detect which linguistic features correlate more with complexity judgments and with the degree of agreement among annotators. We train several regression models, using either explicit linguistic features or contextualized word embeddings, to predict the mean complexity values assigned to sentences in the different contextual windows, as well as their standard deviation. Results show that models leveraging explicit features capturing morphosyntactic and syntactic phenomena always perform better, especially when they have access to features extracted from all contextual sentences.
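
    The regression setup can be illustrated with a small sketch: predict the mean perceived complexity and its standard deviation from explicit linguistic features. The specific features, the toy values, and the use of Ridge regression are assumptions; the study also compares against contextualized embeddings and wider contextual windows.

    # Sketch: regressing mean complexity and annotator std on explicit features.
    # Feature set and values are toy assumptions, not the paper's data.
    import numpy as np
    from sklearn.linear_model import Ridge

    # hypothetical per-sentence features: [token count, max parse depth, subordinate clauses]
    X = np.array([[12, 3, 0],
                  [27, 6, 2],
                  [41, 9, 3]], dtype=float)
    mean_complexity = np.array([2.1, 4.3, 6.0])   # mean complexity rating (toy values)
    std_complexity = np.array([0.4, 0.9, 1.3])    # rating spread as an agreement proxy

    mean_model = Ridge(alpha=1.0).fit(X, mean_complexity)
    std_model = Ridge(alpha=1.0).fit(X, std_complexity)
    print(mean_model.predict([[20, 5, 1]]), std_model.predict([[20, 5, 1]]))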

    Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation.

    In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual Language Model (BERT) and a context-independent one (Word2vec). We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model. We also find that BERT is able to encode sentence-level properties even within single-word embeddings, obtaining results comparable or even superior to those obtained with sentence representations.
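
    A compact sketch of the comparison: the same linear probe applied to two aggregated sentence representations, averaged Word2vec vectors and mean-pooled BERT vectors. The probing target (sentence length), the toy Word2vec training corpus, and the choice of linear regression as the probe are assumptions for illustration only.

    # Sketch: one probe, two sentence representations (Word2vec average vs. BERT mean pooling).
    import numpy as np
    import torch
    from gensim.models import Word2Vec
    from transformers import AutoTokenizer, AutoModel
    from sklearn.linear_model import LinearRegression

    sentences = ["the dog chased the ball",
                 "the committee postponed the vote again",
                 "rain fell quietly over the harbour all night"]
    tokenized = [s.split() for s in sentences]
    target = np.array([len(t) for t in tokenized], dtype=float)   # toy probing target

    # context-independent representation: average of Word2vec word vectors (toy model)
    w2v = Word2Vec(sentences=tokenized, vector_size=50, min_count=1, epochs=50)
    X_w2v = np.stack([w2v.wv[toks].mean(axis=0) for toks in tokenized])

    # contextual representation: mean-pooled BERT token vectors
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    with torch.no_grad():
        enc = tok(sentences, padding=True, return_tensors="pt")
        h = bert(**enc).last_hidden_state
        m = enc["attention_mask"].unsqueeze(-1)
        X_bert = ((h * m).sum(1) / m.sum(1)).numpy()

    for name, X in [("word2vec", X_w2v), ("bert", X_bert)]:
        probe = LinearRegression().fit(X, target)
        print(name, round(probe.score(X, target), 3))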

    Stacked Sentence-Document Classifier Approach for Improving Native Language Identification

    In this paper, we describe the approach of the ItaliaNLP Lab team to native language identification and discuss the results we submitted as participants in the essay track of the NLI Shared Task 2017. We introduce for the first time a 2-stacked sentence-document architecture for native language identification that is able to exploit both local sentence information and a wide set of general-purpose features qualifying the lexical and grammatical structure of the whole document. When evaluated on the official test set, our sentence-document stacked architecture obtained the best result among all the participants in the essay track, with an F1 score of 0.8818.
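
    A minimal sketch of the stacking idea: a sentence-level classifier is trained first, its per-document label distribution is aggregated into features, and a document-level classifier combines those stacked predictions with document-wide features. The toy essays, labels, feature choices and classifiers are illustrative assumptions, not the shared-task pipeline.

    # Sketch: 2-stacked sentence-document classification (illustrative data and features).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # toy essays split into sentences, each essay labelled with the writer's L1
    docs = [["I am agree with this opinion .", "Since long time it is discussed ."],
            ["The data speak for itself .", "I have been lived here two years ."]]
    doc_labels = ["ITA", "GER"]  # hypothetical native-language labels

    # 1) sentence-level classifier (each sentence inherits its document's label)
    sents = [s for d in docs for s in d]
    sent_labels = [lab for d, lab in zip(docs, doc_labels) for _ in d]
    sent_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000)).fit(sents, sent_labels)

    # 2) document-level classifier: averaged sentence-level probabilities stacked
    #    with a simple document-wide feature (average sentence length)
    def doc_features(doc):
        probs = sent_clf.predict_proba(doc).mean(axis=0)     # stacked sentence predictions
        avg_len = np.mean([len(s.split()) for s in doc])     # document-level feature
        return np.concatenate([probs, [avg_len]])

    X_doc = np.stack([doc_features(d) for d in docs])
    doc_clf = LinearSVC().fit(X_doc, doc_labels)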