
    How Do BERT Embeddings Organize Linguistic Knowledge?

    Several studies have investigated the linguistic information implicitly encoded in Neural Language Models. Most of these works focused on quantifying the amount and type of information available within their internal representations and across their layers. In line with this research, we proposed a different study, based on Lasso regression, aimed at understanding how the information encoded by BERT sentence-level representations is arranged within its hidden units. Using a suite of probing tasks, we showed the existence of a relationship between the implicit knowledge learned by the model and the number of individual units involved in encoding this competence. Moreover, we found that it is possible to identify groups of hidden units that are more relevant to specific linguistic properties.
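    To make the setup concrete, here is a minimal sketch of Lasso-based probing over BERT sentence representations; the checkpoint (bert-base-uncased), the mean pooling, and the toy probing target (sentence length) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Lasso probing over BERT sentence embeddings.
# Assumptions (not from the paper): bert-base-uncased, mean pooling,
# sentence length as the probed property.
import numpy as np
import torch
from sklearn.linear_model import Lasso
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(sentence: str) -> np.ndarray:
    """Mean-pool last-layer token states into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

sentences = [
    "The cat sat on the mat.",
    "Colorless green ideas sleep furiously.",
    "She reads a new book every single week.",
    "Dogs bark.",
]
X = np.stack([sentence_embedding(s) for s in sentences])
y = np.array([len(s.split()) for s in sentences])  # toy probing target

probe = Lasso(alpha=0.1).fit(X, y)
active_units = np.nonzero(probe.coef_)[0]  # units the sparse probe relies on
print(f"{active_units.size} of {X.shape[1]} hidden units carry the signal")
```

    The L1 penalty drives most probe weights to zero, so the surviving units can be read as the ones most relevant to the probed property.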

    Contextual and Non-Contextual Word Embeddings: An In-Depth Linguistic Investigation

    In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual Language Model (BERT) and a context-independent one (Word2vec). We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model. We also find that BERT is able to encode sentence-level properties even within single-word embeddings, obtaining results comparable or even superior to those obtained with sentence representations.
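    For the non-contextual side of such a comparison, a sentence vector can be built by averaging Word2vec word embeddings and fed to the same kind of probing regressor used for BERT (as in the sketch above); the tiny corpus, the probing target, and all hyperparameters below are illustrative assumptions.

```python
# Sketch of the non-contextual baseline: averaged Word2vec vectors.
# Assumptions (not from the paper): corpus, hyperparameters, probing target.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import Ridge

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat across the yard".split(),
    "colorless green ideas sleep furiously".split(),
    "she reads a new book every single week".split(),
]
w2v = Word2Vec(sentences=corpus, vector_size=50, min_count=1, epochs=50)

def sentence_embedding(tokens):
    """Average the Word2vec vectors of the sentence's tokens."""
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.stack([sentence_embedding(s) for s in corpus])
y = np.array([len(s) for s in corpus])  # toy probing target: sentence length

probe = Ridge(alpha=1.0).fit(X, y)
print("train R^2 of the Word2vec probe:", probe.score(X, y))
```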

    What Makes My Model Perplexed? A Linguistic Investigation on Neural Language Models Perplexity

    This paper presents an investigation of how the linguistic structure of a sentence affects the perplexity of two of the most popular Neural Language Models (NLMs), BERT and GPT-2. We first compare the sentence-level likelihood computed with BERT and the perplexity of GPT-2, showing that the two metrics are correlated. In addition, we exploit linguistic features capturing a wide set of morpho-syntactic and syntactic phenomena, showing how they contribute to predicting the perplexity of the two NLMs.
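    As a rough sketch of the two scores, GPT-2's perplexity can be derived from its causal language-modeling loss, while a sentence-level BERT score can be computed as a pseudo-log-likelihood by masking one token at a time; the checkpoints and scoring details are assumptions and may differ from the paper's setup.

```python
# Sketch of the two sentence-level scores being compared.
# Assumptions (not from the paper): checkpoints and scoring details.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                          AutoTokenizer)

gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def gpt2_perplexity(sentence: str) -> float:
    """exp of the mean token negative log-likelihood under GPT-2."""
    ids = gpt2_tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = gpt2(ids, labels=ids).loss
    return torch.exp(loss).item()

def bert_pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token, masked one at a time."""
    ids = bert_tok(sentence, return_tensors="pt").input_ids
    total = 0.0
    for i in range(1, ids.size(1) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, i] = bert_tok.mask_token_id
        with torch.no_grad():
            logits = bert(masked).logits
        total += logits[0, i].log_softmax(dim=-1)[ids[0, i]].item()
    return total

s = "The cat sat on the mat."
print("GPT-2 perplexity:", gpt2_perplexity(s))
print("BERT pseudo-log-likelihood:", bert_pseudo_log_likelihood(s))
```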

    Tracking the Evolution of Written Language Competence in L2 Spanish Learners

    In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners, using a wide range of linguistic features automatically extracted from students' written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instruction that students receive during each course.
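    Schematically, such a pipeline extracts linguistic features from each written production and trains a classifier whose weights can then be inspected against the curriculum; the two toy features, the essays, and the course labels below are illustrative stand-ins for the paper's actual feature set.

```python
# Sketch of feature-based proficiency classification.
# Assumptions (not from the paper): the feature set, essays, and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(text: str) -> list:
    """A tiny stand-in for an automatic linguistic-feature extractor."""
    tokens = text.split()
    avg_word_len = sum(len(t) for t in tokens) / len(tokens)
    type_token_ratio = len(set(tokens)) / len(tokens)  # lexical diversity
    return [len(tokens), avg_word_len, type_token_ratio]

essays = [
    "hola me llamo Ana y vivo en Madrid",
    "me gusta el gato",
    "ayer fui al mercado y compre fruta fresca para la cena",
    "cuando era pequeno jugaba al futbol con mis amigos cada domingo",
]
levels = ["A1", "A1", "A2", "A2"]  # toy course labels

X = np.array([extract_features(e) for e in essays])
clf = LogisticRegression().fit(X, levels)

# The signed weights point to the most predictive features per level
for name, w in zip(["n_tokens", "avg_word_len", "TTR"], clf.coef_[0]):
    print(f"{name}: {w:+.3f}")
```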

    Codice Pelavicino. Edizione digitale

    A digital critical edition of the manuscript known as the Codice Pelavicino, held at the Archivio Capitolare Lunense (Sarzana). The thirteenth-century manuscript contains numerous documents from the eleventh to the thirteenth century concerning the Church of Luni. The main promoter of its compilation was Enrico da Fucecchio, bishop of Luni, who ascended to the episcopal see in 1273 and resigned the office between 24 October 1296 and the beginning of 1297. To safeguard the assets and rights of the Church of Luni, Bishop Enrico reorganized the offices of the curia, had a general inventory of the ecclesiastical archive compiled, and set up a scriptorium staffed by several copyists, from which the codex, containing 529 distinct texts, was produced. The digital edition offers an encoded transcription, simultaneous viewing of the digital images of the codex, and access to the critical apparatus and accompanying tools, by means of the open-source software EVT.

    Codice Pelavicino. Edizione digitale

    A digital critical edition of the Codice Pelavicino held at the Archivio Capitolare Lunense (Sarzana); the manuscript consists of 426 numbered and 20 unnumbered leaves and contains several texts, including the Liber Iurium of the Church of Luni.