
    How Do BERT Embeddings Organize Linguistic Knowledge?

    Several studies have investigated the linguistic information implicitly encoded in Neural Language Models. Most of these works focused on quantifying the amount and type of information available within their internal representations and across their layers. In line with this research, we proposed a different study, based on Lasso regression, aimed at understanding how the information encoded by BERT sentence-level representations is arranged within its hidden units. Using a suite of probing tasks, we showed the existence of a relationship between the implicit knowledge learned by the model and the number of individual units involved in encoding this competence. Moreover, we found that it is possible to identify groups of hidden units that are more relevant to specific linguistic properties.
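    To make the setup concrete, here is a minimal sketch of Lasso-based probing over BERT sentence representations; the checkpoint (bert-base-uncased), the mean pooling, and the toy probing target (sentence length) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Lasso probing over BERT sentence embeddings.
# Assumptions (not from the paper): bert-base-uncased, mean pooling,
# sentence length as the probed property.
import numpy as np
import torch
from sklearn.linear_model import Lasso
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(sentence: str) -> np.ndarray:
    """Mean-pool last-layer token states into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

sentences = [
    "The cat sat on the mat.",
    "Colorless green ideas sleep furiously.",
    "She reads a new book every single week.",
    "Dogs bark.",
]
X = np.stack([sentence_embedding(s) for s in sentences])
y = np.array([len(s.split()) for s in sentences])  # toy probing target

probe = Lasso(alpha=0.1).fit(X, y)
active_units = np.nonzero(probe.coef_)[0]  # units the sparse probe relies on
print(f"{active_units.size} of {X.shape[1]} hidden units carry the signal")
```

    The L1 penalty drives most probe weights to zero, so the surviving units can be read as the ones most relevant to the probed property.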

    Contextual and Non-Contextual Word Embeddings: An In-Depth Linguistic Investigation

    In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual Language Model (BERT) and a context-independent one (Word2vec). We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model. We also find that BERT is able to encode sentence-level properties even within single-word embeddings, obtaining results comparable or even superior to those obtained with sentence representations.
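    For the non-contextual side of such a comparison, a sentence vector can be built by averaging Word2vec word embeddings and fed to the same kind of probing regressor used for BERT (as in the sketch above); the tiny corpus, the probing target, and all hyperparameters below are illustrative assumptions.

```python
# Sketch of the non-contextual baseline: averaged Word2vec vectors.
# Assumptions (not from the paper): corpus, hyperparameters, probing target.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import Ridge

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat across the yard".split(),
    "colorless green ideas sleep furiously".split(),
    "she reads a new book every single week".split(),
]
w2v = Word2Vec(sentences=corpus, vector_size=50, min_count=1, epochs=50)

def sentence_embedding(tokens):
    """Average the Word2vec vectors of the sentence's tokens."""
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.stack([sentence_embedding(s) for s in corpus])
y = np.array([len(s) for s in corpus])  # toy probing target: sentence length

probe = Ridge(alpha=1.0).fit(X, y)
print("train R^2 of the Word2vec probe:", probe.score(X, y))
```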

    What Makes My Model Perplexed? A Linguistic Investigation on Neural Language Models Perplexity

    This paper presents an investigation of how the linguistic structure of a sentence affects the perplexity of two of the most popular Neural Language Models (NLMs), BERT and GPT-2. We first compare the sentence-level likelihood computed with BERT and the perplexity of GPT-2, showing that the two metrics are correlated. In addition, we exploit linguistic features capturing a wide set of morpho-syntactic and syntactic phenomena, showing how they contribute to predicting the perplexity of the two NLMs.
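    As a rough sketch of the two scores, GPT-2's perplexity can be derived from its causal language-modeling loss, while a sentence-level BERT score can be computed as a pseudo-log-likelihood by masking one token at a time; the checkpoints and scoring details are assumptions and may differ from the paper's setup.

```python
# Sketch of the two sentence-level scores being compared.
# Assumptions (not from the paper): checkpoints and scoring details.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                          AutoTokenizer)

gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def gpt2_perplexity(sentence: str) -> float:
    """exp of the mean token negative log-likelihood under GPT-2."""
    ids = gpt2_tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = gpt2(ids, labels=ids).loss
    return torch.exp(loss).item()

def bert_pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token, masked one at a time."""
    ids = bert_tok(sentence, return_tensors="pt").input_ids
    total = 0.0
    for i in range(1, ids.size(1) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, i] = bert_tok.mask_token_id
        with torch.no_grad():
            logits = bert(masked).logits
        total += logits[0, i].log_softmax(dim=-1)[ids[0, i]].item()
    return total

s = "The cat sat on the mat."
print("GPT-2 perplexity:", gpt2_perplexity(s))
print("BERT pseudo-log-likelihood:", bert_pseudo_log_likelihood(s))
```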

    Tracking the Evolution of Written Language Competence in L2 Spanish Learners

    In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners, using a wide range of linguistic features automatically extracted from students' written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instruction that students receive during each course.
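    Schematically, such a pipeline extracts linguistic features from each written production and trains a classifier whose weights can then be inspected against the curriculum; the two toy features, the essays, and the course labels below are illustrative stand-ins for the paper's actual feature set.

```python
# Sketch of feature-based proficiency classification.
# Assumptions (not from the paper): the feature set, essays, and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(text: str) -> list:
    """A tiny stand-in for an automatic linguistic-feature extractor."""
    tokens = text.split()
    avg_word_len = sum(len(t) for t in tokens) / len(tokens)
    type_token_ratio = len(set(tokens)) / len(tokens)  # lexical diversity
    return [len(tokens), avg_word_len, type_token_ratio]

essays = [
    "hola me llamo Ana y vivo en Madrid",
    "me gusta el gato",
    "ayer fui al mercado y compre fruta fresca para la cena",
    "cuando era pequeno jugaba al futbol con mis amigos cada domingo",
]
levels = ["A1", "A1", "A2", "A2"]  # toy course labels

X = np.array([extract_features(e) for e in essays])
clf = LogisticRegression().fit(X, levels)

# The signed weights point to the most predictive features per level
for name, w in zip(["n_tokens", "avg_word_len", "TTR"], clf.coef_[0]):
    print(f"{name}: {w:+.3f}")
```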

    Codice Pelavicino. Edizione digitale

    A digital critical edition of the manuscript known as the Codice Pelavicino, held at the Archivio Capitolare Lunense (Sarzana). The thirteenth-century manuscript contains numerous documents from the eleventh to the thirteenth century concerning the Church of Luni. The main promoter of its compilation was Enrico da Fucecchio, bishop of Luni, who ascended to the episcopal see in 1273 and resigned the office between 24 October 1296 and the beginning of 1297. To safeguard the assets and rights of the Church of Luni, Bishop Enrico reorganized the offices of the curia, had a general inventory of the ecclesiastical archive compiled, and set up a scriptorium staffed by several copyists, from which the codex, containing 529 distinct texts, was produced. The digital edition offers an encoded transcription, simultaneous viewing of the digital images of the codex, and access to the critical apparatus and accompanying tools, by means of the open-source software EVT.

    Codice Pelavicino. Edizione digitale

    A digital critical edition of the Codice Pelavicino held at the Archivio Capitolare Lunense (Sarzana); the manuscript consists of 426 numbered and 20 unnumbered leaves and contains several texts, including the Liber Iurium of the Church of Luni.