Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
With the recent success and popularity of pre-trained language models (LMs)
in natural language processing, there has been a rise in efforts to understand
their inner workings. In line with such interest, we propose a novel method
that assists us in investigating the extent to which pre-trained LMs capture
the syntactic notion of constituency. Our method provides an effective way of
extracting constituency trees from the pre-trained LMs without training. In
addition, we report intriguing findings in the induced trees, including the
fact that pre-trained LMs outperform other approaches in correctly demarcating
adverb phrases in sentences. Comment: ICLR 2020
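As a concrete illustration of training-free tree extraction, here is a minimal sketch of the generic distance-based recipe such methods follow; the boundary scores are made up, whereas a real system would derive them from a pre-trained LM's representations or attention. This is an assumed illustration, not the authors' implementation.

    # Minimal sketch: greedy top-down splitting at the largest syntactic distance.
    # The distance list is hand-written here; in practice it would come from a
    # pre-trained LM (e.g., dissimilarity between adjacent hidden states).
    def build_tree(words, distances):
        """distances[i] scores the boundary between words[i] and words[i + 1]."""
        if len(words) == 1:
            return words[0]
        # Split where the boundary score is highest, then recurse on both halves.
        split = max(range(len(distances)), key=lambda i: distances[i])
        left = build_tree(words[:split + 1], distances[:split])
        right = build_tree(words[split + 1:], distances[split + 1:])
        return (left, right)

    words = ["the", "quick", "brown", "fox", "jumps"]
    distances = [0.2, 0.1, 0.15, 0.7]
    print(build_tree(words, distances))
    # (('the', (('quick', 'brown'), 'fox')), 'jumps')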
Self-Training for Unsupervised Parsing with PRPN
Neural unsupervised parsing (UP) models learn to parse without access to
syntactic annotations, while being optimized for another task like language
modeling. In this work, we propose self-training for neural UP models: we
leverage aggregated annotations predicted by copies of our model as supervision
for future copies. To be able to use our model's predictions during training,
we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a) such
that it can be trained in a semi-supervised fashion. We then add examples with
parses predicted by our model to our unlabeled UP training data. Our
self-trained model outperforms the PRPN by 8.1% F1 and the previous state of
the art by 1.6% F1. In addition, we show that our architecture can also be
helpful for semi-supervised parsing in ultra-low-resource settings. Comment: Accepted for publication at the 16th International Conference on
Parsing Technologies (IWPT), 2020
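The aggregation step can be pictured with a small sketch (assumed details, not the paper's code): collect the constituent spans predicted by several copies of the model and keep the spans that a majority of copies agree on as pseudo-labels for the next round of training.

    # Sketch of pseudo-label aggregation across model copies (hypothetical helper,
    # not the PRPN implementation).
    from collections import Counter

    def aggregate_spans(predictions, min_votes):
        """predictions: list of span sets, one per model copy.
        Returns the spans predicted by at least `min_votes` copies."""
        votes = Counter(span for spans in predictions for span in spans)
        return {span for span, count in votes.items() if count >= min_votes}

    # Spans are (start, end) indices over a 5-token sentence.
    copy_a = {(0, 2), (3, 5), (0, 5)}
    copy_b = {(0, 2), (2, 5), (0, 5)}
    copy_c = {(0, 2), (3, 5), (0, 5)}

    pseudo_label = aggregate_spans([copy_a, copy_b, copy_c], min_votes=2)
    print(sorted(pseudo_label))  # [(0, 2), (0, 5), (3, 5)]
    # The agreed-upon spans would then be added to the unlabeled training data
    # as (noisy) supervision for the next training round.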
IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?
We propose a novel method that enables us to determine words that deserve to
be emphasized from written text in visual media, relying only on the
information from the self-attention distributions of pre-trained language
models (PLMs). With extensive experiments and analyses, we show that 1) our
zero-shot approach is superior to a reasonable baseline that adopts TF-IDF and
that 2) there exist several attention heads in PLMs specialized for emphasis
selection, confirming that PLMs are capable of recognizing important words in
sentences.
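As a rough illustration of the zero-shot idea (hand-written attention values standing in for a real PLM head, and a scoring rule that is my assumption rather than the paper's), one can rank words by the attention mass they receive from an emphasis-sensitive head and take the top-k as emphasis candidates.

    # Hypothetical sketch: score words by total attention received from one head.
    import numpy as np

    def emphasis_candidates(words, attention, k):
        """attention[i, j] = attention from token i to token j for a single head."""
        received = attention.sum(axis=0)           # attention mass each word receives
        top = np.argsort(received)[::-1][:k]       # indices of the k highest scores
        return [words[i] for i in sorted(top)]

    words = ["grab", "life", "by", "the", "moment"]
    attention = np.array([
        [0.10, 0.40, 0.05, 0.05, 0.40],
        [0.30, 0.20, 0.05, 0.05, 0.40],
        [0.20, 0.30, 0.10, 0.10, 0.30],
        [0.20, 0.30, 0.10, 0.10, 0.30],
        [0.30, 0.40, 0.05, 0.05, 0.20],
    ])
    print(emphasis_candidates(words, attention, k=2))  # ['life', 'moment']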
On the Branching Bias of Syntax Extracted from Pre-trained Language Models
Many efforts have been devoted to extracting constituency trees from
pre-trained language models, often proceeding in two stages: feature definition
and parsing. However, such methods may suffer from a branching bias issue,
which inflates performance on languages whose dominant branching direction
matches the bias. In this work, we propose quantitatively measuring the branching bias
by comparing the performance gap on a language and its reversed language, which
is agnostic to both language models and extracting methods. Furthermore, we
analyze the impacts of three factors on the branching bias, namely parsing
algorithms, feature definitions, and language models. Experiments show that
several existing works exhibit branching biases, and some implementations of
these three factors can introduce the branching bias. Comment: EMNLP 2020 Findings
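A rough sketch of the measurement (assumed details, not the authors' code): parse both the original sentence and its word-reversed version, mirror the gold spans for the reversed version, and read the branching bias off the gap between the two F1 scores.

    def mirror_spans(spans, n):
        """Map each (start, end) span onto the word-reversed sentence of length n."""
        return {(n - end, n - start) for start, end in spans}

    def span_f1(pred, gold):
        correct = len(pred & gold)
        if correct == 0:
            return 0.0
        precision, recall = correct / len(pred), correct / len(gold)
        return 2 * precision * recall / (precision + recall)

    n = 5
    gold = {(0, 2), (2, 5), (0, 5)}            # gold constituents of the sentence
    pred_original = {(0, 2), (2, 5), (0, 5)}   # parse extracted for the original order
    pred_reversed = {(0, 2), (3, 5), (0, 5)}   # parse extracted for the reversed order

    bias = span_f1(pred_original, gold) - span_f1(pred_reversed, mirror_spans(gold, n))
    print(round(bias, 3))  # 0.333 -> the extractor looks biased toward the original direction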
Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models
As it has been unveiled that pre-trained language models (PLMs) are to some
extent capable of recognizing syntactic concepts in natural language, much
effort has been made to develop a method for extracting complete (binary)
parses from PLMs without training separate parsers. We improve upon this
paradigm by proposing a novel chart-based method and an effective top-K
ensemble technique. Moreover, we demonstrate that we can broaden the scope of
application of the approach into multilingual settings. Specifically, we show
that by applying our method on multilingual PLMs, it becomes possible to induce
non-trivial parses for sentences from nine languages in an integrated and
language-agnostic manner, attaining performance superior or comparable to that
of unsupervised PCFGs. We also verify that our approach is robust to
cross-lingual transfer. Finally, we provide analyses on the inner workings of
our method. For instance, we discover universal attention heads which are
consistently sensitive to syntactic information irrespective of the input
language. Comment: preprint
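To illustrate the chart-based step with made-up span scores (the paper derives its scores from PLM representations, which are not reproduced here): given a score for every candidate span, CKY-style dynamic programming recovers the binary tree whose constituents have the highest total score.

    # Sketch of exhaustive chart-based tree extraction over precomputed span scores.
    from functools import lru_cache

    span_score = {
        (0, 2): 1.0, (1, 3): 0.2, (2, 4): 0.9,
        (0, 3): 0.1, (1, 4): 0.3, (0, 4): 1.0,
    }
    n = 4  # sentence length

    @lru_cache(maxsize=None)
    def best(i, j):
        """Return (score, spans) of the best binary tree over words i..j (exclusive)."""
        if j - i == 1:
            return 0.0, ((i, j),)
        candidates = []
        for k in range(i + 1, j):
            left_score, left_spans = best(i, k)
            right_score, right_spans = best(k, j)
            score = span_score.get((i, j), 0.0) + left_score + right_score
            candidates.append((score, left_spans + right_spans + ((i, j),)))
        return max(candidates)

    score, spans = best(0, n)
    print(score, sorted(spans))
    # 2.9 [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (2, 4), (3, 4)]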
Visually Analyzing Contextualized Embeddings
In this paper we introduce a method for visually analyzing contextualized
embeddings produced by deep neural network-based language models. Our approach
is inspired by linguistic probes for natural language processing, where tasks
are designed to probe language models for linguistic structure, such as
parts-of-speech and named entities. These approaches are largely confirmatory,
however, only enabling a user to test for information known a priori. In this
work, we eschew supervised probing tasks, and advocate for unsupervised probes,
coupled with visual exploration techniques, to assess what is learned by
language models. Specifically, we cluster contextualized embeddings produced
from a large text corpus, and introduce a visualization design based on this
clustering and textual structure - cluster co-occurrences, cluster spans, and
cluster-word membership - to help elicit the functionality of, and relationship
between, individual clusters. User feedback highlights the benefits of our
design in discovering different types of linguistic structures. Comment: IEEE Vis 2020, Observable notebook demo at
https://observablehq.com/@mattberger/visually-analyzing-contextualized-embedding
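A minimal sketch of the unsupervised-probe pipeline, with random vectors standing in for real contextualized states (my assumption, not the authors' tooling): cluster the token embeddings and inspect which words fall into which cluster.

    # Sketch: cluster token embeddings and report cluster-word membership.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    embeddings = rng.normal(size=(len(tokens), 768))   # placeholder for LM hidden states

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

    # Cluster-word membership: which word types land in which cluster.
    membership = {}
    for token, label in zip(tokens, labels):
        membership.setdefault(int(label), set()).add(token)
    print(membership)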
Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Recent work on the lottery ticket hypothesis has produced highly sparse
Transformers for NMT while maintaining BLEU. However, it is unclear how such
pruning techniques affect a model's learned representations. By probing sparse
Transformers, we find that complex semantic information is first to be
degraded. Analysis of internal activations reveals that higher layers diverge
most over the course of pruning, gradually becoming less complex than their
dense counterparts. Meanwhile, early layers of sparse models begin to perform
more encoding. Attention mechanisms remain remarkably consistent as sparsity
increases. Comment: 8 pages, 6 figures, 11 supplementary figures
Syntax Representation in Word Embeddings and Neural Networks -- A Survey
Neural networks trained on natural language processing tasks capture syntax
even though it is not provided as a supervision signal. This indicates that
syntactic analysis is essential to the understanding of language in artificial
intelligence systems. This overview paper covers approaches of evaluating the
amount of syntactic information included in the representations of words for
different neural network architectures. We mainly summarize research on
English monolingual data on language modeling tasks and multilingual data for
neural machine translation systems and multilingual language models. We
describe which pre-trained models and representations of language are best
suited for transfer to syntactic tasks.
Analyzing Individual Neurons in Pre-trained Language Models
While a great deal of analysis has been carried out to demonstrate the linguistic
knowledge captured by the representations learned within deep NLP models, very little
attention has been paid to individual neurons. We carry out a neuron-level
analysis using core linguistic tasks of predicting morphology, syntax and
semantics, on pre-trained language models, with questions like: i) do
individual neurons in pre-trained models capture linguistic information? ii)
which parts of the network learn more about certain linguistic phenomena? iii)
how distributed or focused is the information? and iv) how do various
architectures differ in learning these properties? We found small subsets of
neurons to predict linguistic tasks, with lower-level tasks (such as
morphology) localized in fewer neurons, compared to the higher-level task of
predicting syntax. Our study also reveals interesting cross-architectural
comparisons. For example, we found neurons in XLNet to be more localized and
disjoint when predicting properties compared to BERT and others, where they are
more distributed and coupled. Comment: Accepted in EMNLP 2020
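The localization question can be illustrated with a simplified probe (synthetic activations and a sparse linear classifier of my choosing, not the paper's setup): fit an L1-regularized probe on token representations for a linguistic label and count the neurons that carry non-zero weight; fewer active neurons suggests the property is more localized.

    # Sketch: count how many "neurons" a sparse probe needs for a synthetic label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_tokens, n_neurons = 2000, 768
    X = rng.normal(size=(n_tokens, n_neurons))        # placeholder for activations
    # Make the label depend on only a handful of dimensions (neurons 0-4).
    y = (X[:, :5].sum(axis=1) > 0).astype(int)

    probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    active = int(np.sum(np.abs(probe.coef_[0]) > 1e-6))
    print(f"{active} of {n_neurons} neurons carry non-zero probe weight")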
A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
This work describes experiments which probe the hidden representations of
several BERT-style models for morphological content. The goal is to examine the
extent to which discrete linguistic structure, in the form of morphological
features and feature values, presents itself in the vector representations and
attention distributions of pre-trained language models for five European
languages. The experiments contained herein show that (i) Transformer
architectures largely partition their embedding space into convex sub-regions
highly correlated with morphological feature value, (ii) the contextualized
nature of transformer embeddings allows models to distinguish ambiguous
morphological forms in many, but not all cases, and (iii) very specific
attention head/layer combinations appear to home in on subject-verb agreement.