Attention Can Reflect Syntactic Structure (If You Let It)
Since the popularization of the Transformer as a general-purpose feature
encoder for NLP, many studies have attempted to decode linguistic structure
from its novel multi-head attention mechanism. However, much of this work has
focused almost exclusively on English -- a language with rigid word order and a
lack of inflectional morphology. In this study, we present decoding experiments
for multilingual BERT across 18 languages in order to test the generalizability
of the claim that dependency syntax is reflected in attention patterns. We show
that full trees can be decoded above baseline accuracy from single attention
heads, and that individual relations are often tracked by the same heads across
languages. Furthermore, in an attempt to address recent debates about the
status of attention as an explanatory mechanism, we experiment with fine-tuning
mBERT on a supervised parsing objective while freezing different series of
parameters. Interestingly, in steering the objective to learn explicit
linguistic structure, we find much of the same structure represented in the
resulting attention patterns, with interesting differences with respect to
which parameters are frozen.
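To make the decoding setup concrete, here is a minimal sketch (not the authors' released code) of reading a single mBERT attention head and decoding an unlabeled tree with a maximum spanning arborescence. The layer and head indices are illustrative, and subword-to-word merging is omitted for brevity.

```python
import torch
import networkx as nx
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained(
    "bert-base-multilingual-cased", output_attentions=True
)
model.eval()

def decode_tree(sentence, layer=7, head=5):
    """Decode an unlabeled dependency tree from one attention head."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer
    att = out.attentions[layer][0, head]
    att = att[1:-1, 1:-1]  # drop [CLS]/[SEP]; subword merging omitted
    n = att.size(0)
    g = nx.DiGraph()
    for child in range(n):
        for parent in range(n):
            if child != parent:
                # weight of "parent governs child" = attention child -> parent
                g.add_edge(parent, child, weight=att[child, parent].item())
    # Chu-Liu/Edmonds maximum spanning arborescence = best directed tree
    tree = nx.maximum_spanning_arborescence(g, attr="weight")
    return sorted(tree.edges())  # (head, dependent) index pairs
```

Decoded edges can then be scored against gold treebank heads (e.g. unlabeled attachment score) and compared to a positional baseline.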
Construction Grammar and Language Models
Recent progress in deep learning and natural language processing has given
rise to powerful models that are primarily trained on a cloze-like task and
show some evidence of having access to substantial linguistic information,
including some constructional knowledge. This groundbreaking discovery presents
an exciting opportunity for a synergistic relationship between computational
methods and Construction Grammar research. In this chapter, we explore three
distinct approaches to the interplay between computational methods and
Construction Grammar: (i) computational methods for text analysis, (ii)
computational Construction Grammar, and (iii) deep learning models, with a
particular focus on language models. We touch upon the first two approaches as
a contextual foundation for the use of computational methods before providing
an accessible, yet comprehensive overview of deep learning models, which also
addresses reservations construction grammarians may have. Additionally, we
delve into experiments that explore the emergence of constructionally relevant
information within these models while also examining the aspects of
Construction Grammar that may pose challenges for these models. This chapter
aims to foster collaboration between researchers in the fields of natural
language processing and Construction Grammar. By doing so, we hope to pave the
way for new insights and advancements in both these fields.
Comment: Accepted for publication in The Cambridge Handbook of Construction
Grammar, edited by Mirjam Fried and Kiki Nikiforidou. To appear in 202
Multilingual Nonce Dependency Treebanks: Understanding how LLMs represent and process syntactic structure
We introduce SPUD (Semantically Perturbed Universal Dependencies), a
framework for creating nonce treebanks for the multilingual Universal
Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure,
provides syntactic annotations, and ensures grammaticality via
language-specific rules. We create nonce data in Arabic, English, French,
German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we
investigate the effect of nonce data on word co-occurrence statistics, as
measured by perplexity scores of autoregressive (ALM) and masked language
models (MLM). We find that ALM scores are significantly more affected by nonce
data than MLM scores. Second, we show how nonce data affects the performance of
syntactic dependency probes. We replicate the findings of Müller-Eberstein et
al. (2022) on nonce test data and show that performance declines for both MLMs
and ALMs with respect to the original test data. However, most of the
performance is retained, suggesting that the probe indeed learns syntax
independently of semantics.
Comment: Our software is available at https://github.com/davidarps/spu
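For illustration, a hedged sketch of the two perplexity measures compared above: autoregressive perplexity and masked-LM pseudo-perplexity. The model names are placeholders, not necessarily the paper's exact checkpoints.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM

def alm_perplexity(sentence, name="gpt2"):
    """Standard autoregressive perplexity: exp of the mean token NLL."""
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForCausalLM.from_pretrained(name)
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())

def mlm_pseudo_perplexity(sentence, name="bert-base-multilingual-cased"):
    """Pseudo-perplexity: mask each position in turn and score the target."""
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForMaskedLM.from_pretrained(name)
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    nlls = []
    for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = lm(masked.unsqueeze(0)).logits[0, i]
        nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))
```

Comparing these scores on original versus nonce sentences gives the kind of contrast reported above: the autoregressive measure shifts more sharply on semantically perturbed input.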
Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean
In this paper, we first raise important issues regarding the Penn Korean
Universal Dependency Treebank (PKT-UD) and address them by manually revising
the entire corpus, with the aim of producing cleaner UD annotations that are
more faithful to Korean grammar. For compatibility with the rest of the UD
corpora, we
follow the UDv2 guidelines, and extensively revise the part-of-speech tags and
the dependency relations to reflect morphological features and flexible
word-order aspects in Korean. The original and the revised versions of PKT-UD
are experimented with transformer-based parsing models using biaffine
attention. The parsing model trained on the revised corpus shows a significant
improvement of 3.0% in labeled attachment score over the model trained on the
previous corpus. Our error analysis demonstrates that this revision allows the
parsing model to learn relations more robustly, reducing several critical
errors made by the previous model.
Comment: Accepted by The 16th International Conference on Parsing
Technologies, IWPT 202
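As a sketch of the biaffine attention scorer mentioned above (in the style of Dozat and Manning's parser; the dimensions are illustrative rather than the paper's exact configuration):

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Scores every (dependent, head) token pair with a biaffine form."""
    def __init__(self, hidden_dim=768, arc_dim=512):
        super().__init__()
        self.dep_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        self.head_mlp = nn.Sequential(nn.Linear(hidden_dim, arc_dim), nn.ReLU())
        # Extra column lets the head representation carry a bias term.
        self.W = nn.Parameter(torch.empty(arc_dim, arc_dim + 1))
        nn.init.xavier_uniform_(self.W)

    def forward(self, states):
        # states: (batch, seq, hidden_dim), e.g. transformer encoder outputs
        d = self.dep_mlp(states)                    # (B, S, A)
        h = self.head_mlp(states)                   # (B, S, A)
        ones = torch.ones(*h.shape[:2], 1, device=h.device)
        h = torch.cat([h, ones], dim=-1)            # (B, S, A+1)
        # scores[b, i, j] = plausibility of token j being the head of token i
        return torch.einsum("bia,ac,bjc->bij", d, self.W, h)
```

Training typically minimizes cross-entropy between each row of scores and the gold head index; at inference, a spanning-tree decoder enforces well-formed trees.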
A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
Deep pre-trained contextualized encoders like BERT (Devlin et al., 2019)
demonstrate remarkable performance on a range of downstream tasks. A recent
line of research in probing investigates the linguistic knowledge implicitly
learned by these models during pre-training. While most work in probing
operates on the task level, linguistic tasks are rarely uniform and can be
represented in a variety of formalisms. Any linguistics-based probing study
thereby inevitably commits to the formalism used to annotate the underlying
data. Can the choice of formalism affect probing results? To investigate, we
conduct an in-depth cross-formalism layer probing study in role semantics. We
find linguistically meaningful differences in the encoding of semantic role-
and proto-role information by BERT depending on the formalism and demonstrate
that layer probing can detect subtle differences between the implementations of
the same linguistic formalism. Our results suggest that linguistic formalism is
an important dimension in probing studies, along with the commonly used
cross-task and cross-lingual experimental settings.
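To illustrate the layer-probing setup, here is a minimal sketch under assumed settings (an English BERT checkpoint and a hypothetical role-label inventory); the paper's actual probes, formalisms, and label sets may differ.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
encoder.eval()  # the encoder stays frozen; only the probes are trained

class LinearProbe(nn.Module):
    """A per-layer linear classifier over frozen encoder features."""
    def __init__(self, hidden_dim=768, num_labels=20):  # num_labels is hypothetical
        super().__init__()
        self.clf = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_states):        # (num_tokens, hidden_dim)
        return self.clf(token_states)

def features_per_layer(sentence):
    """Return one (seq, hidden) matrix per encoder layer (plus embeddings)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).hidden_states
    return [layer[0] for layer in hidden]
```

One probe is trained per layer on the same annotated data; comparing the resulting per-layer accuracy curves across annotation formalisms is what surfaces the formalism-dependent differences described above.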