23 research outputs found
Testing for Grammatical Category Abstraction in Neural Language Models
We propose a new method, inspired by human developmental studies, to probe pretrained neural language models for their ability to form grammatical category (part-of-speech) abstractions and generalize them to novel contexts. Our method does not require training a separate classifier, bypassing the methodological questions raised in the recent literature on the validity of using diagnostic classifiers as probes. The results of our experiment testing BERT-large suggest that it can make category-based generalizations to a degree, but this capacity is still limited in several respects.
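The generalization logic being probed can be illustrated with a toy sketch (the frames, the novel word "blick", and the scoring are invented for illustration, not the paper's actual stimuli or method): a learner that has abstracted categories should accept a novel word in unseen frames of the same category as its single exposure frame, and reject it elsewhere.

```python
# Toy illustration of category-based generalization (hypothetical frames,
# not the paper's stimuli): a novel word seen once in a noun frame should
# be licensed in *other* noun frames but not in verb frames.

# Syntactic frames, keyed by the category they select for.
FRAMES = {
    "noun": ["the {} is here", "a {} arrived", "I saw the {}"],
    "verb": ["they {} daily", "she will {} soon"],
}

def infer_category(exposure_frame: str) -> str:
    """Infer a novel token's category from the single frame it appeared in."""
    for category, frames in FRAMES.items():
        if exposure_frame in frames:
            return category
    raise ValueError("unknown frame")

def generalizes(exposure_frame: str, test_frame: str) -> bool:
    """A category-abstracting learner accepts the novel token in any
    unseen frame of the *same* category as the exposure frame."""
    return infer_category(exposure_frame) == infer_category(test_frame)

# Novel word "blick" seen once as a noun: accepted in a new noun frame,
# rejected in a verb frame.
print(generalizes("the {} is here", "I saw the {}"))   # True  (noun -> noun)
print(generalizes("the {} is here", "they {} daily"))  # False (noun -> verb)
```

The paper's contribution is testing whether BERT-large exhibits this pattern directly from its predictions, without a trained probe.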
Entity Tracking in Language Models
Keeping track of how states of entities change as a text or dialog unfolds is
a key prerequisite to discourse understanding. Yet, there have been few
systematic investigations into the ability of large language models (LLMs) to
track discourse entities. In this work, we present a task probing to what
extent a language model can infer the final state of an entity given an English
description of the initial state and a series of state-changing operations. We
use this task to first investigate whether Flan-T5, GPT-3 and GPT-3.5 can track
the state of entities, and find that only GPT-3.5 models, which have been
pretrained on large amounts of code, exhibit this ability. We then investigate
whether smaller models pretrained primarily on text can learn to track
entities, through finetuning T5 on several training/evaluation splits. While
performance degrades for more complex splits, we find that even when evaluated
on a different set of entities from training or longer operation sequences, a
finetuned model can perform non-trivial entity tracking. Taken together, these
results suggest that language models can learn to track entities but
pretraining on text corpora alone does not make this capacity surface.Comment: ACL 2023 Camera-read
Abstraction via exemplars? A representational case study on lexical category inference in BERT
Exemplar based accounts are often considered to be in direct opposition to
pure linguistic abstraction in explaining language learners' ability to
generalize to novel expressions. However, the recent success of neural network
language models on linguistically sensitive tasks suggests that perhaps
abstractions can arise via the encoding of exemplars. We provide empirical
evidence for this claim by adapting an existing experiment that studies how an
LM (BERT) generalizes the usage of novel tokens that belong to lexical
categories such as Noun/Verb/Adjective/Adverb from exposure to only a single
instance of their usage. We analyze the representational behavior of the novel
tokens in these experiments, and find that BERT's capacity to generalize to
unseen expressions involving the use of these novel tokens constitutes the
movement of novel token representations towards regions of known category
exemplars in two-dimensional space. Our results suggest that learners' encoding
of exemplars can indeed give rise to abstraction-like behavior. Comment: 2-page abstract, to appear in BUCLD4
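The "movement toward regions of known category exemplars" finding amounts to a nearest-centroid reading in the projected space, which can be sketched as follows (the 2-D coordinates are invented stand-ins for projected BERT representations):

```python
# Nearest-centroid sketch of the representational result: a novel token's
# 2-D projection is assigned to the category whose exemplar centroid it
# sits closest to (toy coordinates, invented for illustration).
import math

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def nearest_category(token_vec, category_exemplars):
    """Return the category whose exemplar centroid is closest to the token."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return min(category_exemplars,
               key=lambda c: dist(token_vec, centroid(category_exemplars[c])))

exemplars = {
    "noun": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "verb": [(3.0, 3.0), (2.8, 3.2), (3.1, 2.9)],
}
# After one noun-context exposure, the novel token's representation has
# drifted toward the noun region.
print(nearest_category((0.4, 0.2), exemplars))  # noun
```

On this reading, abstraction emerges not as a discrete category label but as proximity to stored exemplars in representation space.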
Entity tracking in language models
https://doi.org/10.18653/v1/2023.acl-long.213 (Published version)
Compositional Linguistic Generalization in Artificial Neural Networks
Compositionality---the principle that the meaning of a complex expression is built from the meanings of its parts---is considered a central property of human language. This dissertation focuses on compositional generalization, a key benefit of compositionality that enables the production and comprehension of novel expressions. Specifically, this dissertation develops a test for compositional generalization for sequence-to-sequence artificial neural networks (ANNs).

Before doing so, I start by developing a test for grammatical category abstraction: an important precondition to compositional generalization, because category membership determines the applicability of compositional rules. Then, I construct a test for compositional generalization based on human generalization patterns discussed in existing linguistic and developmental studies. The test takes the form of semantic parsing (translation from natural language expressions to semantic representations) where the training and generalization sets have systematic gaps that can be filled by composing known parts. The generalization cases fall into two broad categories, lexical and structural, depending on whether generalization to novel combinations of known lexical items and known structures is required, or generalization to novel structures is required.

The ANNs evaluated on this test exhibit limited degrees of compositional generalization, implying that the inductive biases of the ANNs and human learners differ substantially. An error analysis reveals that all ANNs tested frequently make generalizations that violate faithfulness constraints (e.g., Emma saw Lina ↝ see'(Emma', Audrey') instead of see'(Emma', Lina')). Adding a glossing task (word-by-word translation)---a task that requires maximally faithful input-output mappings---as an auxiliary objective to the Transformer model (Vaswani et al. 2017) greatly improves generalization, demonstrating that a faithfulness bias can be injected through the auxiliary training approach.

However, the improvement is limited to lexical generalization; all models struggle with assigning appropriate semantic representations to novel structures regardless of auxiliary training. This difficulty of structural generalization leaves open questions for both ANN and human learners. I discuss promising directions for improving structural generalization in ANNs, and furthermore propose an artificial language learning study for human subjects, analogous to the tests posed to ANNs, which will lead to a more detailed characterization of the patterns of structural generalization in human learners.
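The faithfulness constraint from the error analysis can be sketched as a simple check (the logical-form syntax follows the abstract's see'(Emma', Lina') example; the parsing with a regular expression is an illustrative assumption, not the dissertation's evaluation code): every argument in the predicted logical form should name an entity that actually occurs in the input.

```python
# Sketch of the faithfulness constraint: arguments in the predicted
# logical form must be drawn from the input sentence.
import re

def is_faithful(sentence: str, logical_form: str) -> bool:
    """Check that each primed argument in pred'(a', b') appears in the input."""
    words = set(sentence.lower().split())
    args = re.findall(r"\(([^)]*)\)", logical_form)
    entities = [a.strip().rstrip("'") for part in args for a in part.split(",")]
    return all(e.lower() in words for e in entities)

print(is_faithful("Emma saw Lina", "see'(Emma', Lina')"))    # True
print(is_faithful("Emma saw Lina", "see'(Emma', Audrey')"))  # False: Audrey is hallucinated
```

The glossing auxiliary task rewards exactly this input-output alignment, which is why it reduces such violations for lexical generalization.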
Inverse scaling can become U-shaped
https://doi.org/10.18653/v1/2023.emnlp-main.963 (Published version)
Reconstruction probing
https://doi.org/10.18653/v1/2023.findings-acl.523 (Published version)
(QA)2: Question answering with questionable assumptions
https://doi.org/10.18653/v1/2023.acl-long.472 (Published version)
SLOG: a structural generalization benchmark for semantic parsing
https://doi.org/10.18653/v1/2023.emnlp-main.194 (Published version)
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
Remarkable progress has been made on automated reasoning with knowledge
specified as unstructured, natural text, by using the power of large language
models (LMs) coupled with methods such as Chain-of-Thought prompting and
Selection-Inference. These techniques search for proofs in the forward
direction from axioms to the conclusion, which suffers from a combinatorial
explosion of the search space, and thus high failure rates for problems
requiring longer chains of reasoning. The classical automated reasoning
literature has shown that reasoning in the backward direction (i.e. from the
intended conclusion to the set of axioms that support it) is significantly more
efficient at proof-finding problems. We import this intuition into the LM
setting and develop a Backward Chaining algorithm, which we call LAMBADA, that
decomposes reasoning into four sub-modules, each of which can be simply
implemented by few-shot prompted LM inference. We show that LAMBADA achieves
massive accuracy boosts over state-of-the-art forward reasoning methods on two
challenging logical reasoning datasets, particularly when deep and accurate
proof chains are required. Comment: 16 pages.
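The direction of search that LAMBADA exploits can be illustrated with a classical backward chainer over Horn rules (this toy symbolic prover stands in for the four few-shot-prompted LM sub-modules; the facts and rules are invented for the sketch):

```python
# Minimal backward-chaining sketch: prove a goal by working from the
# intended conclusion back to supporting axioms, instead of searching
# forward from all axioms.

def backward_chain(goal, facts, rules, depth=10):
    """Return True if `goal` is provable from `facts` via Horn `rules`."""
    if depth == 0:
        return False                       # depth bound guards against cycles
    if goal in facts:                      # goal is an axiom
        return True
    for head, body in rules:               # find rules concluding the goal...
        if head == goal and all(
            backward_chain(sub, facts, rules, depth - 1) for sub in body
        ):                                 # ...and recursively prove each premise
            return True
    return False

facts = {"is_bird(tweety)"}
rules = [
    ("can_fly(tweety)", ["has_wings(tweety)"]),
    ("has_wings(tweety)", ["is_bird(tweety)"]),
]
print(backward_chain("can_fly(tweety)", facts, rules))  # True
```

Because each recursive call only expands rules whose head matches the current goal, the search stays focused on the conclusion, which is the efficiency argument the abstract imports into the LM setting.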