Keep it Neutral: Using Natural Language Inference to Improve Generation
We explore incorporating natural language inference (NLI) into the text
generative pipeline by using a pre-trained NLI model to assess whether a
generated sentence entails, contradicts, or is neutral to the prompt and
preceding text. First, we show that the NLI task is predictive of generation
errors made by GPT-3. We use these results to develop an NLI-informed
generation procedure for GPT-J. Then, we evaluate these generations by
obtaining human annotations on error types and overall quality. We find that an
NLI strategy of maximizing entailment improves text generation when the nucleus
sampling randomness parameter value is high, while one which maximizes
contradiction is in fact productive when the parameter value is low. Overall,
though, we demonstrate that an NLI strategy of maximizing the neutral class
provides the highest quality of generated text (significantly better than the
vanilla generations), regardless of parameter value.
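The abstract does not spell out the reranking procedure; below is a minimal sketch of the maximize-neutral strategy it describes, using GPT-2 and roberta-large-mnli as stand-ins for the paper's GPT-J and NLI model (the model choices, prompt, and candidate count are all illustrative assumptions):

```python
# Sketch: sample several continuations, score each against the prompt with
# an off-the-shelf NLI model, keep the one with the highest neutral probability.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

gen_tok = AutoTokenizer.from_pretrained("gpt2")
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")
nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def neutral_score(premise, hypothesis):
    """Probability that hypothesis is neutral with respect to premise."""
    enc = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli_model(**enc).logits.softmax(-1).squeeze(0)
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[1].item()

prompt = "The committee reviewed the proposal."
enc = gen_tok(prompt, return_tensors="pt")
candidates = gen_model.generate(**enc, do_sample=True, top_p=0.95,
                                max_new_tokens=30, num_return_sequences=5,
                                pad_token_id=gen_tok.eos_token_id)
continuations = [gen_tok.decode(c[enc["input_ids"].shape[1]:],
                                skip_special_tokens=True) for c in candidates]
best = max(continuations, key=lambda t: neutral_score(prompt, t))
print(best)
```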
When Classifying Arguments, BERT Doesn't Care About Word Order... Except When It Matters
We probe nouns in BERT contextual embedding space for grammatical role (subject vs. object of a clause), and examine how probing results vary between prototypical examples, where the role matches what we would expect from seeing that word in the context, and non-prototypical examples, where the role is mostly imparted by the context. In this way, we engage with the contrast that has arisen in the literature between studies that show contextual models as grammatically sensitive and others that show that these models are robust to changes in word order. Our experiments yield three results: 1) grammatical role is recovered in later layers for difficult non-prototypical cases, while prototypical cases are accurate without many layers of context; 2) when we switch the subject and the object of a sentence around (e.g., The chef cut the onion vs. The onion cut the chef), we see that the same word (e.g., onion) can be fluently identified as both a subject and an object; 3) subjecthood probing breaks if we ablate local word order by shuffling words locally and breaking grammaticality.
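As a rough illustration of this kind of probing (not the paper's exact setup), one can extract BERT's contextual embedding for a target noun and fit a linear subject-vs-object classifier; the sentences, layer choice, and token-lookup heuristic below are all illustrative assumptions:

```python
# Sketch of a subjecthood probe: grab the contextual embedding of a target
# noun from a chosen BERT layer, then train a linear classifier on
# subject (1) vs. object (0) labels.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def noun_embedding(sentence, noun, layer=8):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer].squeeze(0)
    # locate the first wordpiece of the target noun (toy heuristic)
    noun_id = tok(noun, add_special_tokens=False)["input_ids"][0]
    pos = (enc["input_ids"].squeeze(0) == noun_id).nonzero()[0].item()
    return hidden[pos].numpy()

examples = [("The chef cut the onion.", "chef", 1),   # prototypical subject
            ("The chef cut the onion.", "onion", 0),  # prototypical object
            ("The onion cut the chef.", "onion", 1),  # non-prototypical subject
            ("The onion cut the chef.", "chef", 0)]   # non-prototypical object
X = [noun_embedding(s, n) for s, n, _ in examples]
y = [label for _, _, label in examples]
probe = LogisticRegression(max_iter=1000).fit(X, y)
```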
Concord begets concord: A Bayesian model of nominal concord typology
Nominal concord is a phenomenon whereby nominal modifiers (e.g., adjectives, demonstratives, numerals) agree with their nominals along various dimensions (e.g., gender, number, case, definiteness). Here, drawing on a rich and typologically diverse database of nominal concord (Norris 2020), we build a Bayesian mixed-effects model of nominal concord. Specifically, we consider two competing hypotheses regarding the statistical relationship between different types of concord within a language: (1) concord begets concord: the presence of some type of concord in a language makes it more likely that it has other types of concord vs. (2) a little concord goes a long way: if a language has some kind of concord, it is less likely to have other types of concord. We present evidence strongly in favor of the first hypothesis, that concord begets concord. Languages with nominal concord tend to have concord in more than one place and of more than one type. Using posterior draws from our model, we also provide quantitative evidence for a number of the tendencies described by Norris (2019a). Future work will build on this model to understand the functional role of nominal concord in language systems, how it evolves, and how it co-evolves with other typological features.
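The paper's model specification is not reproduced here, but a minimal sketch of the kind of analysis described, a Bayesian logistic model with a per-family random intercept testing whether one concord type predicts another, might look like this in PyMC (the data and priors below are fabricated for illustration):

```python
# Toy model: does having adjective concord predict having demonstrative
# concord, allowing a random intercept per language family?
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_langs, n_families = 100, 20
family = rng.integers(0, n_families, n_langs)
has_adj_concord = rng.integers(0, 2, n_langs)   # predictor (fabricated)
has_dem_concord = rng.integers(0, 2, n_langs)   # outcome (fabricated)

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0.0, 1.5)            # global intercept
    beta = pm.Normal("beta", 0.0, 1.5)              # effect of adjective concord
    sigma = pm.HalfNormal("sigma", 1.0)
    z = pm.Normal("z", 0.0, 1.0, shape=n_families)  # family random intercepts
    logit_p = alpha + beta * has_adj_concord + sigma * z[family]
    pm.Bernoulli("y", logit_p=logit_p, observed=has_dem_concord)
    idata = pm.sample(1000, tune=1000, chains=2)

# "Concord begets concord" corresponds to posterior mass of beta above zero.
print((idata.posterior["beta"] > 0).mean().item())
```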
A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces
We study semantic construal in grammatical constructions using large language
models. First, we project contextual word embeddings into three interpretable
semantic spaces, each defined by a different set of psycholinguistic feature
norms. We validate these interpretable spaces and then use them to
automatically derive semantic characterizations of lexical items in two
grammatical constructions: nouns in subject or object position within the same
sentence, and the AANN construction (e.g., "a beautiful three days"). We show
that a word in subject position is interpreted as more agentive than the very
same word in object position, and that the nouns in the AANN construction are
interpreted as more measurement-like than when in the canonical alternation.
Our method can probe the distributional meaning of syntactic constructions at a
templatic level, abstracted away from specific lexemes.
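A minimal sketch of the projection step, assuming the interpretable space is learned by regressing feature-norm ratings on embedding dimensions (the dimensions, penalty, and norms below are placeholders):

```python
# Fit a ridge regression from embedding space to a small feature-norm space,
# then read off predicted semantic profiles for tokens from a construction.
import numpy as np
from sklearn.linear_model import Ridge

emb_dim, n_words, n_features = 768, 500, 3  # e.g., agentive, concrete, measure-like
rng = np.random.default_rng(0)
train_embs = rng.normal(size=(n_words, emb_dim))      # embeddings with known norms
train_norms = rng.normal(size=(n_words, n_features))  # psycholinguistic ratings

projector = Ridge(alpha=10.0).fit(train_embs, train_norms)

# Predicted semantic profile of a token embedding from a target construction:
subject_token_emb = rng.normal(size=(1, emb_dim))
print(projector.predict(subject_token_emb))  # [[agentivity, concreteness, ...]]
```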
Counterfactually Probing Language Identity in Multilingual Models
Techniques in causal analysis of language models illuminate how linguistic
information is organized in LLMs. We use one such technique, AlterRep, a method
of counterfactual probing, to explore the internal structure of multilingual
models (mBERT and XLM-R). We train a linear classifier on a binary language
identity task, to classify tokens between Language X and Language Y. Applying a
counterfactual probing procedure, we use the classifier weights to project the
embeddings into the null space and push the resulting embeddings either in the
direction of Language X or Language Y. Then we evaluate on a masked language
modeling task. We find that, given a template in Language X, pushing towards
Language Y systematically increases the probability of Language Y words, above
and beyond a third-party control language. But it does not specifically push
the model towards translation-equivalent words in Language Y. Pushing towards
Language X (the same direction as the template) has a minimal effect, but
somewhat degrades these models. Overall, we take these results as further
evidence of the rich structure of massive multilingual language models, which
include both a language-specific and language-general component. And we show
that counterfactual probing can be fruitfully applied to multilingual models.
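Schematically, the push step can be thought of as removing the component of a hidden state along the probe's language-identity direction and then adding it back with a chosen sign and magnitude. The sketch below simplifies AlterRep to a single probe direction; the real method and its hyperparameters differ:

```python
# Simplified counterfactual push: null-space projection along one probe
# direction, then a push of magnitude alpha toward Language X or Y.
import numpy as np

def alter_rep(h, w, alpha=4.0, toward_y=True):
    """h: token hidden state; w: probe weights (Y positive, X negative)."""
    u = w / np.linalg.norm(w)
    h_null = h - np.dot(h, u) * u          # remove the language-identity component
    sign = 1.0 if toward_y else -1.0
    return h_null + sign * alpha * u       # push along the language direction

rng = np.random.default_rng(0)
h = rng.normal(size=768)                   # a hidden state from, say, mBERT
w = rng.normal(size=768)                   # trained probe weights (stand-in)
h_pushed = alter_rep(h, w, toward_y=True)  # then evaluate the MLM head on this
```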
SNAP judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments
While published linguistic judgments sometimes differ from the judgments found in large-scale formal experiments with naive participants, there is not a consensus as to how often these errors occur nor as to how often formal experiments should be used in syntax and semantics research. In this article, we first present the results of a large-scale replication of the Sprouse et al. 2013 study on 100 English contrasts randomly sampled from Linguistic Inquiry 2001–2010 and tested in both a forced-choice experiment and an acceptability rating experiment. Like Sprouse, Schütze, and Almeida, we find that the effect sizes of published linguistic acceptability judgments are not uniformly large or consistent but rather form a continuum from very large effects to small or nonexistent effects. We then use this data as a prior in a Bayesian framework to propose a small n acceptability paradigm for linguistic acceptability judgments (SNAP Judgments). This proposal makes it easier and cheaper to obtain meaningful quantitative data in syntax and semantics research. Specifically, for a contrast of linguistic interest for which a researcher is confident that sentence A is better than sentence B, we recommend that the researcher should obtain judgments from at least five unique participants, using at least five unique sentences of each type. If all participants in the sample agree that sentence A is better than sentence B, then the researcher can be confident that the result of a full forced-choice experiment would likely be 75% or more agreement in favor of sentence A (with a mean of 93%). We test this proposal by sampling from the existing data and find that it gives reliable performance.
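As a back-of-envelope illustration of the Bayesian logic (with an invented Beta prior rather than the paper's empirically derived one), one can compute the posterior over the population agreement rate after a unanimous five-participant sample:

```python
# Beta-binomial update: 5 of 5 participants prefer sentence A.
# Prior Beta(2, 2) is a placeholder, not the paper's data-derived prior.
from scipy.stats import beta

a0, b0 = 2.0, 2.0          # hypothetical prior over the agreement rate
k, n = 5, 5                # unanimous small-N sample
posterior = beta(a0 + k, b0 + n - k)
print(posterior.sf(0.75))  # P(agreement >= 0.75 | data)
print(posterior.mean())    # posterior mean agreement
```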
Elaborative Simplification as Implicit Questions Under Discussion
Automated text simplification, a technique useful for making text more
accessible to people such as children and emergent bilinguals, is often thought
of as a monolingual translation task from complex sentences to simplified
sentences using encoder-decoder models. This view fails to account for
elaborative simplification, where new information is added into the simplified
text. This paper proposes to view elaborative simplification through the lens
of the Question Under Discussion (QUD) framework, providing a robust way to
investigate what writers elaborate upon, how they elaborate, and how
elaborations fit into the discourse context by viewing elaborations as explicit
answers to implicit questions. We introduce ElabQUD, consisting of 1.3K
elaborations accompanied by implicit QUDs, to study these phenomena. We show
that explicitly modeling QUD (via question generation) not only provides
essential understanding of elaborative simplification and how the elaborations
connect with the rest of the discourse, but also substantially improves the
quality of elaboration generation.Comment: Equal contribution by Yating Wu and William Sheffield. This the EMNLP
2023 Main camera-ready versio
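One plausible way to operationalize the question-generation step, sketched here with flan-t5 as a stand-in (the paper's actual models, prompts, and data format may differ):

```python
# Given a context and an elaboration, ask a seq2seq model to recover the
# implicit question (QUD) that the elaboration answers.
from transformers import pipeline

qg = pipeline("text2text-generation", model="google/flan-t5-base")

context = "The volcano erupted last week."
elaboration = "A volcano is a mountain that can shoot out hot melted rock."
prompt = (f"Context: {context}\nElaboration: {elaboration}\n"
          "What implicit question does the elaboration answer?")
print(qg(prompt, max_new_tokens=32)[0]["generated_text"])
```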
Multilingual BERT, Ergativity, and Grammatical Subjecthood
We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment (how different languages define what counts as a subject) is manifested across the embedding spaces of different languages. To understand if and how morphosyntactic alignment affects contextual embedding spaces, we train classifiers to recover the subjecthood of mBERT embeddings in transitive sentences (which do not contain overt information about morphosyntactic alignment) and then evaluate them zero-shot on intransitive sentences (where subjecthood classification depends on alignment), within and across languages. We find that the resulting classifier distributions reflect the morphosyntactic alignment of their training languages. Our results demonstrate that mBERT representations are influenced by high-level grammatical features that are not manifested in any one input sentence, and that this is robust across languages. Further examining the characteristics that our classifiers rely on, we find that features such as passive voice, animacy, and case strongly correlate with classification decisions, suggesting that mBERT does not encode a purely syntactic subjecthood, but a continuous subjecthood as is proposed in much of the functional linguistics literature. Together, these results provide insight into how grammatical features manifest in contextual embedding spaces, at a level of abstraction not covered by previous work.
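The transfer setup can be sketched schematically: train a subjecthood classifier on transitive (A vs. O) argument embeddings and apply it unchanged to intransitive (S) arguments. The embeddings below are fabricated; in practice they would come from mBERT:

```python
# Zero-shot subjecthood transfer: fit on transitive arguments, then inspect
# how intransitive subjects (S) are classified.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_transitive = rng.normal(size=(200, 768))   # A and O argument embeddings
y_transitive = rng.integers(0, 2, 200)       # 1 = subject (A), 0 = object (O)
X_intransitive = rng.normal(size=(50, 768))  # S argument embeddings, unlabeled

clf = LogisticRegression(max_iter=1000).fit(X_transitive, y_transitive)
# For an ergative language, S should pattern more with O; for an
# accusative language, more with A.
print(clf.predict_proba(X_intransitive)[:, 1].mean())
```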
Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
Are LLMs cultural technologies like photocopiers or printing presses, which transmit information but cannot create new content? A challenge for this idea, which we call "bibliotechnism", is that LLMs often do generate entirely novel text. We begin by defending bibliotechnism against this challenge, showing how novel text may be meaningful only in a derivative sense, so that the content of this generated text depends in an important sense on the content of original human text. We go on to present a different, novel challenge for bibliotechnism, stemming from examples in which LLMs generate "novel reference", using novel names to refer to novel entities. Such examples could be smoothly explained if LLMs were not cultural technologies but possessed a limited form of agency (beliefs, desires, and intentions). According to interpretationism in the philosophy of mind, a system has beliefs, desires, and intentions if and only if its behavior is well-explained by the hypothesis that it has such states. In line with this view, we argue that cases of novel reference provide evidence that LLMs do in fact have beliefs, desires, and intentions, and thus have a limited form of agency.
Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
We present Lil-Bevo, our submission to the BabyLM Challenge. We pretrained
our masked language models with three ingredients: an initial pretraining with
music data, training on shorter sequences before training on longer ones, and
masking specific tokens to target some of the BLiMP subtasks. Overall, our
baseline models performed above chance, but far below the performance levels of
larger LLMs trained on more data. We found that training on short sequences
performed better than training on longer sequences. Pretraining on music may
help performance marginally, but, if so, the effect seems small. Our targeted
Masked Language Modeling augmentation did not seem to improve model performance
in general, but did seem to help on some of the specific BLiMP tasks that we
were targeting (e.g., Negative Polarity Items). Training performant LLMs on
small amounts of data is a difficult but potentially informative task. While
some of our techniques showed some promise, more work is needed to explore
whether they can improve performance more than the modest gains here. Our code
is available at https://github.com/venkatasg/Lil-Bevo and our models at
https://huggingface.co/collections/venkatasg/babylm-653591cdb66f4bf68922873a
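The targeted-masking idea can be sketched as a masking policy that upweights tokens from a target list; the token list and probabilities below are invented for illustration:

```python
# When building MLM training batches, preferentially mask tokens from a
# target set (e.g., NPI-related words) rather than masking uniformly.
import random

TARGET_TOKENS = {"any", "ever", "yet"}  # e.g., negative polarity items
MASK, P_TARGET, P_OTHER = "[MASK]", 0.5, 0.15

def targeted_mask(tokens):
    masked, labels = [], []
    for tok in tokens:
        p = P_TARGET if tok in TARGET_TOKENS else P_OTHER
        if random.random() < p:
            masked.append(MASK)
            labels.append(tok)    # model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)   # position ignored in the loss
    return masked, labels

print(targeted_mask("nobody has ever seen any results yet".split()))
```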