473 research outputs found
Event knowledge in large language models: the gap between the impossible and the unlikely
Word co-occurrence patterns in language corpora contain a surprising amount
of conceptual knowledge. Large language models (LLMs), trained to predict words
in context, leverage these patterns to achieve impressive performance on
diverse semantic tasks requiring world knowledge. An important but understudied
question about LLMs' semantic abilities is whether they acquire generalized
knowledge of common events. Here, we test whether five pre-trained LLMs (from
2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions
of agent-patient interactions than to minimally different implausible versions
of the same event. Using three curated sets of minimal sentence pairs (total
n=1,215), we found that pre-trained LLMs possess substantial event knowledge,
outperforming other distributional language models. In particular, they almost
always assign higher likelihood to possible vs. impossible events (The teacher
bought the laptop vs. The laptop bought the teacher). However, LLMs show less
consistent preferences for likely vs. unlikely events (The nanny tutored the
boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM
scores are driven by both plausibility and surface-level sentence features,
(ii) LLM scores generalize well across syntactic variants (active vs. passive
constructions) but less well across semantic variants (synonymous sentences),
(iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence
plausibility serves as an organizing dimension in internal LLM representations.
Overall, our results show that important aspects of event knowledge naturally
emerge from distributional linguistic patterns, but also highlight a gap
between representations of possible/impossible and likely/unlikely events.Comment: The two lead authors have contributed equally to this wor
Testing word embeddings for Polish
Testing word embeddings for Polish
Distributional Semantics postulates the representation of word meaning in the form of numeric vectors which represent words which occur in context in large text data. This paper addresses the problem of constructing such models for the Polish language. The paper compares the effectiveness of models based on lemmas and forms created with Continuous Bag of Words (CBOW) and skip-gram approaches based on different Polish corpora. For the purposes of this comparison, the results of two typical tasks solved with the help of distributional semantics, i.e. synonymy and analogy recognition, are compared. The results show that it is not possible to identify one universal approach to vector creation applicable to various tasks. The most important feature is the quality and size of the data, but different strategy choices can also lead to significantly different results.
Testowanie wektorowych reprezentacji dystrybucyjnych słów języka polskiego
Semantyka dystrybucyjna opiera się na założeniu, że znaczenie słów wyrażone jest za pomocą wektorów reprezentujących, w sposób bezpośredni bądź pośredni, konteksty, w jakich słowo to jest używane w dużym zbiorze tekstów. Niniejszy artykuł dotyczy ewaluacji wielu takich modeli skonstruowanych dla języka polskiego. W pracy porównano skuteczność modeli opartych na lematach i formach słów, utworzonych przy wykorzystaniu sieci neuronowych na danych z dwóch różnych korpusów języka polskiego. Ewaluacji dokonano na podstawie wyników dwóch typowych zadań rozwiązywanych za pomocą metod semantyki dystrybucyjnej, tzn. rozpoznania występowania synonimii i analogii między konkretnymi parami słów. Uzyskane wyniki dowodzą, że nie można wskazać jednego uniwersalnego podejścia do tworzenia modeli dystrybucyjnych, gdyż ich skuteczność jest różna w zależności od zastosowania. Najważniejszą cechą wpływającą na jakość modelu jest jakość oraz rozmiar danych, ale wybory różnych strategii uczenia sieci mogą również prowadzić do istotnie odmiennych wyników
AugCSE: contrastive sentence embedding with diverse augmentations
Data augmentation techniques have been proven useful in many applications in NLP fields. Most augmentations are task-specific, and cannot be used as a general-purpose tool. In our work, we present AugCSE, a unified framework to utilize diverse sets of data augmentations to achieve a better, general-purpose, sentence embedding model. Building upon the latest sentence embedding models, our approach uses a simple antagonistic discriminator that differentiates the augmentation types. With the finetuning objective borrowed from domain adaptation, we show that diverse augmentations, which often lead to conflicting contrastive signals, can be tamed to produce a better and more robust sentence representation. Our methods achieve state-of-the-art results on downstream transfer tasks and perform competitively on semantic textual similarity tasks, using only unsupervised data.000000000000000000000000000000000000000000000000000000010241 - University of California, Berkeleyhttps://aclanthology.org/2022.aacl-main.30/First author draf
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous non-contextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. In particular, we find that the position of the sentence in which a word occurs, while having no meaning correlates, leaves a noticeable trace on the word embeddings and disturbs similarity relationships
Semantic Entropy in Language Comprehension
Language is processed on a more or less word-by-word basis, and the processing difficulty
induced by each word is affected by our prior linguistic experience as well as our general knowledge
about the world. Surprisal and entropy reduction have been independently proposed as linking
theories between word processing difficulty and probabilistic language models. Extant models, however,
are typically limited to capturing linguistic experience and hence cannot account for the influence of
world knowledge. A recent comprehension model by Venhuizen, Crocker, and Brouwer (2019, Discourse
Processes) improves upon this situation by instantiating a comprehension-centric metric of surprisal that
integrates linguistic experience and world knowledge at the level of interpretation and combines them in
determining online expectations. Here, we extend this work by deriving a comprehension-centric metric
of entropy reduction from this model. In contrast to previous work, which has found that surprisal and
entropy reduction are not easily dissociated, we do find a clear dissociation in our model. While both
surprisal and entropy reduction derive from the same cognitive process—the word-by-word updating
of the unfolding interpretation—they reflect different aspects of this process: state-by-state expectation
(surprisal) versus end-state confirmation (entropy reduction)
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
How does language inform our downstream thinking? In particular, how do
humans make meaning from language -- and how can we leverage a theory of
linguistic meaning to build machines that think in more human-like ways? In
this paper, we propose \textit{rational meaning construction}, a computational
framework for language-informed thinking that combines neural models of
language with probabilistic models for rational inference. We frame linguistic
meaning as a context-sensitive mapping from natural language into a
\textit{probabilistic language of thought} (PLoT) -- a general-purpose symbolic
substrate for probabilistic, generative world modeling. Our architecture
integrates two powerful computational tools that have not previously come
together: we model thinking with \textit{probabilistic programs}, an expressive
representation for flexible commonsense reasoning; and we model meaning
construction with \textit{large language models} (LLMs), which support
broad-coverage translation from natural language utterances to code expressions
in a probabilistic programming language. We illustrate our framework in action
through examples covering four core domains from cognitive science:
probabilistic reasoning, logical and relational reasoning, visual and physical
reasoning, and social reasoning about agents and their plans. In each, we show
that LLMs can generate context-sensitive translations that capture
pragmatically-appropriate linguistic meanings, while Bayesian inference with
the generated programs supports coherent and robust commonsense reasoning. We
extend our framework to integrate cognitively-motivated symbolic modules to
provide a unified commonsense thinking interface from language. Finally, we
explore how language can drive the construction of world models themselves
- …