314 research outputs found
Revisiting the Uniform Information Density Hypothesis
The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal. While its implications on language production have been well explored, the hypothesis potentially makes predictions about language comprehension and linguistic acceptability as well. Further, it is unclear how uniformity in a linguistic signal -- or lack thereof -- should be measured, and over which linguistic unit, e.g., the sentence or language level, this uniformity should hold. Here we investigate these facets of the UID hypothesis using reading time and acceptability data. While our reading time results are generally consistent with previous work, they are also consistent with a weakly super-linear effect of surprisal, which would be compatible with UID's predictions. For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability. We then explore multiple operationalizations of UID, motivated by different interpretations of the original hypothesis, and analyze the scope over which the pressure towards uniformity is exerted. The explanatory power of a subset of the proposed operationalizations suggests that the strongest trend may be a regression towards a mean surprisal across the language, rather than the phrase, sentence, or document -- a finding that supports a typical interpretation of UID, namely that it is the byproduct of language users maximizing the use of a (hypothetical) communication channel
Testing the Predictions of Surprisal Theory in 11 Languages
A fundamental result in psycholinguistics is that less predictable words take
a longer time to process. One theoretical explanation for this finding is
Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's
predictability as its surprisal, i.e. its negative log-probability given a
context. While evidence supporting the predictions of Surprisal Theory have
been replicated widely, most have focused on a very narrow slice of data:
native English speakers reading English texts. Indeed, no comprehensive
multilingual analysis exists. We address this gap in the current literature by
investigating the relationship between surprisal and reading times in eleven
different languages, distributed across five language families. Deriving
estimates from language models trained on monolingual and multilingual corpora,
we test three predictions associated with surprisal theory: (i) whether
surprisal is predictive of reading times; (ii) whether expected surprisal, i.e.
contextual entropy, is predictive of reading times; (iii) and whether the
linking function between surprisal and reading times is linear. We find that
all three predictions are borne out crosslinguistically. By focusing on a more
diverse set of languages, we argue that these results offer the most robust
link to-date between information theory and incremental language processing
across languages.Comment: This is a pre-MIT Press publication version of the pape
Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens
Recent psycholinguistic studies have drawn conflicting conclusions about the
relationship between the quality of a language model and the ability of its
surprisal estimates to predict human reading times, which has been speculated
to be due to the large gap in both the amount of training data and model
capacity across studies. The current work aims to consolidate these findings by
evaluating surprisal estimates from Transformer-based language model variants
that vary systematically in the amount of training data and model capacity on
their ability to predict human reading times. The results show that surprisal
estimates from most variants with contemporary model capacities provide the
best fit after seeing about two billion training tokens, after which they begin
to diverge from humanlike expectations. Additionally, newly-trained smaller
model variants reveal a 'tipping point' at convergence, after which the
decrease in language model perplexity begins to result in poorer fits to human
reading times. These results suggest that the massive amount of training data
is mainly responsible for the poorer fit achieved by surprisal from larger
pre-trained language models, and that a certain degree of model capacity is
necessary for Transformer-based language models to capture humanlike
expectations.Comment: Findings of the Association for Computational Linguistics: EMNLP 202
Comparing Transformers and RNNs on predicting human sentence processing data
Recurrent neural networks (RNNs) have long been an architecture of interest
for computational models of human sentence processing. The more recently
introduced Transformer architecture has been shown to outperform recurrent
neural networks on many natural language processing tasks but little is known
about their ability to model human language processing. It has long been
thought that human sentence reading involves something akin to recurrence and
so RNNs may still have an advantage over the Transformer as a cognitive model.
In this paper we train both Transformer and RNN based language models and
compare their performance as a model of human sentence processing. We use the
trained language models to compute surprisal values for the stimuli used in
several reading experiments and use mixed linear modelling to measure how well
the surprisal explains measures of human reading effort. Our analysis shows
that the Transformers outperform the RNNs as cognitive models in explaining
self-paced reading times and N400 strength but not gaze durations from an
eye-tracking experiment
On the Effect of Anticipation on Reading Times
Over the past two decades, numerous studies have demonstrated how less
predictable (i.e., higher surprisal) words take more time to read. In general,
these studies have implicitly assumed the reading process is purely responsive:
Readers observe a new word and allocate time to process it as required. We
argue that prior results are also compatible with a reading process that is at
least partially anticipatory: Readers could make predictions about a future
word and allocate time to process it based on their expectation. In this work,
we operationalize this anticipation as a word's contextual entropy. We assess
the effect of anticipation on reading by comparing how well surprisal and
contextual entropy predict reading times on four naturalistic reading datasets:
two self-paced and two eye-tracking. Experimentally, across datasets and
analyses, we find substantial evidence for effects of contextual entropy over
surprisal on a word's reading time (RT): in fact, entropy is sometimes better
than surprisal in predicting a word's RT. Spillover effects, however, are
generally not captured by entropy, but only by surprisal. Further, we
hypothesize four cognitive mechanisms through which contextual entropy could
impact RTs -- three of which we are able to design experiments to analyze.
Overall, our results support a view of reading that is not just responsive, but
also anticipatory.Comment: This is a pre-MIT Press publication version of the paper. Code is
available in https://github.com/rycolab/anticipation-on-reading-time
Neural models of language use:Studies of language comprehension and production in context
Artificial neural network models of language are mostly known and appreciated today for providing a backbone for formidable AI technologies. This thesis takes a different perspective. Through a series of studies on language comprehension and production, it investigates whether artificial neural networks—beyond being useful in countless AI applications—can serve as accurate computational simulations of human language use, and thus as a new core methodology for the language sciences
Is the mind inherently predicting? Exploring forward and backward looking in language processing
Prediction is one characteristic of the human mind. But what does it mean to say the mind is a ’prediction machine’ and inherently forward looking as is frequently claimed? In natural languages, many contexts are not easily predictable in a forward fashion. In English for example many frequent verbs do not carry unique meaning on their own, but instead rely on another word or words that follow them to become meaningful. Upon reading take a the processor often cannot easily predict walk as the next word. But the system can ‘look back’ and integrate walk more easily when it follows take a (e.g., as opposed to make|get|have a walk). In the present paper we provide further evidence for the importance of both forward and backward looking in language processing. In two self-paced reading tasks and an eye-tracking reading task, we found evidence that adult English native speakers’ sensitivity to word forward and backward conditional probability significantly explained variance in reading times over and above psycholinguistic predictors of reading latencies. We conclude that both forward and backward-looking (prediction and integration) appear to be important characteristics of language processing. Our results thus suggest that it makes just as much sense to call the mind an ’integration machine’ which is inherently backward looking
Lower Perplexity is Not Always Human-Like
In computational psycholinguistics, various language models have been
evaluated against human reading behavior (e.g., eye movement) to build
human-like computational models. However, most previous efforts have focused
almost exclusively on English, despite the recent trend towards linguistic
universal within the general community. In order to fill the gap, this paper
investigates whether the established results in computational psycholinguistics
can be generalized across languages. Specifically, we re-examine an established
generalization -- the lower perplexity a language model has, the more
human-like the language model is -- in Japanese with typologically different
structures from English. Our experiments demonstrate that this established
generalization exhibits a surprising lack of universality; namely, lower
perplexity is not always human-like. Moreover, this discrepancy between English
and Japanese is further explored from the perspective of (non-)uniform
information density. Overall, our results suggest that a cross-lingual
evaluation will be necessary to construct human-like computational models.Comment: Accepted by ACL 202
- …