Information content versus word length in random typing
Recently, it has been claimed that a linear relationship between a measure of
information content and word length is expected from word length optimization,
and it has been shown that this linearity is supported by a strong correlation
between information content and word length in many languages (Piantadosi et
al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections
between this measure and standard information theory. The relationship between
the measure and word length is studied for the popular random typing process
where a text is constructed by pressing keys at random from a keyboard
containing letters and a space behaving as a word delimiter. Although this
random process does not optimize word lengths according to information content,
it exhibits a linear relationship between information content and word length.
The exact slope and intercept are presented for three major variants of the
random typing process. A strong correlation between information content and
word length can arise simply from the units that make up a word (e.g., letters),
and not necessarily from the interplay between a word and its context, as
proposed by Piantadosi et al. By itself, the linear relation does not entail
the result of any optimization process.
Constant conditional entropy and related hypotheses
Constant entropy rate (conditional entropies must remain constant as the sequence length increases) and uniform information density (conditional probabilities must remain constant as the sequence length increases) are two information theoretic principles that are argued to underlie a wide range of linguistic phenomena. Here we revise the predictions of these principles in the light of Hilberg's law on the scaling of conditional entropy in language and related laws. We show that constant entropy rate (CER) and two interpretations for uniform information density (UID), full UID and strong UID, are inconsistent with these laws. Strong UID implies CER but the reverse is not true. Full UID, a particular case of UID, leads to costly uncorrelated sequences that are totally unrealistic. We conclude that CER and its particular cases are incomplete hypotheses about the scaling of conditional entropies.
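The tension between these hypotheses can be stated compactly. In notation assumed here (not taken verbatim from the abstract), write h(n) for the conditional entropy of the n-th element of a sequence given its predecessors:

```latex
% Constant entropy rate (CER): the conditional entropy is flat in n
h(n) \;=\; H(X_n \mid X_1, \ldots, X_{n-1}) \;=\; h \qquad \text{for all } n.
% Hilberg-type scaling: the conditional entropy decays as a power law,
% with exponent 0 < \beta < 1 (\beta \approx 1/2 in Hilberg's original proposal)
h(n) \;\approx\; A\, n^{\beta - 1} + h_\infty.
% The two are compatible only in the degenerate case A = 0, which is the
% sense in which CER is inconsistent with Hilberg-type scaling laws.
```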
Must analysis of meaning follow analysis of form? A time course analysis
Many models of word recognition assume that processing proceeds sequentially from analysis of form to analysis of meaning. In the context of morphological processing, this implies that morphemes are processed as units of form prior to any influence of their meanings. Some interpret the apparent absence of differences in recognition latencies to targets (SNEAK) in form and semantically similar (sneaky-SNEAK) and in form similar and semantically dissimilar (sneaker-SNEAK) prime contexts at a stimulus onset asynchrony (SOA) of 48 ms as consistent with this claim. To determine the time course over which degree of semantic similarity between morphologically structured primes and their targets influences recognition in the forward masked priming variant of the lexical decision paradigm, we compared facilitation for the same targets after semantically similar and dissimilar primes across a range of SOAs (34–100 ms). The effect of shared semantics on recognition latency increased linearly with SOA when long SOAs were intermixed (Experiments 1A and 1B), and latencies were significantly faster after semantically similar than dissimilar primes at homogeneous SOAs of 48 ms (Experiment 2) and 34 ms (Experiment 3). Results limit the scope of form-then-semantics models of recognition and demonstrate that semantics influences even the very early stages of recognition. Finally, once general performance across trials has been accounted for, we find no evidence for individual differences in morphological processing that can be linked to measures of reading proficiency.
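The claim that facilitation grows linearly with SOA amounts to fitting a line to the priming effect across SOAs. The sketch below uses purely hypothetical numbers (not the paper's data) to show the shape of such an analysis with ordinary least squares:

```python
# Hypothetical illustration only: least-squares fit of priming facilitation
# (RT advantage after semantically similar vs. dissimilar primes) against SOA.
soas = [34, 48, 66, 100]               # SOAs in ms (hypothetical grid)
facilitation = [5.0, 9.0, 14.0, 22.0]  # hypothetical facilitation in ms

# Ordinary least squares: slope = cov(x, y) / var(x).
n = len(soas)
mean_x = sum(soas) / n
mean_y = sum(facilitation) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(soas, facilitation))
         / sum((x - mean_x) ** 2 for x in soas))
intercept = mean_y - slope * mean_x
print(f"facilitation ~ {slope:.2f} * SOA + {intercept:.1f} ms")
```

A positive slope is the signature of the reported pattern: the semantic contribution to recognition grows as the prime is visible for longer.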