Information content versus word length in random typing
Recently, it has been claimed that a linear relationship between a measure of
information content and word length is expected from word length optimization,
and it has been shown that this linearity is supported by a strong correlation
between information content and word length in many languages (Piantadosi et
al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections
between this measure and standard information theory. The relationship between
the measure and word length is studied for the popular random typing process
where a text is constructed by pressing keys at random from a keyboard
containing letters and a space behaving as a word delimiter. Although this
random process does not optimize word lengths according to information content,
it exhibits a linear relationship between information content and word length.
The exact slope and intercept are presented for three major variants of the
random typing process. A strong correlation between information content and
word length can arise simply from the units that make up a word (e.g., letters),
and not necessarily from the interplay between a word and its context, as
proposed by Piantadosi et al. By itself, the linear relation does not entail
the result of any optimization process.
Constant conditional entropy and related hypotheses
Constant entropy rate (conditional entropies must remain constant as the sequence length increases) and uniform information density (conditional probabilities must remain constant as the sequence length increases) are two information theoretic principles that are argued to underlie a wide range of linguistic phenomena. Here we revise the predictions of these principles in the light of Hilberg's law on the scaling of conditional entropy in language and related laws. We show that constant entropy rate (CER) and two interpretations for uniform information density (UID), full UID and strong UID, are inconsistent with these laws. Strong UID implies CER but the reverse is not true. Full UID, a particular case of UID, leads to costly uncorrelated sequences that are totally unrealistic. We conclude that CER and its particular cases are incomplete hypotheses about the scaling of conditional entropies.
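The tension between these hypotheses can be stated compactly. In notation assumed here (not taken verbatim from the abstract), write h(n) for the conditional entropy of the n-th element of a sequence given its predecessors:

```latex
% Constant entropy rate (CER): the conditional entropy is flat in n
h(n) \;=\; H(X_n \mid X_1, \ldots, X_{n-1}) \;=\; h \qquad \text{for all } n.
% Hilberg-type scaling: the conditional entropy decays as a power law,
% with exponent 0 < \beta < 1 (\beta \approx 1/2 in Hilberg's original proposal)
h(n) \;\approx\; A\, n^{\beta - 1} + h_\infty.
% The two are compatible only in the degenerate case A = 0, which is the
% sense in which CER is inconsistent with Hilberg-type scaling laws.
```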
Must analysis of meaning follow analysis of form? A time course analysis
Many models of word recognition assume that processing proceeds sequentially from analysis of form to analysis of meaning. In the context of morphological processing, this implies that morphemes are processed as units of form prior to any influence of their meanings. Some interpret the apparent absence of differences in recognition latencies to targets (SNEAK) in form and semantically similar (sneaky-SNEAK) and in form similar and semantically dissimilar (sneaker-SNEAK) prime contexts at a stimulus onset asynchrony (SOA) of 48 ms as consistent with this claim. To determine the time course over which degree of semantic similarity between morphologically structured primes and their targets influences recognition in the forward masked priming variant of the lexical decision paradigm, we compared facilitation for the same targets after semantically similar and dissimilar primes across a range of SOAs (34–100 ms). The effect of shared semantics on recognition latency increased linearly with SOA when long SOAs were intermixed (Experiments 1A and 1B), and latencies were significantly faster after semantically similar than dissimilar primes at homogeneous SOAs of 48 ms (Experiment 2) and 34 ms (Experiment 3). Results limit the scope of form-then-semantics models of recognition and demonstrate that semantics influences even the very early stages of recognition. Finally, once general performance across trials has been accounted for, we find no evidence for individual differences in morphological processing that can be linked to measures of reading proficiency.
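The claim that facilitation grows linearly with SOA amounts to fitting a line to the priming effect across SOAs. The sketch below uses purely hypothetical numbers (not the paper's data) to show the shape of such an analysis with ordinary least squares:

```python
# Hypothetical illustration only: least-squares fit of priming facilitation
# (RT advantage after semantically similar vs. dissimilar primes) against SOA.
soas = [34, 48, 66, 100]               # SOAs in ms (hypothetical grid)
facilitation = [5.0, 9.0, 14.0, 22.0]  # hypothetical facilitation in ms

# Ordinary least squares: slope = cov(x, y) / var(x).
n = len(soas)
mean_x = sum(soas) / n
mean_y = sum(facilitation) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(soas, facilitation))
         / sum((x - mean_x) ** 2 for x in soas))
intercept = mean_y - slope * mean_x
print(f"facilitation ~ {slope:.2f} * SOA + {intercept:.1f} ms")
```

A positive slope is the signature of the reported pattern: the semantic contribution to recognition grows as the prime is visible for longer.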