4 research outputs found
Experiments on predictability of word in context and information rate in natural language
Based on data from a large-scale experiment with human subjects, we conclude that the logarithm of probability to guess a word in context (unpredictability) depends linearly on the word length. This result holds both for poetry and prose, even though with prose, the subjects don't know the length of the omitted word. We hypothesize that this effect reflects a tendency of natural language to have an even information rate
Zipf's Law and Avoidance of Excessive Synonymy
Zipf's law states that if words of language are ranked in the order of
decreasing frequency in texts, the frequency of a word is inversely
proportional to its rank. It is very robust as an experimental observation, but
to date it escaped satisfactory theoretical explanation. We suggest that Zipf's
law may arise from the evolution of word semantics dominated by expansion of
meanings and competition of synonyms.Comment: 47 pages; fixed reference list missing in v.