
    Crossings as a side effect of dependency lengths

    The syntactic structure of sentences exhibits a striking regularity: dependencies tend not to cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley).
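The crossing test underlying this line of work can be sketched in a few lines: two dependency arcs drawn above the sentence cross exactly when their endpoints interleave. A minimal sketch in Python (the function name and edge format are illustrative, not from the paper):

```python
def count_crossings(edges):
    """Count crossing pairs among arcs drawn above the word sequence.

    edges: pairs of 0-indexed word positions. Two arcs (a, b) and
    (c, d), with a < b and c < d, cross iff a < c < b < d or
    c < a < d < b, i.e. exactly one endpoint of one arc lies
    strictly inside the other arc's span.
    """
    arcs = [tuple(sorted(e)) for e in edges]
    crossings = 0
    for i in range(len(arcs)):
        for j in range(i + 1, len(arcs)):
            (a, b), (c, d) = arcs[i], arcs[j]
            if a < c < b < d or c < a < d < b:
                crossings += 1
    return crossings
```

For example, `count_crossings([(0, 2), (1, 3)])` finds one crossing, while a projective tree such as `[(0, 1), (1, 2), (2, 3)]` has none.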

    Some limits of standard linguistic typology: the case of Cysouw's models for the frequencies of the six possible orderings of S, V and O

    This article is a critical analysis of Michael Cysouw's comment "Linear order as a predictor of word order regularities". Peer reviewed; postprint (author's final draft).

    Hidden communication aspects in the exponent of Zipf's law

    This article focuses on communication systems following Zipf’s law, in a study of the relationship between the properties of those communication systems and the exponent of the law. The properties of communication systems are described using quantitative measures of semantic vagueness and the cost of word use. The precision and the economy of a communication system are reduced to a function of the exponent of Zipf’s law and the size of the communication system. Taking the exponent of the frequency spectrum, it is demonstrated that semantic precision grows with the exponent, whereas the cost of word use reaches a global minimum between 1.5 and 2 if the size of the communication system remains constant. The exponent of Zipf’s law is shown to be key to estimating the number of stimuli handled by a communication system and to determining which of two systems is less vague or less expensive. The ideal exponent of Zipf’s law, it is therefore argued, should be very slightly above 2.
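As a toy illustration only (not the paper's precision or cost measures), one can see how the rank-frequency distribution sharpens as the exponent grows while the system size stays fixed:

```python
def zipf_probs(n, beta):
    """Normalized Zipf rank-frequency distribution: p_i proportional to
    i**(-beta) for ranks i = 1..n (n = size of the communication system)."""
    weights = [i ** -beta for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# With n fixed, a larger exponent concentrates probability mass on the
# top-ranked signals -- a crude analogue of increasing precision.
```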

    Euclidean distance between syntactically linked words

    We study the Euclidean distance between syntactically linked words in sentences. The average distance is significantly small and is a very slowly growing function of sentence length. We consider two non-excluding hypotheses: (a) the average distance is minimized and (b) the average distance is constrained. Support for (a) comes from the significantly small average distance that real sentences achieve. The strength of the minimization hypothesis decreases with the length of the sentence. Support for (b) comes from the very slow growth of the average distance with sentence length. Furthermore, (b) predicts, under ideal conditions, an exponential distribution of the distance between linked words, a trend that can be identified in real sentences.
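The quantity studied here can be computed per sentence from a head-annotated parse; a minimal sketch (the `heads` encoding is illustrative, not from the paper):

```python
def mean_dependency_distance(heads):
    """Average distance between syntactically linked words.

    heads[i] is the position of the head of word i, or None for the
    root; the distance of a link is the absolute positional difference.
    """
    lengths = [abs(i - h) for i, h in enumerate(heads) if h is not None]
    return sum(lengths) / len(lengths)
```

For "She gave him a book", with the verb as root and heads encoded as `[1, None, 1, 4, 1]`, the four links have lengths 1, 1, 1 and 3, giving an average of 1.5.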

    When language breaks into pieces: A conflict between communication through isolated signals and language

    Here we study a communication model where signals associate to stimuli. The model assumes that signals follow Zipf’s law and that the exponent of the law depends on a balance between maximizing the information transfer and saving the cost of communication. We study the effect of tuning that balance on the structure of signal-stimulus associations. The model starts from two recent results. First, the exponent grows as the weight of information transfer increases. Second, a rudimentary form of language is obtained when the network of signal-stimulus associations is almost connected. Here we show the existence of a sudden destruction of language once a critical balance is crossed. The model shows that maximizing the information transfer through isolated signals and language are in conflict. The model proposes a strong reason for not finding large exponents in complex communication systems: language is in danger. Besides, the findings suggest that human words may need to be ambiguous to keep language alive. Interestingly, the model predicts that large exponents should be associated with decreased synaptic density. It is not surprising that the largest exponents correspond to schizophrenic patients since, according to the spirit of Feinberg’s hypothesis, decreased synaptic density leads to schizophrenia. Our findings suggest that the exponent of Zipf’s law is intimately related to language and that it could be used to detect anomalous structure and organization of the brain.

    Decoding least effort and scaling in signal frequency distributions

    Here, assuming a general communication model where objects map to signals, a power function for the distribution of signal frequencies is derived. The model relies on the satisfaction of the receiver's (hearer's) communicative needs when the entropy of the number of objects per signal is maximized. Evidence of power distributions in a linguistic context (some of them with exponents clearly different from the typical β ≈ 2 of Zipf's law) is reviewed and expanded. We support the view that Zipf's law reflects some sort of optimization, but following a novel, realistic approach where signals (e.g. words) are used according to the objects (e.g. meanings) they are linked to. Our results strongly suggest that many systems in nature use non-trivial strategies for easing the interpretation of a signal. Interestingly, constraining just the number of interpretations of signals does not lead to scaling.

    The optimality of attaching unlinked labels to unlinked meanings

    Vocabulary learning by children can be characterized by many biases. When encountering a new word, children, as well as adults, are biased towards assuming that it means something totally different from the words that they already know. To the best of our knowledge, the first mathematical proof of the optimality of this bias is presented here. First, it is shown that this bias is a particular case of the maximization of mutual information between words and meanings. Second, the optimality is proven within a more general information-theoretic framework where mutual information maximization competes with other information-theoretic principles. The bias is a prediction from modern information theory. The relationship between information-theoretic principles and the principles of contrast and mutual exclusivity is also shown.
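The first step can be made concrete with a small sketch: mutual information between words and meanings computed from a joint probability matrix (the textbook formula, not code from the paper). Attaching an unlinked label to an unlinked meaning keeps the matrix one-to-one, which maximizes I(W; M):

```python
from math import log2

def mutual_information(joint):
    """I(W; M) in bits for a joint probability matrix joint[w][m]."""
    p_w = [sum(row) for row in joint]          # marginal over words
    p_m = [sum(col) for col in zip(*joint)]    # marginal over meanings
    info = 0.0
    for w, row in enumerate(joint):
        for m, p in enumerate(row):
            if p > 0:
                info += p * log2(p / (p_w[w] * p_m[m]))
    return info

# One-to-one (unlinked labels to unlinked meanings) vs. fully ambiguous:
one_to_one = [[0.5, 0.0], [0.0, 0.5]]     # I = 1 bit
ambiguous = [[0.25, 0.25], [0.25, 0.25]]  # I = 0 bits
```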

    Some word order biases from limited brain resources: A mathematical approach

    In this paper, we propose a mathematical framework for studying word order optimization. The framework relies on the well-known positive correlation between cognitive cost and the Euclidean distance between the elements (e.g. words) involved in a syntactic link. We study the conditions under which a certain word order is more economical than an alternative word order. We apply our methodology to two different cases: (a) the ordering of subject (S), verb (V) and object (O), and (b) the covering of a root word by a syntactic link. For the former, we find that SVO and its symmetric, OVS, are more economical than OSV, SOV, VOS and VSO at least 2/3 of the time. For the latter, we find that uncovering the root word is more economical than covering it at least 1/2 of the time. With the help of our framework, one can explain some Greenbergian universals. Our findings provide further theoretical support for the hypothesis that the limited resources of the brain introduce biases toward certain word orders. Our theoretical findings could inspire or illuminate future psycholinguistic or corpus-linguistic studies.
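Under the simplest discretized reading of case (a), with the three words at consecutive positions and links S-V and V-O whose cost is their positional distance, the advantage of the verb-medial orders can be checked directly (a toy sketch, not the paper's continuous framework, where the advantage holds only at least 2/3 of the time):

```python
from itertools import permutations

def total_link_distance(order):
    """Sum of positional distances over the links S-V and V-O."""
    pos = {word: i for i, word in enumerate(order)}
    return abs(pos['S'] - pos['V']) + abs(pos['V'] - pos['O'])

costs = {''.join(o): total_link_distance(o) for o in permutations('SVO')}
# Verb-medial orders (SVO, OVS) cost 2; the other four orders cost 3.
```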

    Why do syntactic links not cross?

    Here we study the arrangement of the vertices of trees in a 1-dimensional Euclidean space when the Euclidean distance between linked vertices is minimized. We conclude that links are unlikely to cross when drawn over the vertex sequence. This finding suggests that the uncommonness of crossings in the trees specifying the syntactic structure of sentences could be a side-effect of minimizing the Euclidean distance between syntactically related words. As far as we know, nobody has provided a successful explanation of this surprisingly universal feature of languages, which was discovered in the 1960s by Hays and Lecerf. On the one hand, support for the role of distance minimization in avoiding edge crossings comes from statistical studies showing that the Euclidean distance between syntactically linked words of real sentences is minimized or constrained to a small value. On the other hand, that distance is considered a measure of the cost of syntactic relationships in various frameworks. By cost, we mean the amount of computational resources needed by the brain. The absence of crossings in syntactic trees may be universal just because all human brains have limited resources.
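The claimed mechanism can be probed by brute force on a tiny tree: enumerate all orderings of the vertices, keep those that minimize the total distance between linked vertices, and inspect crossings (an illustrative experiment, not the paper's analysis):

```python
from itertools import permutations

def arc_crossings(edges, pos):
    """Crossings when edges are drawn as arcs over the ordering pos."""
    arcs = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    count = 0
    for i in range(len(arcs)):
        for j in range(i + 1, len(arcs)):
            (a, b), (c, d) = arcs[i], arcs[j]
            if a < c < b < d or c < a < d < b:
                count += 1
    return count

def min_arrangements(edges, n):
    """Brute-force minimum linear arrangements of vertices 0..n-1."""
    best_cost, best = None, []
    for perm in permutations(range(n)):
        pos = {v: i for i, v in enumerate(perm)}
        cost = sum(abs(pos[u] - pos[v]) for u, v in edges)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [pos]
        elif cost == best_cost:
            best.append(pos)
    return best_cost, best

# A 5-vertex tree; vertex 1 has degree 3, so the tree is not a path and
# the minimum total distance must exceed the number of edges (4).
tree = [(0, 1), (1, 2), (1, 3), (3, 4)]
cost, optima = min_arrangements(tree, 5)
```

Among the minimum-cost orderings, crossing-free ones exist (here, for instance, the natural order 0, 1, 2, 3, 4 attains the minimum cost of 5 with no crossings).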

    The sum of edge lengths in random linear arrangements

    Spatial networks are networks whose nodes are located in a space equipped with a metric. Typically the space is two-dimensional, and traditionally the metric considered has been the Euclidean distance. In spatial networks, the cost of a link depends on the edge length, i.e. the distance between the nodes that define the edge. Hypothesizing that there is pressure to reduce the length of the edges of a network requires a null model, e.g., a random layout of the vertices of the network. Here we investigate the properties of the distribution of the sum of edge lengths in random linear arrangements of vertices, a setting with many applications in different fields. A random linear arrangement is an ordering of the nodes of a network in which all possible orderings are equally likely. The distance between two vertices is one plus the number of intermediate vertices in the ordering. Compact formulae for the first and second moments about zero, as well as the variance, of the sum of edge lengths are obtained for arbitrary graphs and trees. We also analyze the evolution of that variance in Erdős–Rényi graphs and its scaling in uniformly random trees. Various developments and applications for future research are suggested.
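The first moment has a simple closed form: in a uniformly random linear arrangement of n vertices, each edge length has expectation (n + 1)/3, so for a graph with m edges the expected sum is E[D] = m(n + 1)/3. A quick Monte Carlo sketch (function and variable names are illustrative):

```python
import random

def expected_sum(n, m):
    """Closed form E[D] = m (n + 1) / 3 for a graph with n vertices and
    m edges under a uniformly random linear arrangement."""
    return m * (n + 1) / 3

def sampled_sum(edges, n, rng):
    """Sum of edge lengths |pos(u) - pos(v)| in one random arrangement."""
    perm = list(range(n))
    rng.shuffle(perm)
    return sum(abs(perm[u] - perm[v]) for u, v in edges)

# Example: the path graph on 5 vertices (m = 4 edges).
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
rng = random.Random(0)
estimate = sum(sampled_sum(path, 5, rng) for _ in range(20000)) / 20000
# estimate is close to expected_sum(5, 4) = 8.0
```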