The optimality of syntactic dependency distances
It is often stated that human languages, like other biological systems, are shaped by cost-cutting pressures, but to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies, and the space is defined by the linear order of the words in the sentence. We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of the languages are optimized to 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two theoretical predictions: that longer sentences are more optimized, and that distances are more likely to be longer than expected by chance in short sentences. We present a new hierarchical ranking of languages by their degree of optimization. The statistical advantages of the new score call for a reevaluation of the evolution of dependency distance over time in languages, as well as of the relationship between dependency distance and linguistic competence. Finally, the principles behind the design of the score can be extended to develop more powerful normalizations of topological distances or of physical distances in higher dimensions.
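To make the dual-normalization idea concrete, here is a minimal sketch, assuming a score of the form (D_random - D) / (D_random - D_min); the exact score defined in the article may differ. It uses the standard result that the expected sum of dependency distances of a tree of n words under a uniformly random shuffling is (n^2 - 1)/3, and it assumes D_min is supplied externally, since computing the minimum linear arrangement of a tree is a separate algorithmic problem. Function names are illustrative.

```python
# Hypothetical sketch of a dually normalized optimality score for
# dependency distances; the score defined in the article may differ.

def sum_dependency_distances(heads):
    """Sum of |i - head(i)| over all dependencies. heads[i] is the
    1-based position of the head of the word at position i + 1,
    with 0 marking the root."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

def optimality_score(d_observed, d_min, n):
    """1 when distances are fully minimized, 0 at the random baseline,
    negative when distances are longer than expected by chance."""
    d_random = (n * n - 1) / 3  # expected sum over random shufflings
    return (d_random - d_observed) / (d_random - d_min)

# Toy 5-word sentence whose structure is a star rooted at word 2.
heads = [2, 0, 2, 2, 2]
d = sum_dependency_distances(heads)        # 1 + 1 + 2 + 3 = 7
print(optimality_score(d, d_min=6, n=5))   # (8 - 7) / (8 - 6) = 0.5
```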
A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)
In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.
Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees
The syntactic structure of a sentence is often represented using syntactic
dependency trees. The sum of the distances between syntactically related words
has been in the limelight for decades. Research on dependency
distances led to the formulation of the principle of dependency distance
minimization whereby words in sentences are ordered so as to minimize that sum.
Numerous random baselines have been defined to carry out related quantitative
studies on languages. The simplest random baseline is the expected value of the
sum in unconstrained random permutations of the words in the sentence, namely
when all the shufflings of the words of a sentence are allowed and equally
likely. Here we focus on a popular baseline: random projective permutations of
the words of the sentence, that is, permutations where the syntactic dependency
structure is projective, a formal constraint that sentences often satisfy in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; the larger R, the lower the error of the estimation, but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and devise a dynamic programming algorithm to retrieve the trees that minimize it.
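The Monte Carlo baseline that the formulae replace can be sketched as follows. A uniformly random projective linearization of a rooted dependency tree is drawn by shuffling, at each node, the block formed by the node itself and the intervals of its children's subtrees, then concatenating; every projective order arises from exactly one combination of such choices, so the draw is uniform. The code below is a reconstruction under that standard construction, not the authors' implementation, and the tree encoding (a children map plus an edge list) is an assumption.

```python
import random

def random_projective_linearization(children, root):
    """Uniformly random projective word order of a rooted dependency
    tree, as a list of node ids from left to right. At each node the
    node itself and its children's subtree blocks are shuffled and
    concatenated, which keeps every subtree in a contiguous interval."""
    def linearize(v):
        blocks = [[v]] + [linearize(c) for c in children.get(v, [])]
        random.shuffle(blocks)
        return [w for block in blocks for w in block]
    return linearize(root)

def sum_distances(order, edges):
    """Sum of dependency distances given a word order and tree edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def monte_carlo_expectation(children, root, edges, R=10_000):
    """O(Rn) Monte Carlo estimate of the expected sum of dependency
    distances over random projective linearizations, i.e. the estimate
    that the article's exact O(n) formulae replace."""
    total = sum(sum_distances(random_projective_linearization(children, root),
                              edges) for _ in range(R))
    return total / R

# Toy tree: 1 is the root with children 2 and 3; 3 has child 4.
children = {1: [2, 3], 3: [4]}
edges = [(1, 2), (1, 3), (3, 4)]
print(monte_carlo_expectation(children, root=1, edges=edges))
```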
The placement of the head that minimizes online memory: a complex systems approach
It is well known that the length of a syntactic dependency determines its
online memory cost. Thus, the problem of the placement of a head and its
dependents (complements or modifiers) that minimizes online memory is
equivalent to the problem of the minimum linear arrangement of a star tree.
However, how that length is translated into cognitive cost is not known. This
study shows that the online memory cost is minimized when the head is placed at
the center, regardless of the function that transforms length into cost,
provided only that this function is strictly monotonically increasing. Online
memory defines a quasi-convex adaptive landscape with a single central minimum
if the number of elements is odd and two central minima if that number is even.
We discuss various aspects of the dynamics of word order of subject (S), verb
(V) and object (O) from a complex systems perspective and suggest that word
orders tend to evolve by swapping adjacent constituents from an initial or
early SOV configuration that is attracted towards a central word order by
online memory minimization. We also suggest that the stability of SVO is due to
at least two factors, the quasi-convex shape of the adaptive landscape in the
online memory dimension and online memory adaptations that avoid regression to
SOV. Although OVS is also optimal for placing the verb at the center, its low
frequency is explained by its long distance to the seminal SOV in the
permutation space.
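The independence from the cost function is easy to check numerically. For a star tree of n elements laid out on a line, the total cost is minimized with the head at the center for any strictly monotonically increasing transformation of length into cost. A small sketch; the particular cost functions are arbitrary choices, picked only because they are strictly increasing:

```python
def total_cost(head_pos, n, cost):
    """Total online memory cost of a star tree of n elements (one head
    plus n - 1 dependents) with the head at position head_pos on a line
    of positions 1..n, where cost is a strictly monotonically
    increasing function of dependency length."""
    return sum(cost(abs(head_pos - p)) for p in range(1, n + 1)
               if p != head_pos)

# Any strictly increasing function should yield a central minimum.
for cost in (lambda d: d, lambda d: d ** 2, lambda d: 2 ** d):
    n = 7  # odd number of elements: a single central minimum expected
    costs = {p: total_cost(p, n, cost) for p in range(1, n + 1)}
    print(min(costs, key=costs.get))  # 4, the center, every time
```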
The optimality of word lengths. Theoretical foundations and an empirical study
Zipf's law of abbreviation, namely the tendency of more frequent words to be
shorter, has been viewed as a manifestation of compression, i.e. the
minimization of the length of forms -- a universal principle of natural
communication. Although the claim that languages are optimized has become
trendy, attempts to measure the degree of optimization of languages have been
rather scarce. Here we present two optimality scores that are dually normalized,
namely, they are normalized with respect to both the minimum and the random
baseline. We analyze the theoretical and statistical pros and cons of these and
other scores. Harnessing the best score, we quantify for the first time the
degree of optimality of word lengths in languages. This indicates that
languages are optimized to 62 or 67 percent on average (depending on the
source) when word lengths are measured in characters, and to 65 percent on
average when word lengths are measured in time. In general, spoken word
durations are more optimized than written word lengths in characters. Our work
paves the way to measure the degree of optimality of the vocalizations or
gestures of other species, and to compare them against written, spoken, or
signed human languages.
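One way to realize a dually normalized score of the kind described, sketched here as an assumption about its general form rather than the article's exact definition: the frequency-weighted mean word length L is normalized between its minimum L_min, reached when the shortest forms are assigned to the most frequent words, and the random baseline L_rand, the expectation when lengths and frequencies are paired at random, which works out to the unweighted mean length.

```python
def optimality_of_word_lengths(freqs, lengths):
    """Dually normalized optimality score for word lengths: 1 when the
    lexicon is fully compressed, 0 at the random baseline. freqs[i] and
    lengths[i] describe the same word type."""
    total = sum(freqs)
    probs = [f / total for f in freqs]

    # Observed frequency-weighted mean word length.
    L = sum(p * l for p, l in zip(probs, lengths))

    # Minimum: shortest lengths assigned to the most frequent words
    # (optimal by the rearrangement inequality).
    L_min = sum(p * l for p, l in zip(sorted(probs, reverse=True),
                                      sorted(lengths)))

    # Random baseline: expected value when lengths are paired with
    # frequencies uniformly at random, i.e. the unweighted mean length.
    L_rand = sum(lengths) / len(lengths)

    return (L_rand - L) / (L_rand - L_min)

# Toy lexicon: frequent words are shortish, but not optimally assigned.
print(optimality_of_word_lengths(freqs=[50, 30, 15, 5],
                                 lengths=[3, 2, 5, 9]))  # about 0.88
```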
Memory limitations are hidden in grammar
The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.
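The chance baseline can be made concrete: under a uniformly random arrangement of n words, the expected length of a single dependency is (n + 1)/3, so structures sampled with no memory bias should match it, and the article's finding is that grammar-sampled structures fall below it. In the sketch below, a uniform sampler of labeled trees via Prüfer sequences stands in for the article's grammar-based samplers; it is illustrative only and, being unbiased, matches the baseline rather than undershooting it.

```python
import random

def random_labeled_tree(n):
    """Uniformly random labeled tree on nodes 1..n, decoded from a
    random Pruefer sequence; a generic stand-in for sampling syntactic
    structures uniformly (n must be at least 2)."""
    if n == 2:
        return [(1, 2)]
    seq = [random.randint(1, n) for _ in range(n - 2)]
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Attach the smallest current leaf to the next sequence entry.
        leaf = min(u for u in range(1, n + 1) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(1, n + 1) if degree[x] == 1]
    edges.append((u, w))
    return edges

def mean_dependency_distance(edges):
    """Mean |u - v| over edges, reading node labels as word positions."""
    return sum(abs(u - v) for u, v in edges) / len(edges)

# Compare unbiased samples against the chance baseline (n + 1) / 3.
n, samples = 10, 2000
avg = sum(mean_dependency_distance(random_labeled_tree(n))
          for _ in range(samples)) / samples
print(avg, (n + 1) / 3)  # both approach 11/3 for an unbiased sampler
```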