The optimality of syntactic dependency distances
It is often stated that human languages, like other biological systems, are shaped by cost-cutting pressures, but to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies, and the space is defined by the linear order of the words in the sentence. We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of the languages are optimized to 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two theoretical predictions: that longer sentences are more optimized, and that distances are more likely to be longer than expected by chance in short sentences. We present a new hierarchical ranking of languages by their degree of optimization. The statistical advantages of the new score call for a reevaluation of the evolution of dependency distance over time in languages, as well as of the relationship between dependency distance and linguistic competence. Finally, the principles behind the design of the score can be extended to develop more powerful normalizations of topological distances or of physical distances in higher dimensions.
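To make the dual-normalization idea concrete, here is a minimal sketch, assuming a score of the form (D_random - D) / (D_random - D_min); the exact score defined in the article may differ. It uses the standard result that the expected sum of dependency distances of a tree of n words under a uniformly random shuffling is (n^2 - 1)/3, and it assumes D_min is supplied externally, since computing the minimum linear arrangement of a tree is a separate algorithmic problem. Function names are illustrative.

```python
# Hypothetical sketch of a dually normalized optimality score for
# dependency distances; the score defined in the article may differ.

def sum_dependency_distances(heads):
    """Sum of |i - head(i)| over all dependencies. heads[i] is the
    1-based position of the head of the word at position i + 1,
    with 0 marking the root."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

def optimality_score(d_observed, d_min, n):
    """1 when distances are fully minimized, 0 at the random baseline,
    negative when distances are longer than expected by chance."""
    d_random = (n * n - 1) / 3  # expected sum over random shufflings
    return (d_random - d_observed) / (d_random - d_min)

# Toy 5-word sentence whose structure is a star rooted at word 2.
heads = [2, 0, 2, 2, 2]
d = sum_dependency_distances(heads)        # 1 + 1 + 2 + 3 = 7
print(optimality_score(d, d_min=6, n=5))   # (8 - 7) / (8 - 6) = 0.5
```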
A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)
In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.
Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees
The syntactic structure of a sentence is often represented using syntactic
dependency trees. The sum of the distances between syntactically related words
has been in the limelight for decades. Research on dependency
distances led to the formulation of the principle of dependency distance
minimization whereby words in sentences are ordered so as to minimize that sum.
Numerous random baselines have been defined to carry out related quantitative
studies on languages. The simplest random baseline is the expected value of the
sum in unconstrained random permutations of the words in the sentence, namely
when all the shufflings of the words of a sentence are allowed and equally
likely. Here we focus on a popular baseline: random projective permutations of
the words of the sentence, that is, permutations where the syntactic dependency
structure is projective, a formal constraint that sentences often satisfy in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; the larger R, the lower the error of the estimation, but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and devise a dynamic programming algorithm to retrieve the trees that minimize it.
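The Monte Carlo baseline that the formulae replace can be sketched as follows. A uniformly random projective linearization of a rooted dependency tree is drawn by shuffling, at each node, the block formed by the node itself and the intervals of its children's subtrees, then concatenating; every projective order arises from exactly one combination of such choices, so the draw is uniform. The code below is a reconstruction under that standard construction, not the authors' implementation, and the tree encoding (a children map plus an edge list) is an assumption.

```python
import random

def random_projective_linearization(children, root):
    """Uniformly random projective word order of a rooted dependency
    tree, as a list of node ids from left to right. At each node the
    node itself and its children's subtree blocks are shuffled and
    concatenated, which keeps every subtree in a contiguous interval."""
    def linearize(v):
        blocks = [[v]] + [linearize(c) for c in children.get(v, [])]
        random.shuffle(blocks)
        return [w for block in blocks for w in block]
    return linearize(root)

def sum_distances(order, edges):
    """Sum of dependency distances given a word order and tree edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def monte_carlo_expectation(children, root, edges, R=10_000):
    """O(Rn) Monte Carlo estimate of the expected sum of dependency
    distances over random projective linearizations, i.e. the estimate
    that the article's exact O(n) formulae replace."""
    total = sum(sum_distances(random_projective_linearization(children, root),
                              edges) for _ in range(R))
    return total / R

# Toy tree: 1 is the root with children 2 and 3; 3 has child 4.
children = {1: [2, 3], 3: [4]}
edges = [(1, 2), (1, 3), (3, 4)]
print(monte_carlo_expectation(children, root=1, edges=edges))
```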
The placement of the head that minimizes online memory: a complex systems approach
It is well known that the length of a syntactic dependency determines its
online memory cost. Thus, the problem of the placement of a head and its
dependents (complements or modifiers) that minimizes online memory is
equivalent to the problem of the minimum linear arrangement of a star tree.
However, how that length is translated into cognitive cost is not known. This
study shows that the online memory cost is minimized when the head is placed at
the center, regardless of the function that transforms length into cost,
provided only that this function is strictly monotonically increasing. Online
memory defines a quasi-convex adaptive landscape with a single central minimum
if the number of elements is odd and two central minima if that number is even.
We discuss various aspects of the dynamics of word order of subject (S), verb
(V) and object (O) from a complex systems perspective and suggest that word
orders tend to evolve by swapping adjacent constituents from an initial or
early SOV configuration that is attracted towards a central word order by
online memory minimization. We also suggest that the stability of SVO is due to
at least two factors, the quasi-convex shape of the adaptive landscape in the
online memory dimension and online memory adaptations that avoid regression to
SOV. Although OVS is also optimal for placing the verb at the center, its low
frequency is explained by its long distance to the seminal SOV in the
permutation space.
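The independence from the cost function is easy to check numerically. For a star tree of n elements laid out on a line, the total cost is minimized with the head at the center for any strictly monotonically increasing transformation of length into cost. A small sketch; the particular cost functions are arbitrary choices, picked only because they are strictly increasing:

```python
def total_cost(head_pos, n, cost):
    """Total online memory cost of a star tree of n elements (one head
    plus n - 1 dependents) with the head at position head_pos on a line
    of positions 1..n, where cost is a strictly monotonically
    increasing function of dependency length."""
    return sum(cost(abs(head_pos - p)) for p in range(1, n + 1)
               if p != head_pos)

# Any strictly increasing function should yield a central minimum.
for cost in (lambda d: d, lambda d: d ** 2, lambda d: 2 ** d):
    n = 7  # odd number of elements: a single central minimum expected
    costs = {p: total_cost(p, n, cost) for p in range(1, n + 1)}
    print(min(costs, key=costs.get))  # 4, the center, every time
```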
The optimality of word lengths. Theoretical foundations and an empirical study
Zipf's law of abbreviation, namely the tendency of more frequent words to be
shorter, has been viewed as a manifestation of compression, i.e. the
minimization of the length of forms -- a universal principle of natural
communication. Although the claim that languages are optimized has become
trendy, attempts to measure the degree of optimization of languages have been
rather scarce. Here we present two optimality scores that are dually normalized,
namely, they are normalized with respect to both the minimum and the random
baseline. We analyze the theoretical and statistical pros and cons of these and
other scores. Harnessing the best score, we quantify for the first time the
degree of optimality of word lengths in languages. This indicates that
languages are optimized to 62 or 67 percent on average (depending on the
source) when word lengths are measured in characters, and to 65 percent on
average when word lengths are measured in time. In general, spoken word
durations are more optimized than written word lengths in characters. Our work
paves the way to measure the degree of optimality of the vocalizations or
gestures of other species, and to compare them against written, spoken, or
signed human languages.
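One way to realize a dually normalized score of the kind described, sketched here as an assumption about its general form rather than the article's exact definition: the frequency-weighted mean word length L is normalized between its minimum L_min, reached when the shortest forms are assigned to the most frequent words, and the random baseline L_rand, the expectation when lengths and frequencies are paired at random, which works out to the unweighted mean length.

```python
def optimality_of_word_lengths(freqs, lengths):
    """Dually normalized optimality score for word lengths: 1 when the
    lexicon is fully compressed, 0 at the random baseline. freqs[i] and
    lengths[i] describe the same word type."""
    total = sum(freqs)
    probs = [f / total for f in freqs]

    # Observed frequency-weighted mean word length.
    L = sum(p * l for p, l in zip(probs, lengths))

    # Minimum: shortest lengths assigned to the most frequent words
    # (optimal by the rearrangement inequality).
    L_min = sum(p * l for p, l in zip(sorted(probs, reverse=True),
                                      sorted(lengths)))

    # Random baseline: expected value when lengths are paired with
    # frequencies uniformly at random, i.e. the unweighted mean length.
    L_rand = sum(lengths) / len(lengths)

    return (L_rand - L) / (L_rand - L_min)

# Toy lexicon: frequent words are shortish, but not optimally assigned.
print(optimality_of_word_lengths(freqs=[50, 30, 15, 5],
                                 lengths=[3, 2, 5, 9]))  # about 0.88
```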
Memory limitations are hidden in grammar
The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.
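The chance baseline can be made concrete: under a uniformly random arrangement of n words, the expected length of a single dependency is (n + 1)/3, so structures sampled with no memory bias should match it, and the article's finding is that grammar-sampled structures fall below it. In the sketch below, a uniform sampler of labeled trees via Prüfer sequences stands in for the article's grammar-based samplers; it is illustrative only and, being unbiased, matches the baseline rather than undershooting it.

```python
import random

def random_labeled_tree(n):
    """Uniformly random labeled tree on nodes 1..n, decoded from a
    random Pruefer sequence; a generic stand-in for sampling syntactic
    structures uniformly (n must be at least 2)."""
    if n == 2:
        return [(1, 2)]
    seq = [random.randint(1, n) for _ in range(n - 2)]
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Attach the smallest current leaf to the next sequence entry.
        leaf = min(u for u in range(1, n + 1) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(1, n + 1) if degree[x] == 1]
    edges.append((u, w))
    return edges

def mean_dependency_distance(edges):
    """Mean |u - v| over edges, reading node labels as word positions."""
    return sum(abs(u - v) for u, v in edges) / len(edges)

# Compare unbiased samples against the chance baseline (n + 1) / 3.
n, samples = 10, 2000
avg = sum(mean_dependency_distance(random_labeled_tree(n))
          for _ in range(samples)) / samples
print(avg, (n + 1) / 3)  # both approach 11/3 for an unbiased sampler
```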