16 research outputs found
The risks of mixing dependency lengths from sequences of different length
Mixing dependency lengths from sequences of different length is a common
practice in language research. However, the empirical distribution of
dependency lengths of sentences of the same length differs from that of
sentences of varying length. Moreover, the distribution of dependency lengths
depends on sentence length, both for real sentences and under the null
hypothesis that dependencies connect vertices located at random positions of
the sequence. This
suggests that certain results, such as the distribution of syntactic dependency
lengths mixing dependencies from sentences of varying length, could be a mere
consequence of that mixing. Furthermore, differences in the global averages of
dependency length (mixing lengths from sentences of varying length) for two
different languages do not, by themselves, imply that one language optimizes
dependency lengths better than the other: those differences could be due to
differences in the distribution of sentence lengths and other factors.
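The dependence of dependency lengths on sentence length under the null hypothesis, and the effect of mixing, can be illustrated with a minimal simulation (an illustrative sketch, not the article's actual analysis; the sentence lengths 5 and 30 and the sample size are arbitrary choices):

```python
import random
from collections import Counter

def random_dependency_length(n, rng):
    # Null hypothesis: a dependency connects two distinct vertices
    # placed uniformly at random among positions 1..n.
    i, j = rng.sample(range(1, n + 1), 2)
    return abs(i - j)

rng = random.Random(0)
samples = 100_000

# Distribution for two fixed sentence lengths...
short = Counter(random_dependency_length(5, rng) for _ in range(samples))
long_ = Counter(random_dependency_length(30, rng) for _ in range(samples))
# ...versus a mixture of the two lengths (half and half on average).
mixed = Counter(random_dependency_length(rng.choice([5, 30]), rng)
                for _ in range(samples))

# P(length = 1) is 2/n for a length-n sentence, so the three
# distributions disagree already at the shortest dependency length.
for name, dist in (("n=5", short), ("n=30", long_), ("mixture", mixed)):
    print(name, round(dist[1] / samples, 3))
```

Since P(length = 1) equals 2/n for a sentence of length n, the mixture (about 0.23 here) matches neither the short-sentence value (0.4) nor the long-sentence value (about 0.07), which is the sense in which results on mixed distributions can be an artefact of mixing.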
Beyond description. Comment on "Approaching human language with complex networks" by Cong & Liu
A commentary on "The now-or-never bottleneck: a fundamental constraint on language", by Christiansen and Chater (2016)
In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.
Towards a theory of word order: comment on "Dependency distance: a new perspective on syntactic patterns in natural language" by Haitao Liu et al.
Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend not to cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language. In press in
Complexity (Wiley).
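A quick way to see the side-effect hypothesis at work is to take one dependency tree, scramble its word order at random, and check that arrangements with a smaller total dependency length also tend to have fewer crossings. The sketch below is illustrative only (the tree and the number of trials are arbitrary, not taken from the paper):

```python
import random
from itertools import combinations

def crossings(arcs):
    # Arcs (i, j) and (k, l) drawn above the sentence cross iff
    # their endpoints interleave strictly: i < k < j < l.
    count = 0
    for (a, b), (c, d) in combinations(arcs, 2):
        a, b = min(a, b), max(a, b)
        c, d = min(c, d), max(c, d)
        if a < c < b < d or c < a < d < b:
            count += 1
    return count

def total_length(arcs):
    return sum(abs(i - j) for i, j in arcs)

# A hypothetical 8-vertex dependency tree (pairs of vertex labels);
# each trial maps the labels to a random linear order.
tree = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (5, 6), (5, 7)]
rng = random.Random(1)

samples = []
for _ in range(2000):
    order = list(range(8))
    rng.shuffle(order)
    arcs = [(order[u], order[v]) for u, v in tree]
    samples.append((total_length(arcs), crossings(arcs)))

# Mean crossings in the half of arrangements with the shortest
# total dependency length versus the other half.
samples.sort()
half = len(samples) // 2
short_mean = sum(c for _, c in samples[:half]) / half
long_mean = sum(c for _, c in samples[half:]) / half
print(short_mean, long_mean)  # shorter arrangements cross less on average
```

The positive association between total dependency length and number of crossings in random arrangements is what makes scarce crossings a plausible by-product of dependency length minimization.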
The placement of the head that minimizes online memory: a complex systems approach
It is well known that the length of a syntactic dependency determines its
online memory cost. Thus, the problem of the placement of a head and its
dependents (complements or modifiers) that minimizes online memory is
equivalent to the problem of the minimum linear arrangement of a star tree.
However, how that length is translated into cognitive cost is not known. This
study shows that the online memory cost is minimized when the head is placed at
the center, regardless of the function that transforms length into cost,
provided only that this function is strictly monotonically increasing. Online
memory defines a quasi-convex adaptive landscape with a single central minimum
if the number of elements is odd and two central minima if that number is even.
We discuss various aspects of the dynamics of word order of subject (S), verb
(V) and object (O) from a complex systems perspective and suggest that word
orders tend to evolve by swapping adjacent constituents from an initial or
early SOV configuration that is attracted towards a central word order by
online memory minimization. We also suggest that the stability of SVO is due to
at least two factors, the quasi-convex shape of the adaptive landscape in the
online memory dimension and online memory adaptations that avoid regression to
SOV. Although OVS is also optimal for placing the verb at the center, its low
frequency is explained by its long distance to the seminal SOV in the
permutation space.
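The central-placement result is easy to check numerically for the star tree: for any strictly monotonically increasing cost function, the total cost is minimized with the head at the center, with one minimum for odd sizes and two for even sizes. A small sketch, using f(d) = d^2 as an arbitrary example of such a function:

```python
def online_memory_cost(n, head_pos, f):
    # Total cost of linearizing a star tree with n vertices
    # (1 head + n-1 dependents) over positions 1..n: each dependency
    # contributes f(distance between the head and that dependent).
    return sum(f(abs(head_pos - p)) for p in range(1, n + 1) if p != head_pos)

f = lambda d: d * d  # any strictly monotonically increasing f works

costs7 = [online_memory_cost(7, h, f) for h in range(1, 8)]
argmin7 = [h for h, c in enumerate(costs7, start=1) if c == min(costs7)]

costs6 = [online_memory_cost(6, h, f) for h in range(1, 7)]
argmin6 = [h for h, c in enumerate(costs6, start=1) if c == min(costs6)]

print(argmin7, argmin6)  # -> [4] [3, 4]
```

Swapping in f(d) = d or f(d) = 2**d leaves the minima unchanged, in line with the claim that the result holds regardless of the particular strictly increasing function.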
Anti dependency distance minimization in short sequences: A graph theoretic approach
Dependency distance minimization (DDm) is a word order principle favouring the placement of syntactically related words close to each other in sentences. Massive evidence of the principle has been reported for more than a decade with the help of syntactic dependency treebanks where long sentences abound. However, it has been predicted theoretically that the principle is more likely to be beaten in short sequences by the principle of surprisal minimization (predictability maximization). Here we introduce a simple binomial test to verify such a hypothesis. In short sentences, we find anti-DDm for some languages from different families. Our analysis of the syntactic dependency structures suggests that anti-DDm is produced by star trees.
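A binomial test of this kind can be sketched as follows (an illustration with toy data, not the paper's exact procedure or results; we assume each short sentence yields a binary anti-DDm outcome that has probability 1/2 under a null of no ordering preference):

```python
from math import comb

def binom_tail(k, n, p=0.5):
    # One-sided tail P(X >= k) for X ~ Binomial(n, p).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Toy data: for each of 12 hypothetical short sentences, 1 means the
# attested order scored anti-DDm (e.g. a total dependency distance
# above the random-arrangement average), 0 means it did not.
outcomes = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
k = sum(outcomes)
p_value = binom_tail(k, len(outcomes))
print(k, len(outcomes), round(p_value, 4))  # -> 10 12 0.0193
```

With 10 of 12 anti-DDm outcomes, the one-sided p-value is 79/4096, small enough to reject the no-preference null in this toy example.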
Are crossing dependencies really scarce?
The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the number of crossings in real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are indeed scarce in real sentences. Their scarcity cannot be explained by the hubiness of the trees: real sentences are close to linear trees, where the potential number of crossings is maximized.
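The role of hubiness can be illustrated by comparing the two extremes: a star tree (maximum hubiness), whose arcs all share the hub and therefore can never cross, and a linear (path) tree, where the expected number of crossings under a random linear arrangement is clearly positive. A minimal sketch (tree size and trial count are arbitrary):

```python
import random
from itertools import combinations

def crossings(arcs):
    # Arcs (i, j) drawn above the sentence cross iff their endpoints
    # interleave strictly: i < k < j < l.
    count = 0
    for (a, b), (c, d) in combinations(arcs, 2):
        a, b = min(a, b), max(a, b)
        c, d = min(c, d), max(c, d)
        if a < c < b < d or c < a < d < b:
            count += 1
    return count

def expected_crossings(tree, n, trials, rng):
    # Monte Carlo estimate of the number of crossings under a
    # random linear arrangement of the same tree.
    total = 0
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        total += crossings([(order[u], order[v]) for u, v in tree])
    return total / trials

n = 8
star = [(0, v) for v in range(1, n)]       # maximum hubiness
path = [(v, v + 1) for v in range(n - 1)]  # linear tree

rng = random.Random(42)
star_mean = expected_crossings(star, n, 300, rng)
path_mean = expected_crossings(path, n, 300, rng)
print(star_mean, path_mean)  # star arcs share the hub: never cross
```

Two edges with four distinct vertices cross with probability 1/3 under a random arrangement, so the path here has an expected count of 15/3 = 5 while the star's is exactly zero: closeness to linear trees maximizes the potential for crossings, which is what makes their scarcity in real sentences noteworthy.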