Random crossings in dependency trees
It has been hypothesized that the rather small number of crossings in real
syntactic dependency trees is a side-effect of pressure for dependency length
minimization. Here we answer a related research question: what would
be the expected number of crossings if the natural order of a sentence were lost
and replaced by a random ordering? We show that this number depends only on the
number of vertices of the dependency tree (the sentence length) and the second
moment about zero of vertex degrees. The expected number of crossings is
minimized by a star tree (where crossings are impossible) and maximized by a linear
tree (where the number of crossings is of the order of the square of the sequence
length). Comment: changes of format and language; some corrections in Appendix A; in
press in Glottometrics.
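The dependence on the sentence length and the second moment of degrees can be made concrete. As a sketch (our own derivation, not quoted from the paper): in a uniformly random ordering, two edges that share no vertex cross with probability 1/3, and a tree with n vertices has n(n - 1 - <k^2>)/2 such pairs, where <k^2> is the mean of the squared vertex degrees, which gives E[C] = n(n - 1 - <k^2>)/6. For a star <k^2> = n - 1, so E[C] = 0; for a linear tree E[C] = (n - 2)(n - 3)/6. The function names below are hypothetical, and the brute-force check is only feasible for very small trees.

```python
from fractions import Fraction
from itertools import combinations, permutations


def expected_crossings(n, edges):
    """Expected crossings of a tree on n vertices under a uniformly random
    linear arrangement, using E[C] = n * (n - 1 - <k^2>) / 6."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    second_moment = Fraction(sum(k * k for k in degree), n)  # <k^2>
    return Fraction(n) * (n - 1 - second_moment) / 6


def count_crossings(position, edges):
    """Number of pairs of edges that cross when drawn above the sequence."""
    crossings = 0
    for (u, v), (s, t) in combinations(edges, 2):
        a, b = sorted((position[u], position[v]))
        c, d = sorted((position[s], position[t]))
        # Strict inequalities: edges sharing a vertex never count as crossing.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def brute_force_expected_crossings(n, edges):
    """Average number of crossings over all n! orderings (small n only)."""
    total = count = 0
    for order in permutations(range(n)):
        # order[i] is the vertex placed at position i.
        position = {vertex: i for i, vertex in enumerate(order)}
        total += count_crossings(position, edges)
        count += 1
    return Fraction(total, count)
```

For a path on five vertices both functions return 1, and for a star on five vertices both return 0.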
Why do syntactic links not cross?
Here we study the arrangement of vertices of trees in a 1-dimensional Euclidean space when the Euclidean distance between linked vertices is minimized. We conclude that links are unlikely to cross when drawn over the vertex sequence. This finding suggests that the rarity of crossings in the trees specifying the syntactic structure of sentences could be a side-effect of minimizing the Euclidean distance between syntactically related words. As far as we know, nobody has provided a successful explanation of this surprisingly universal feature of languages, discovered in the 1960s by Hays and Lecerf. On the one hand, support for the role of distance minimization in avoiding edge crossings comes from statistical studies showing that the Euclidean distance between syntactically linked words of real sentences is minimized or constrained to a small value. On the other hand, that distance is considered a measure of the cost of syntactic relationships in various frameworks. By cost, we mean the amount of computational resources needed by the brain. The absence of crossings in syntactic trees may be universal simply because all human brains have limited resources. Peer Reviewed. Postprint (author's final draft).
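The effect of minimizing the distance between linked vertices can be explored on toy trees by exhaustive search. The sketch below is not the method of the paper: it tries all n! orderings (so it is only practical for roughly n <= 8), finds the minimum sum of edge lengths, and returns every arrangement attaining it, so that crossings can then be counted in the optimal arrangements. All helper names are hypothetical.

```python
from itertools import combinations, permutations


def total_length(position, edges):
    """Sum of distances |pos(u) - pos(v)| over all edges."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def count_crossings(position, edges):
    """Number of pairs of edges that cross when drawn above the sequence."""
    crossings = 0
    for (u, v), (s, t) in combinations(edges, 2):
        a, b = sorted((position[u], position[v]))
        c, d = sorted((position[s], position[t]))
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def minimum_arrangements(n, edges):
    """Return the minimum total edge length and every ordering achieving it.

    Each ordering is returned as a dict mapping vertex -> position.
    """
    best, optimal = None, []
    for order in permutations(range(n)):
        position = {vertex: i for i, vertex in enumerate(order)}
        cost = total_length(position, edges)
        if best is None or cost < best:
            best, optimal = cost, [position]  # new minimum: restart the list
        elif cost == best:
            optimal.append(position)
    return best, optimal
```

For a given tree, taking the maximum of count_crossings over the returned optimal arrangements shows whether minimizing total length alone already rules out crossings for that tree.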
Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend not to cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in
Complexity (Wiley).
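The alternative hypothesis predicts that shorter dependency lengths go together with fewer crossings. As a rough illustration only (not the statistical test used in the paper), the sketch below samples uniformly random orderings of a fixed tree and measures the Pearson correlation between the sum of dependency lengths and the number of crossings. The helper names, the seed and the sample size are arbitrary choices.

```python
import random
from itertools import combinations


def total_length(position, edges):
    """Sum of dependency lengths |pos(u) - pos(v)|."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def count_crossings(position, edges):
    """Number of pairs of dependencies that cross."""
    crossings = 0
    for (u, v), (s, t) in combinations(edges, 2):
        a, b = sorted((position[u], position[v]))
        c, d = sorted((position[s], position[t]))
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def length_crossing_samples(n, edges, samples, seed=0):
    """Sum of lengths and crossings for `samples` uniformly random orderings."""
    rng = random.Random(seed)
    order = list(range(n))
    lengths, crossings = [], []
    for _ in range(samples):
        rng.shuffle(order)  # uniformly random linear arrangement
        position = {vertex: i for i, vertex in enumerate(order)}
        lengths.append(total_length(position, edges))
        crossings.append(count_crossings(position, edges))
    return lengths, crossings
```

Applied to a linear tree of ten vertices, the correlation comes out positive: random orderings that happen to produce longer dependencies also tend to produce more crossings.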
The risks of mixing dependency lengths from sequences of different length
Mixing dependency lengths from sequences of different length is a common
practice in language research. However, the empirical distribution of
dependency lengths of sentences of the same length differs from that of
sentences of varying length. Moreover, the distribution of dependency lengths depends
on sentence length, both for real sentences and under the null hypothesis that
dependencies connect vertices located in random positions of the sequence. This
suggests that certain results, such as the distribution of syntactic dependency
lengths mixing dependencies from sentences of varying length, could be a mere
consequence of that mixing. Furthermore, differences in the global averages of
dependency length (mixing lengths from sentences of varying length) for two
different languages do not simply imply a priori that one language optimizes
dependency lengths better than the other because those differences could be due
to differences in the distribution of sentence lengths and other factors. Comment: Language and referencing have been improved; Eqs. 7, 11, B7 and B8 have
been corrected.
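Under the null hypothesis mentioned above, a dependency in a sentence of n words links two distinct positions chosen uniformly at random. There are n - d position pairs at distance d among n(n - 1)/2 pairs in total, so p(d | n) = 2(n - d)/(n(n - 1)), whose mean is (n + 1)/3. The sketch below (hypothetical function names; it assumes every sentence is a tree with n - 1 dependencies) pools these distributions over a sentence-length profile. It shows how two samples with identical per-length behavior can still differ in their global mean dependency length.

```python
from fractions import Fraction


def null_length_distribution(n):
    """p(d | n) for d = 1..n-1 when a dependency links two distinct positions
    chosen uniformly at random in a sentence of n words."""
    pairs = n * (n - 1) // 2
    return {d: Fraction(n - d, pairs) for d in range(1, n)}


def mixed_length_distribution(sentence_counts):
    """Pool dependencies from sentences of different lengths.

    sentence_counts maps a sentence length n to the number of sentences of
    that length; each sentence is assumed to contribute n - 1 dependencies.
    """
    total = sum(count * (n - 1) for n, count in sentence_counts.items())
    pooled = {}
    for n, count in sentence_counts.items():
        weight = Fraction(count * (n - 1), total)  # share of all dependencies
        for d, p in null_length_distribution(n).items():
            pooled[d] = pooled.get(d, 0) + weight * p
    return pooled


def mean(distribution):
    """Mean of a distribution given as {value: probability}."""
    return sum(d * p for d, p in distribution.items())
```

For example, a profile with 10 sentences of length 5 and 10 of length 20 gives a pooled mean of 141/23, whereas 20 sentences of length 5 and 5 of length 20 give 33/7, although both follow exactly the same random baseline at every sentence length.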
Weighted dependency graphs
The theory of dependency graphs is a powerful toolbox to prove asymptotic
normality of sums of random variables. In this article, we introduce a more
general notion of weighted dependency graphs and give normality criteria in
this context. We also provide generic tools to prove that some weighted graph
is a weighted dependency graph for a given family of random variables.
To illustrate the power of the theory, we give applications to the following
objects: uniform random pair partitions, the random graph model ,
uniform random permutations, the symmetric simple exclusion process and
multilinear statistics on Markov chains. The application to random permutations
gives a bivariate extension of a functional central limit theorem of Janson and
Barbour. On Markov chains, we answer positively an open question of Bourdon and
Vallée on the asymptotic normality of subword counts in random texts
generated by a Markovian source. Comment: 57 pages. Third version: minor modifications after the review process.
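The subword counts mentioned above can be computed with a short dynamic program. This sketch assumes "subword" in the sense used in combinatorics on words (a possibly non-contiguous subsequence) and a first-order Markov source with a hypothetical transition table. It only generates sample texts and counts occurrences; it does not implement the weighted dependency graph machinery of the paper.

```python
import random


def subword_count(text, pattern):
    """Number of index tuples i1 < ... < ik such that text[i1]...text[ik]
    spells pattern (occurrences as a scattered subsequence)."""
    # ways[j] = number of ways to match pattern[:j] within the text read so far
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Go downwards so one text character extends each partial match once.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(pattern)]


def markov_text(length, transitions, start, rng):
    """Random text from a first-order Markov source.

    transitions maps each symbol to a list of (next_symbol, probability).
    """
    text = [start]
    for _ in range(length - 1):
        symbols, weights = zip(*transitions[text[-1]])
        text.append(rng.choices(symbols, weights=weights)[0])
    return "".join(text)
```

Repeating markov_text and subword_count over many seeds gives an empirical distribution of the count, which can then be compared informally with a normal approximation.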
Beyond description. Comment on "Approaching human language with complex networks" by Cong & Liu
Comment on "Approaching human language with complex networks" by Cong & Liu
Non-crossing dependencies: Least effort, not grammar
The use of null hypotheses (in a statistical sense) is common in hard sciences but not in theoretical linguistics. Here the null hypothesis that the low frequency of syntactic dependency crossings is expected under an arbitrary ordering of words is rejected. It is shown that this would require star dependency structures, which are both unrealistic and too restrictive. The hypothesis of the limited resources of the human brain is revisited. Stronger null hypotheses that take actual dependency lengths into account for the likelihood of crossings are presented. Those hypotheses suggest that crossings are likely to decrease when dependencies are shortened. A hypothesis based on pressure to reduce dependency lengths is more parsimonious than a principle of minimization of crossings or a grammatical ban that is totally dissociated from the general and non-linguistic principle of economy. Postprint (author's final draft).
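A length-aware null hypothesis can be illustrated by asking how likely two dependencies of given lengths are to cross. The sketch below is not the formula of the paper. It uses a hypothetical simplified model: each edge is placed uniformly among the positions 0..n-1 that its length allows, conditioned on the two edges sharing no vertex, and the crossing probability is counted exactly.

```python
from fractions import Fraction


def crossing_probability(n, d1, d2):
    """Probability that an edge of length d1 and an edge of length d2 cross,
    each placed uniformly in a sequence of n positions, given that they share
    no vertex (hypothetical simplified null model)."""
    crossing = total = 0
    for a in range(n - d1):
        b = a + d1
        for c in range(n - d2):
            d = c + d2
            if len({a, b, c, d}) < 4:
                continue  # edges sharing a vertex cannot cross
            total += 1
            if a < c < b < d or c < a < d < b:
                crossing += 1
    return Fraction(crossing, total) if total else Fraction(0)
```

Under this model an edge of length 1 can never be crossed, and in a sequence of 20 positions two edges of length 2 cross with probability 17/137, whereas two edges of length 8 cross with probability 28/31, in line with the idea that shortening dependencies reduces crossings.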
The sum of edge lengths in random linear arrangements
Spatial networks are networks where nodes are located in a space equipped
with a metric. Typically, the space is two-dimensional and, until recently,
the metric usually considered was the Euclidean distance. In spatial networks,
the cost of a link depends on the edge length,
i.e. the distance between the nodes that define the edge. Hypothesizing that
there is pressure to reduce the length of the edges of a network requires a
null model, e.g., a random layout of the vertices of the network. Here we
investigate the properties of the distribution of the sum of edge lengths in
random linear arrangements of vertices, a problem with many applications in
different fields. A random linear arrangement is an ordering of the nodes of a
network in which all possible orderings are equally likely. The
distance between two vertices is one plus the number of intermediate vertices
in the ordering. Compact formulae for the 1st and 2nd moments about zero as
well as the variance of the sum of edge lengths are obtained for arbitrary
graphs and trees. We also analyze the evolution of that variance in Erdős-Rényi
graphs and its scaling in uniformly random trees. Various developments and
applications for future research are suggested.
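Because each edge links two positions that form a uniformly random pair, its expected length is (n + 1)/3, so the expected sum of edge lengths over m edges is m(n + 1)/3 for any graph. The sketch below (hypothetical function names; it enumerates all n! orderings, so it is only usable for very small n) computes the first and second moments about zero and the variance exactly. It can serve as a sanity check for closed-form expressions.

```python
from fractions import Fraction
from itertools import permutations


def edge_length_sum(order, edges):
    """Sum of edge lengths, where the distance between two vertices is one
    plus the number of vertices between them, i.e. |pos(u) - pos(v)|."""
    position = {vertex: i for i, vertex in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


def exact_moments(n, edges):
    """Mean, second moment about zero and variance of the sum of edge lengths
    over all n! equally likely linear arrangements."""
    values = [edge_length_sum(order, edges) for order in permutations(range(n))]
    count = len(values)
    first = Fraction(sum(values), count)
    second = Fraction(sum(v * v for v in values), count)
    return first, second, second - first ** 2


def expected_sum(n, m):
    """E[D] = m (n + 1) / 3 for any graph with n vertices and m edges."""
    return Fraction(m * (n + 1), 3)
```

For a triangle every arrangement has edge lengths 1, 1 and 2, so the sum is always 4 and the variance is 0; for a path on four vertices the exact mean is 5, matching 3 * (4 + 1) / 3.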