23 research outputs found

    Hubiness, length, crossings and their relationships in dependency trees

    Get PDF
    Here tree dependency structures are studied from three different perspectives: their degree variance (hubiness), the mean dependency length and the number of dependency crossings. Bounds that reveal pairwise dependencies among these three metrics are derived. Hubiness (the variance of degrees) plays a central role: the mean dependency length is bounded below by hubiness while the number of crossings is bounded above by hubiness. Our findings suggest that the online memory cost of a sentence might be determined not just by the ordering of words but also by the hubiness of the underlying structure. The 2nd moment of degree plays a crucial role that is reminiscent of its role in large complex networks.Peer ReviewedPostprint (published version

    Are crossing dependencies really scarce?

    Get PDF
    The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the amount of crossings of real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are really scarce in real sentences. Their scarcity is unexpected by the hubiness of the trees. Indeed, real sentences are close to linear trees, where the potential number of crossings is maximized.Peer ReviewedPostprint (author's final draft

    Crossings as a side effect of dependency lengths

    Get PDF
    The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language.Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley

    Non-crossing dependencies: Least effort, not grammar

    Get PDF
    The use of null hypotheses (in a statistical sense) is common in hard sciences but not in theoretical linguistics. Here the null hypothesis that the low frequency of syntactic dependency crossings is expected by an arbitrary ordering of words is rejected. It is shown that this would require star dependency structures, which are both unrealistic and too restrictive. The hypothesis of the limited resources of the human brain is revisited. Stronger null hypotheses taking into account actual dependency lengths for the likelihood of crossings are presented. Those hypotheses suggests that crossings are likely to reduce when dependencies are shortened. A hypothesis based on pressure to reduce dependency lengths is more parsimonious than a principle of minimization of crossings or a grammatical ban that is totally dissociated from the general and non-linguistic principle of economy.Postprint (author's final draft

    The risks of mixing dependency lengths from sequences of different length

    Get PDF
    Mixing dependency lengths from sequences of different length is a common practice in language research. However, the empirical distribution of dependency lengths of sentences of the same length differs from that of sentences of varying length and the distribution of dependency lengths depends on sentence length for real sentences and also under the null hypothesis that dependencies connect vertices located in random positions of the sequence. This suggests that certain results, such as the distribution of syntactic dependency lengths mixing dependencies from sentences of varying length, could be a mere consequence of that mixing. Furthermore, differences in the global averages of dependency length (mixing lengths from sentences of varying length) for two different languages do not simply imply a priori that one language optimizes dependency lengths better than the other because those differences could be due to differences in the distribution of sentence lengths and other factors.Comment: Laguage and referencing has been improved; Eqs. 7, 11, B7 and B8 have been correcte

    The sum of edge lengths in random linear arrangements

    Get PDF
    Spatial networks are networks where nodes are located in a space equipped with a metric. Typically, the space is two-dimensional and until recently and traditionally, the metric that was usually considered was the Euclidean distance. In spatial networks, the cost of a link depends on the edge length, i.e. the distance between the nodes that define the edge. Hypothesizing that there is pressure to reduce the length of the edges of a network requires a null model, e.g., a random layout of the vertices of the network. Here we investigate the properties of the distribution of the sum of edge lengths in random linear arrangement of vertices, that has many applications in different fields. A random linear arrangement consists of an ordering of the elements of the nodes of a network being all possible orderings equally likely. The distance between two vertices is one plus the number of intermediate vertices in the ordering. Compact formulae for the 1st and 2nd moments about zero as well as the variance of the sum of edge lengths are obtained for arbitrary graphs and trees. We also analyze the evolution of that variance in Erdos-Renyi graphs and its scaling in uniformly random trees. Various developments and applications for future research are suggested

    Anti dependency distance minimization in short sequences: A graph theoretic approach

    Get PDF
    Dependency distance minimization (DDm) is a word order principle favouring the placement of syntactically related words close to each other in sentences. Massive evidence of the principle has been reported for more than a decade with the help of syntactic dependency treebanks where long sentences abound. However, it has been predicted theoretically that the principle is more likely to be beaten in short sequences by the principle of surprisal minimization (predictability maximization). Here we introduce a simple binomial test to verify such a hypothesis. In short sentences, we find anti-DDm for some languages from different families. Our analysis of the syntactic dependency structures suggests that anti-DDm is produced by star trees.Peer ReviewedPostprint (author's final draft
    corecore