19 research outputs found
Edge crossings in linear arrangements: from theory to algorithms and applications
Computation of the number of crossings C in a graph when its vertices are placed in linear arrangements, and of the exact value of the variance of C in uniformly random linear arrangements. Analysis of statistical properties in random graphs, and application to syntactic dependency trees.
Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees
The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely, when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences often satisfy in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; it is well known that the larger R is, the lower the error of the estimation but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and provide an algorithm to retrieve the trees that minimize it.
LAP is supported by Secretaria d’Universitats i Recerca de la Generalitat de Catalunya and the Social European Fund. RFC is also supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). RFC and LAP are supported by the grant TIN2017-89244-R from MINECO (Ministerio de Economía, Industria y Competitividad).
Peer Reviewed. Postprint (published version)
Edge crossings in random linear arrangements
In spatial networks vertices are arranged in some space and edges may cross.
When vertices are arranged in a 1-dimensional lattice, edges may cross when drawn
above the vertex sequence, as happens in linguistic and biological networks.
Here we investigate the general problem of the distribution of edge
crossings in random arrangements of the vertices. We generalize the existing
formula for the expectation of this number in random linear arrangements of
trees to any network and derive an expression for the variance of the number of
crossings in an arbitrary layout relying on a novel characterization of the
algebraic structure of that variance in an arbitrary space. We provide compact
formulae for the expectation and the variance in complete graphs, complete
bipartite graphs, cycle graphs, one-regular graphs and various kinds of trees
(star trees, quasi-star trees and linear trees). In these networks, the scaling
of expectation and variance as a function of network size is asymptotically
power-law-like in random linear arrangements. Our work paves the way for
further research and applications in one dimension, and for investigating the
distribution of the number of crossings in lattices of higher dimension or
other embeddings.
Comment: Generalised our theory from one-dimensional layouts to practically
any type of layout. This helps study the variance of the number of crossings
in graphs when their vertices are arranged on the surface of a sphere, or on
the plane. Moreover, we also give closed formulae for this variance on
particular types of graphs in both linear arrangements and general layouts
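The expectation and variance discussed in this abstract can be checked by exhaustive enumeration on very small graphs. The following is a minimal sketch, not the authors' method: it assumes vertices are labelled 0..n-1, and the function names are hypothetical. It also relies on an elementary fact rather than on the paper's formulae: two edges sharing no vertex cross with probability 1/3 in a uniformly random linear arrangement, so the expected number of crossings is the number of such pairs divided by 3.

```python
from fractions import Fraction
from itertools import combinations, permutations


def num_crossings(edges, pos):
    """Count pairs of edges whose endpoint positions interleave."""
    count = 0
    for (u, v), (s, t) in combinations(edges, 2):
        a, b = sorted((pos[u], pos[v]))
        c, d = sorted((pos[s], pos[t]))
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


def crossing_moments(n, edges):
    """Exact mean and variance of the number of crossings over all n! arrangements."""
    total = total_sq = num = 0
    for pos in permutations(range(n)):  # pos[v] is the position of vertex v
        c = num_crossings(edges, pos)
        total += c
        total_sq += c * c
        num += 1
    mean = Fraction(total, num)
    return mean, Fraction(total_sq, num) - mean ** 2


def independent_edge_pairs(edges):
    """Number of pairs of edges that share no vertex."""
    return sum(1 for e, f in combinations(edges, 2) if not set(e) & set(f))


path4 = [(0, 1), (1, 2), (2, 3)]
mean, var = crossing_moments(4, path4)
print(mean, var, Fraction(independent_edge_pairs(path4), 3))
```

The enumeration costs n! arrangements, so it is only useful as a reference against which faster formulae or algorithms can be validated on tiny inputs.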
Fast calculation of the variance of edge crossings
The crossing number, i.e. the minimum number of edge crossings arising when
drawing a graph on a certain surface, is a very important problem of graph
theory. The opposite problem, i.e. the maximum crossing number, is receiving
growing attention. Here we consider a complementary problem of the distribution
of the number of edge crossings, namely the variance of the number of
crossings, when embedding the vertices of an arbitrary graph in some space at
random. In his pioneering research, Moon derived that variance on random linear
arrangements of complete unipartite and bipartite graphs. Given the need for
efficient algorithms to support this sort of research and given also the
growing interest in the number of edge crossings in spatial networks (networks
where vertices are embedded in some space), here we derive algorithms to
calculate the variance in arbitrary graphs in -time, and in forests in
-time. These algorithms work on a wide range of random layouts (not only
on Moon's) and are based on novel arithmetic expressions for the calculation of
the variance that we develop from previous theoretical work. This paves the way
for many applications that rely on a fast but exact calculation of the
variance.
Comment: Better connection with graph theory (crossing number). Introduction
and discussion substantially rewritten. Minor corrections in other parts of
the article
Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees
The syntactic structure of a sentence is often represented using syntactic
dependency trees. The sum of the distances between syntactically related words
has been in the limelight for the past decades. Research on dependency
distances led to the formulation of the principle of dependency distance
minimization whereby words in sentences are ordered so as to minimize that sum.
Numerous random baselines have been defined to carry out related quantitative
studies on languages. The simplest random baseline is the expected value of the
sum in unconstrained random permutations of the words in the sentence, namely
when all the shufflings of the words of a sentence are allowed and equally
likely. Here we focus on a popular baseline: random projective permutations of
the words of the sentence, that is, permutations where the syntactic dependency
structure is projective, a formal constraint that sentences often satisfy in
languages. Thus far, the expectation of the sum of dependency distances in
random projective shufflings of a sentence has been estimated approximately
with a Monte Carlo procedure whose cost is of the order of Rn, where n is
the number of words of the sentence and R is the number of samples; the
larger R, the lower the error of the estimation but the larger the time cost.
Here we present formulae to compute that expectation without error in time of
the order of n. Furthermore, we show that star trees maximize it, and devise
a dynamic programming algorithm to retrieve the trees that minimize it.
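The Monte Carlo baseline that this abstract contrasts with its exact formulae can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the rooted tree is given as a hypothetical `children` dictionary mapping each vertex to its list of children, and a uniformly random projective arrangement is drawn by shuffling each vertex together with its children and recursively expanding each child into the contiguous block of its subtree. The cost is of the order of Rn for R samples.

```python
import random


def random_projective_arrangement(children, root, rng):
    """Return the vertices, left to right, of a uniformly random projective order."""
    items = [root] + list(children.get(root, []))
    rng.shuffle(items)  # random relative order of the vertex and its children
    order = []
    for v in items:
        if v == root:
            order.append(v)
        else:
            # each child is replaced by the contiguous block of its subtree
            order.extend(random_projective_arrangement(children, v, rng))
    return order


def sum_edge_lengths(children, order):
    """Sum of |position(parent) - position(child)| over all edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, kids in children.items() for v in kids)


def monte_carlo_expected_sum(children, root, num_samples, seed=0):
    """Estimate the expected sum of edge lengths from num_samples random samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        order = random_projective_arrangement(children, root, rng)
        total += sum_edge_lengths(children, order)
    return total / num_samples


star = {0: [1, 2, 3]}  # star tree with 4 vertices, rooted at its hub
print(monte_carlo_expected_sum(star, 0, 20000))
```

For a star tree rooted at its hub every ordering is projective, so the estimate should approach the unconstrained expectation (n^2 - 1)/3, which is 5 for n = 4. The estimate only converges as the number of samples grows, which is the trade-off the abstract describes.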
Linear-time calculation of the expected sum of edge lengths in planar linearizations of trees
Dependency graphs have proven to be a very successful model to represent the
syntactic structure of sentences of human languages. In these graphs, widely
accepted to be trees, vertices are words and arcs connect
syntactically-dependent words. The tendency of these dependencies to be short
has been demonstrated using random baselines for the sum of the lengths of the
edges or its variants. A ubiquitous baseline is the expected sum in projective
orderings (wherein edges do not cross and the root word of the sentence is not
covered by any edge). It was shown that said expected value can be computed in
O(n) time. In this article we focus on planar orderings (where the root word
can be covered) and present two main results. First, we show the relationship
between the expected sum in planar arrangements and the expected sum in
projective arrangements. Second, we also derive an O(n)-time algorithm to
calculate the expected value of the sum of edge lengths. These two results stem
from another contribution of the present article, namely a characterization of
planarity that, given a sentence, yields either the number of planar
permutations or an efficient algorithm to generate uniformly random planar
permutations of the words. Our research paves the way for replicating past
research on dependency distance minimization using random planar linearizations
as a random baseline.
Comment: Updated with comments from a colleague
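The two constraints defined in this abstract can be made concrete with a brute-force check. The sketch below is not the characterization derived in the article: it simply tests every permutation of a small tree, counting those that are planar (no two edges cross) and those that are also projective (the root is not covered by any edge). Vertex labels 0..n-1 and all function names are assumptions made for illustration.

```python
from itertools import combinations, permutations


def has_crossing(edges, pos):
    """True if some pair of edges has interleaving endpoint positions."""
    for (u, v), (s, t) in combinations(edges, 2):
        a, b = sorted((pos[u], pos[v]))
        c, d = sorted((pos[s], pos[t]))
        if a < c < b < d or c < a < d < b:
            return True
    return False


def root_is_covered(edges, root, pos):
    """True if the root lies strictly between the endpoints of some edge."""
    return any(
        min(pos[u], pos[v]) < pos[root] < max(pos[u], pos[v]) for u, v in edges
    )


def count_arrangements(n, edges, root):
    """Return (number of planar, number of projective) arrangements."""
    planar = projective = 0
    for pos in permutations(range(n)):  # pos[v] is the position of vertex v
        if has_crossing(edges, pos):
            continue
        planar += 1
        if not root_is_covered(edges, root, pos):
            projective += 1
    return planar, projective


# Path 0 - 1 - 2 rooted at the end vertex 0.
print(count_arrangements(3, [(0, 1), (1, 2)], 0))  # (6, 4)
```

In the example, all six orderings of the path are planar, but the two that place the root between vertices 1 and 2 are not projective, which illustrates why planarity is the weaker constraint.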
Reappraising the distribution of the number of edge crossings of graphs on a sphere
Many real transportation and mobility networks have their vertices placed on
the surface of the Earth. In such embeddings, the edges laid on that surface
may cross. In his pioneering research, Moon analyzed the distribution of the
number of crossings on complete graphs and complete bipartite graphs whose
vertices are located uniformly at random on the surface of a sphere assuming
that vertex placements are independent from each other. Here we revise his
derivation of that variance in the light of recent theoretical developments on
the variance of crossings and computer simulations. We show that Moon's
formulae are inaccurate in predicting the true variance and provide exact
formulae.
Comment: Corrected mistakes in equation 31. Added new figure (7). Added
acknowledgements to J. W. Moon. Other minor changes. Updated figures. Minor
changes in the last update
The linear arrangement library: A new tool for research on syntactic dependency structures
The new and growing field of Quantitative Dependency Syntax has emerged at the crossroads between Dependency Syntax and Quantitative Linguistics. One of the main concerns in this field is the statistical patterns of syntactic dependency structures. These structures, grouped in treebanks, are the source for statistical analyses in these and related areas; dozens of scores devised over the years are the tools of a new industry to search for patterns and perform other sorts of analyses. The plethora of such metrics and their increasing complexity require sharing the source code of the programs used to perform such analyses. However, such code is not often shared with the scientific community or is tested following unknown standards. Here we present a new open-source tool, the Linear Arrangement Library (LAL), which caters to the needs of, especially, inexperienced programmers. This tool enables the calculation of these metrics on single syntactic dependency structures, treebanks, and collections of treebanks, grounded on ease of use and yet with great flexibility. LAL has been designed to be efficient, easy to use (while satisfying the needs of all levels of programming expertise), reliable (thanks to thorough testing), and to unite research from different traditions, geographic areas, and research fields.
LAP is supported by Secretaria d’Universitats i Recerca de la Generalitat de Catalunya and the Social European Fund. RFC and LAP are supported by the grant TIN2017-89244-R from MINECO (Ministerio de Economía, Industria y Competitividad). RFC is also supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). JLE is funded by the grant PID2019-109137GB-C22 from MINECO.
Peer Reviewed. Postprint (published version)
Minimum projective linearizations of trees in linear time
The Minimum Linear Arrangement problem (MLA) consists of finding a mapping $\pi$
from vertices of a graph to distinct integers that minimizes
$\sum_{\{u,v\}\in E} |\pi(u) - \pi(v)|$. In that setting, vertices are often
assumed to lie on a horizontal line and edges are drawn as semicircles above
said line. For trees, various algorithms are available to solve the problem in
polynomial time in . There exist variants of the MLA in which the
arrangements are constrained. Iordanskii, and later Hochberg and Stallmann
(HS), put forward -time algorithms that solve the problem when
arrangements are constrained to be planar (also known as one-page book
embeddings). We also consider linear arrangements of rooted trees that are
constrained to be projective (planar embeddings where the root is not covered
by any edge). Gildea and Temperley (GT) sketched an algorithm for projective
arrangements which they claimed runs in but did not provide any
justification of its cost. In contrast, Park and Levy claimed that GT's
algorithm runs in where is the maximum degree but
did not provide sufficient detail. Here we correct an error in HS's algorithm
for the planar case, show its relationship with the projective case, and derive
simple algorithms for the projective and planar cases that undoubtedly run in
O(n) time.
Comment: Improved connection with Iordanskii's previous work
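The objective minimized in the MLA can be illustrated with an exhaustive search on a very small tree. This is only a reference sketch with hypothetical names and vertex labels 0..n-1; it takes exponential time and is unrelated to the linear-time algorithms the abstract refers to.

```python
from itertools import permutations


def sum_edge_lengths(edges, pos):
    """Sum of |pos[u] - pos[v]| over all edges, where pos[v] is the position of v."""
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def brute_force_mla(n, edges):
    """Return (minimum cost, one optimal arrangement) by trying all n! arrangements."""
    best_cost, best_pos = None, None
    for pos in permutations(range(n)):
        cost = sum_edge_lengths(edges, pos)
        if best_cost is None or cost < best_cost:
            best_cost, best_pos = cost, pos
    return best_cost, best_pos


# Star tree with hub 0 and four leaves: the hub ends up in the middle position.
cost, pos = brute_force_mla(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
print(cost, pos)
```

A search like this is mainly useful for validating a faster implementation on inputs small enough to enumerate.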
The expected sum of edge lengths in planar linearizations of trees
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or their variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in time O(n). Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an O(n)-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.