253 research outputs found
The sum of edge lengths in random linear arrangements
Spatial networks are networks where nodes are located in a space equipped
with a metric. Typically, the space is two-dimensional and, until recently, the
metric usually considered was the Euclidean
distance. In spatial networks, the cost of a link depends on the edge length,
i.e. the distance between the nodes that define the edge. Hypothesizing that
there is pressure to reduce the length of the edges of a network requires a
null model, e.g., a random layout of the vertices of the network. Here we
investigate the properties of the distribution of the sum of edge lengths in
random linear arrangements of vertices, which have many applications in different
fields. A random linear arrangement is an ordering of the nodes of a network in
which all possible orderings are equally likely. The
distance between two vertices is one plus the number of intermediate vertices
in the ordering. Compact formulae for the first and second moments about zero, as
well as the variance of the sum of edge lengths, are obtained for arbitrary
graphs and trees. We also analyze the evolution of that variance in Erdős-Rényi
graphs and its scaling in uniformly random trees. Various developments and
applications for future research are suggested.
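The definitions in this abstract can be checked by brute force on a small graph. The sketch below is illustrative only: the function names and the 4-vertex example graphs are assumptions, and the abstract does not state its formulae. It enumerates every ordering, computes the sum of edge lengths for each, and compares the exact mean with m(n+1)/3, which follows from the standard fact that two distinct positions drawn uniformly from 1..n are (n+1)/3 apart on average.

```python
from fractions import Fraction
from itertools import permutations


def sum_edge_lengths(edges, position):
    """Sum of |pos(u) - pos(v)| over edges, i.e. one plus the number of
    intermediate vertices between the endpoints of each edge."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def exact_moments(n, edges):
    """Enumerate all n! orderings (feasible only for small n) and return
    the exact mean and variance of the sum of edge lengths."""
    total = 0
    total_sq = 0
    count = 0
    for order in permutations(range(n)):
        position = {vertex: i for i, vertex in enumerate(order)}
        d = sum_edge_lengths(edges, position)
        total += d
        total_sq += d * d
        count += 1
    mean = Fraction(total, count)
    variance = Fraction(total_sq, count) - mean ** 2
    return mean, variance


# Hypothetical example graphs on 4 vertices: a path and a star.
path_edges = [(0, 1), (1, 2), (2, 3)]
star_edges = [(0, 1), (0, 2), (0, 3)]

path_mean, path_var = exact_moments(4, path_edges)
star_mean, star_var = exact_moments(4, star_edges)
print(path_mean, path_var, star_mean, star_var)
```

In this sketch the mean depends only on the number of vertices and edges, so the path and the star share it, while the variance can differ with the shape of the graph.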
Memory limitations are hidden in grammar
The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.
Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal Associations
We present a quantitative analysis of human word association pairs and study
the types of relations presented in the associations. We put our main focus on
the correlation between response types and respondent characteristics such as
occupation and gender by contrasting syntagmatic and paradigmatic associations.
Finally, we propose a personalised distributed word association model and show
the importance of incorporating demographic factors into the models commonly
used in natural language processing. Comment: AIST 2017 camera-read
Exponential and power laws in public procurement markets
For the first time, we analyze a unique public procurement database,
which includes the number of bidders, the final
price, the identity of the winner and the identity of the contracting
authority for each of more than 40,000 public procurements in the Czech
Republic between 2006 and 2011, focusing on the distributional properties of
the variables of interest. We uncover several scaling laws -- an exponential
law for the number of bidders, and power laws for the total revenues and
total spending of the participating companies, the latter even following Zipf's
law for the 100 highest-spending institutions. We propose an analogy between
extensive and non-extensive systems in physics and the public procurement
market situations. Through entropy maximization, this analogy yields
some interesting results and policy implications with respect to the
Maxwell-Boltzmann and Pareto distributions in the analyzed quantities. Comment: 6 pages, 3 figure
Distinguishing quantitative parameters of author’s language and style (a case of Ivan Franko long prose fiction)
The article is dedicated to a precise analysis of the distinguishing quantitative parameters of an author’s language and style. Such an analysis is carried out for Ivan Franko’s long prose fiction for the first time. A frequency dictionary of all nine Ukrainian novels by Ivan Franko was compiled from an electronic text corpus with external and internal markup. It can be considered a statistical combinatory model of Franko’s style as well as a linguistic statistical portrait of his long prose fiction. The following parameters were obtained: vocabulary size; the variety, exclusiveness and concentration indexes; the number of hapax legomena and their share of the text and of the vocabulary; and the number of words with frequency 10 or higher and their share of the text and of the vocabulary. These were compared with those of a text corpus of Ukrainian general long prose fiction.
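Some of the parameters listed in this abstract are simple counts over a frequency dictionary. The following is a minimal sketch, assuming the text is already tokenized into a list of word forms; the function name `lexical_profile`, the dictionary keys and the toy input are hypothetical. The variety, exclusiveness and concentration indexes are left out because the abstract does not define them.

```python
from collections import Counter


def lexical_profile(tokens, min_freq=10):
    """Count-based lexical statistics for a list of tokens.

    "Share of text" is measured in tokens, "share of vocabulary" in
    distinct word types; both readings are assumptions of this sketch.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)
    # Hapax legomena: word types that occur exactly once.
    hapax = sum(1 for c in freq.values() if c == 1)
    # Word types at or above the frequency threshold, and the tokens they cover.
    frequent_types = sum(1 for c in freq.values() if c >= min_freq)
    frequent_tokens = sum(c for c in freq.values() if c >= min_freq)
    return {
        "tokens": n_tokens,
        "vocabulary": n_types,
        "hapax": hapax,
        "hapax_share_of_text": hapax / n_tokens,
        "hapax_share_of_vocabulary": hapax / n_types,
        "frequent_types": frequent_types,
        "frequent_share_of_text": frequent_tokens / n_tokens,
        "frequent_share_of_vocabulary": frequent_types / n_types,
    }


# Toy input with a lowered threshold so the example stays small.
profile = lexical_profile("a b a c a d".split(), min_freq=3)
print(profile)
```

On a real corpus the default threshold of 10 matches the cut-off named in the abstract; the toy call lowers it only to keep the example readable.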
The placement of the head that maximizes predictability. An information theoretic approach
The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the costs with respect to dependency length minimization. The
implications of such a broad theoretical framework for understanding the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed. Comment: in press in Glottometric
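The conflict between the two principles can be illustrated on a toy case. The sketch below assumes a star tree, in which every other word depends directly on the head; this simplification and the function name are choices made here for illustration, not the paper's setup. It computes the sum of dependency lengths for each possible head position in a sentence of n words.

```python
def star_dependency_length(n, head_pos):
    """Sum of distances between the head at position head_pos (1..n)
    and each of its n - 1 dependents in a star tree."""
    return sum(abs(head_pos - p) for p in range(1, n + 1) if p != head_pos)


n = 5
costs = {h: star_dependency_length(n, h) for h in range(1, n + 1)}
for head_pos, cost in costs.items():
    print(head_pos, cost)
```

Under this assumption the cost is lowest with the head in the middle and highest with the head first or last, which is the tension with head-final placement that the abstract describes.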
Anti dependency distance minimization in short sequences: A graph theoretic approach
Dependency distance minimization (DDm) is a word order principle favouring the placement of syntactically related words close to each other in sentences. Massive evidence of the principle has been reported for more than a decade with the help of syntactic dependency treebanks where long sentences abound. However, it has been predicted theoretically that the principle is more likely to be beaten in short sequences by the principle of surprisal minimization (predictability maximization). Here we introduce a simple binomial test to verify such a hypothesis. In short sentences, we find anti-DDm for some languages from different families. Our analysis of the syntactic dependency structures suggests that anti-DDm is produced by star trees. Peer Reviewed. Postprint (author's final draft
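The abstract does not say how its binomial test is set up, so the following is only a generic sketch of an exact one-sided binomial test. It assumes, hypothetically, that each sentence is one trial, that a "success" is a sentence whose sum of dependency distances exceeds a random baseline, and that the null probability of success is 0.5. None of these choices is confirmed by the source.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts: 8 of 10 short sentences exceed the baseline.
p_value = binomial_upper_tail(8, 10)
print(p_value)
```

A small upper-tail probability would, under these assumptions, point towards anti-DDm; the lower tail would serve the same purpose for DDm.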