
    The sum of edge lengths in random linear arrangements

    Spatial networks are networks whose nodes are located in a space equipped with a metric. Typically the space is two-dimensional and, traditionally, the metric considered was the Euclidean distance. In spatial networks, the cost of a link depends on the edge length, i.e. the distance between the nodes that define the edge. Testing the hypothesis that there is pressure to reduce the length of the edges of a network requires a null model, e.g., a random layout of the vertices of the network. Here we investigate the properties of the distribution of the sum of edge lengths in random linear arrangements of vertices, which has many applications in different fields. A random linear arrangement is an ordering of the vertices of a network in which all possible orderings are equally likely. The distance between two vertices is one plus the number of intermediate vertices in the ordering. Compact formulae for the 1st and 2nd moments about zero, as well as the variance of the sum of edge lengths, are obtained for arbitrary graphs and trees. We also analyze the evolution of that variance in Erdős-Rényi graphs and its scaling in uniformly random trees. Various developments and applications for future research are suggested.
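    The setup in this abstract is easy to make concrete. Below is a minimal sketch (not the paper's derivation) that estimates the mean and variance of the sum of edge lengths by Monte Carlo over uniformly random linear arrangements, and compares the estimated mean against the standard closed form m(n+1)/3 for a graph with n vertices and m edges; the star-tree example is an illustrative assumption, not taken from the paper.

```python
import random

def sum_edge_lengths(order, edges):
    # distance between two linked vertices = difference of their positions,
    # i.e. one plus the number of intermediate vertices in the ordering
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def mc_mean_var(n, edges, samples=20000, seed=1):
    # Monte Carlo estimate of the mean and variance of the sum of
    # edge lengths over uniformly random linear arrangements
    random.seed(seed)
    vertices = list(range(n))
    total = total_sq = 0.0
    for _ in range(samples):
        random.shuffle(vertices)  # one uniformly random arrangement
        d = sum_edge_lengths(vertices, edges)
        total += d
        total_sq += d * d
    mean = total / samples
    return mean, total_sq / samples - mean * mean

n = 5
edges = [(0, i) for i in range(1, n)]  # star tree with hub 0 (toy example)
mean, var = mc_mean_var(n, edges)
print(mean, len(edges) * (n + 1) / 3)  # estimate vs closed form m(n+1)/3
```

    With m = 4 and n = 5 the closed form gives 8, and the Monte Carlo estimate should land very close to it.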

    Memory limitations are hidden in grammar

    The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independently of the computational limitations of the human brain. Here we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.

    Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal Associations

    We present a quantitative analysis of human word association pairs and study the types of relations presented in the associations. We put our main focus on the correlation between response types and respondent characteristics such as occupation and gender by contrasting syntagmatic and paradigmatic associations. Finally, we propose a personalised distributed word association model and show the importance of incorporating demographic factors into the models commonly used in natural language processing.
    Comment: AIST 2017 camera-ready

    Exponential and power laws in public procurement markets

    For the first time, we analyze a unique public procurement database, which includes, for each of more than 40,000 public procurements in the Czech Republic between 2006 and 2011, the number of bidders for a contract, the final price, and the identities of the winner and the contracting authority, focusing on the distributional properties of the variables of interest. We uncover several scaling laws: an exponential law for the number of bidders, and power laws for the total revenues and total spending of the participating companies, which even follow Zipf's law for the 100 highest-spending institutions. We propose an analogy between extensive and non-extensive systems in physics and public procurement market situations. Through entropy maximization, such an analogy yields interesting results and policy implications with respect to the Maxwell-Boltzmann and Pareto distributions in the analyzed quantities.
    Comment: 6 pages, 3 figures
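    Both distributional claims can be checked with standard maximum-likelihood estimators. A sketch on synthetic data (the estimators are the textbook ones, not necessarily the paper's fitting procedure; the generated "bidders" and "spending" samples are stand-ins, not the Czech data):

```python
import math
import random

def fit_exponential(samples):
    # MLE for a geometric law P(k) = (1 - q) * q**(k - 1) on k = 1, 2, ...,
    # i.e. an exponential decay exp(-lam * k) with lam = -log(q)
    mean = sum(samples) / len(samples)
    return math.log(mean / (mean - 1))

def fit_power_law(samples, xmin=1.0):
    # continuous maximum-likelihood (Hill) estimator for P(x) ~ x**(-alpha), x >= xmin
    n = len(samples)
    return 1 + n / sum(math.log(x / xmin) for x in samples)

random.seed(0)
# synthetic stand-ins: geometric "bidder counts" (p = 0.3) and Pareto "spending" (alpha = 2)
bidders = [1 + int(math.log(random.random()) / math.log(0.7)) for _ in range(5000)]
spending = [(1 - random.random()) ** (-1.0) for _ in range(5000)]

lam_hat = fit_exponential(bidders)    # true decay rate: -log(0.7), about 0.357
alpha_hat = fit_power_law(spending)   # true exponent: 2
print(lam_hat, alpha_hat)
```

    On real procurement data one would additionally compare the two candidate laws by likelihood, since exponential and power-law tails can look similar over a short range.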

    Distinguishing quantitative parameters of author’s language and style (a case of Ivan Franko long prose fiction)

    The article presents a precise analysis of distinguishing quantitative parameters of an author's language and style. Such an analysis is made for Ivan Franko's long prose fiction for the first time. The frequency dictionary of all nine Ukrainian novels by Ivan Franko was compiled from an electronic text corpus with external and internal markup. It can be considered both a statistical combinatory model of Franko's style and a linguo-statistical portrait of his long prose fiction. The following parameters were obtained: vocabulary size; the variety, exclusiveness and concentration indexes; the number of hapax legomena and the shares of the text and vocabulary they occupy; and the number of words with frequency 10 or higher, with the shares of the text and vocabulary they occupy. They were compared with those of a text corpus of general Ukrainian long prose fiction.
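    Most of the parameters listed reduce to counts over a frequency dictionary. A minimal sketch of the core statistics (the "variety" index is assumed here to be the type-token ratio; the paper's exact index definitions may differ):

```python
from collections import Counter

def vocab_stats(tokens):
    # frequency dictionary: word -> number of occurrences
    freq = Counter(tokens)
    n_tokens, n_types = len(tokens), len(freq)
    hapax = sum(1 for c in freq.values() if c == 1)   # words occurring once
    return {
        "tokens": n_tokens,
        "types": n_types,
        "variety": n_types / n_tokens,                # assumed: type-token ratio
        "hapax": hapax,
        "hapax_vocab_share": hapax / n_types,
        "hapax_text_share": hapax / n_tokens,
        "freq10_types": sum(1 for c in freq.values() if c >= 10),
    }

stats = vocab_stats("to be or not to be that is the question".split())
print(stats)  # 10 tokens, 8 types, 6 hapax legomena
```

    Running the same function over the Franko corpus and over a reference corpus of general long prose fiction would yield directly comparable profiles.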

    The placement of the head that maximizes predictability. An information theoretic approach

    The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of the predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the cost with respect to dependency length minimization. The implications of such a broad theoretical framework for understanding the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed.
    Comment: in press in Glottometrics

    Anti dependency distance minimization in short sequences: A graph theoretic approach

    Dependency distance minimization (DDm) is a word order principle favouring the placement of syntactically related words close to each other in sentences. Massive evidence for the principle has been reported for more than a decade with the help of syntactic dependency treebanks, where long sentences abound. However, it has been predicted theoretically that the principle is more likely to be beaten in short sequences by the principle of surprisal minimization (predictability maximization). Here we introduce a simple binomial test to verify this hypothesis. In short sentences, we find anti-DDm for some languages from different families. Our analysis of the syntactic dependency structures suggests that anti-DDm is produced by star trees.
    Peer reviewed. Postprint (author's final draft)
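    A binomial test of this kind can be sketched in a few lines: classify each short sentence by whether its summed dependency distance exceeds the expectation under a random linear arrangement, then compute a one-sided binomial p-value for the count. This is an illustrative reconstruction under stated assumptions, not the paper's exact test; the 16-of-20 figure is hypothetical.

```python
import math

def binomial_p_value(k, n, p=0.5):
    # one-sided upper-tail p-value: P(X >= k) for X ~ Binomial(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def expected_D(n_words, n_deps):
    # expected sum of dependency distances in a uniformly random
    # linear arrangement of n_words elements with n_deps dependencies
    return n_deps * (n_words + 1) / 3

# hypothetical counts: 16 of 20 short sentences show D above the random expectation
p = binomial_p_value(16, 20)
print(p)  # small p-value -> evidence of anti-DDm
```

    Under the null hypothesis that a sentence is equally likely to fall above or below the random baseline, p = 0.5; a significantly larger-than-expected count of above-baseline sentences is the anti-DDm signal.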