    Complex systems and the history of the English language

    Complexity theory (Mitchell 2009, Kretzschmar 2009) is something that historical linguists not only can use but should use in order to improve the relationship between the speech we observe in historical settings and the generalizations we make from it. Complex systems, as described in physics, ecology, and many other sciences, are made up of massive numbers of components interacting with one another, and this results in self-organization and emergent order. For speech, the “components” of a complex system are all of the possible variant realizations of linguistic features as they are deployed by human agents, speakers and writers. The order that emerges in speech is simply the fact that our use of words and other linguistic features is significantly clustered in the spatial, social, and textual groups in which we actually communicate. Order emerges from such systems by means of self-organization, but the order that arises from speech is not the same as what linguists study under the rubric of linguistic structure. In both texts and regional/social groups, the frequency distribution of features occurs as the same pattern: an asymptotic hyperbolic curve (or “A-curve”). Formal linguistic systems, grammars, are thus not the direct result of the complex system, and historical linguists must use complexity to mediate between the language production observed in the community and the grammars we describe. The history of the English language does not proceed as regularly as clockwork, and an understanding of complex systems helps us to see why and how, and suggests what we can do about it. First, the scaling property of complex systems tells us that there are no representative speakers, and so our observation of any small group of speakers is unlikely to represent any group at a larger scale; yet limited evidence is the necessary condition of many of our historical studies. The fact that underlying complex distributions follow the 80/20 rule (80% of the word tokens in a data set will be instances of only 20% of the word types, while the remaining 80% of the word types amount to only 20% of the tokens) gives us an effective tool for estimating the status of historical states of the language. Such a frequency-based technique stands in contrast to the typological “fit” technique, which relies on a few texts that can be reliably located in space and which may not account for the crosscutting effects of text type, another dimension in which the 80/20 rule applies. Besides issues of sampling, the frequency-based approach also affects how we can think about change. The A-curve translates directly into the S-curve now used to describe linguistic change, and shows that “change” cannot reasonably be considered a qualitative shift. Instead, we can use the model of “punctuated equilibrium” from evolutionary biology (see, e.g., Gould and Eldredge 1993), which suggests that multiple changes occur simultaneously and compete, in place of the older idea of “phyletic gradualism” in evolution, which corresponds to the traditional method of historical linguistics. The Great Vowel Shift, for example, is a useful overall generalization, but complex systems and punctuated equilibrium explain why we should not expect it ever to be “complete” or to appear in the same form in different places. These applications of complexity can help us to understand and interpret our existing studies better, and suggest how new studies in the history of the English language can be made more valid and reliable.
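    The 80/20 pattern described above is straightforward to check on any word-frequency list. The following is a minimal Python sketch, not taken from the paper: it ranks word types by token frequency (the shape of the ranked counts is the A-curve) and reports what share of all tokens the top 20% of types account for. The file name corpus.txt is a placeholder.

```python
from collections import Counter

def top_type_token_share(tokens, type_share=0.20):
    """Rank word types by frequency (the A-curve) and return the
    fraction of all tokens covered by the top `type_share` of types."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    n_top = max(1, int(len(counts) * type_share))
    return sum(counts[:n_top]) / sum(counts)

# Hypothetical usage: on most natural-language samples this comes
# out near 0.8, i.e. the 80/20 rule described in the abstract.
words = open("corpus.txt", encoding="utf-8").read().lower().split()
print(f"Top 20% of types cover {top_type_token_share(words):.0%} of tokens")
```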

    Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality

    We survey diverse approaches to the notion of information, from Shannon entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov complexity are presented: randomness and classification. The survey is divided into two parts published in the same volume. Part II is dedicated to the relation between logic and information systems, within the scope of Kolmogorov's algorithmic information theory. We present a recent application of Kolmogorov complexity: classification using compression, an idea with provocative implementations by authors such as Bennett, Vitanyi and Cilibrasi. This stresses how Kolmogorov complexity, besides being a foundation for randomness, is also related to classification. Another approach to classification is also considered: the so-called "Google classification". It uses another original and attractive idea, which is connected both to classification using compression and to Kolmogorov complexity from a conceptual point of view. We present and unify these different approaches to classification in terms of Bottom-Up versus Top-Down operational modes, pointing out their fundamental principles and the underlying duality. We look at the way these two dual modes are used in different approaches to information systems, particularly the relational model for databases introduced by Codd in the 1970s. This allows us to point out diverse forms of a fundamental duality. These operational modes are also reinterpreted in the context of the comprehension schema of the axiomatic set theory ZF. This leads us to develop how Kolmogorov complexity is linked to intensionality, abstraction, classification and information systems.
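    As a concrete illustration of classification by compression, here is a minimal sketch of the normalized compression distance of Cilibrasi and Vitanyi, with zlib standing in for the ideal (uncomputable) Kolmogorov compressor; the sample strings are invented for the example.

```python
import zlib

def c(data: bytes) -> int:
    """Approximate the Kolmogorov complexity of `data` by its
    compressed length; zlib is a crude stand-in for an ideal compressor."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for closely related
    objects, near 1 for unrelated ones."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

# Invented examples: similar texts share compressible structure,
# so their pairwise distance comes out smaller.
a = b"the cat sat on the mat and purred " * 10
b = b"the cat sat on a mat and purred " * 10
z = b"lattice gauge theory on a hypercubic grid " * 10
print(ncd(a, b) < ncd(a, z))  # expected: True
```

    Clustering the pairwise NCD matrix is the classification-by-compression idea the abstract credits to Bennett, Vitanyi and Cilibrasi.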

    Exploring the N-th Dimension of Language

    This paper is aimed at exploring the hidden fundamental computational property of natural language, a property so elusive that it has made all attempts to characterize the real computational power of language ultimately fail. Earlier, natural language was thought to be context-free. However, it was gradually realized that this does not hold much water, given that a range of natural language phenomena have been found to be of non-context-free character, which has almost scuttled plans to brand natural language context-free. So it has been suggested that natural language is mildly context-sensitive and, to some extent, context-free. In all, it seems that the issue of the exact computational property has not yet been settled. Against this background it will be proposed that this exact computational property of natural language is perhaps the N-th dimension of language, if what we mean by dimension is nothing but a universal (computational) property of natural language.
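    For a concrete sense of the non-context-free phenomena alluded to here, the standard example is cross-serial dependencies (as in Swiss German), whose formal skeleton is the language a^n b^m c^n d^m: it is generable by mildly context-sensitive formalisms such as TAG but by no context-free grammar. A small illustrative Python recognizer, written for this note rather than drawn from the paper:

```python
import re

def is_cross_serial(s: str) -> bool:
    """Recognize a^n b^m c^n d^m (n, m >= 1): the a's must match the
    c's and the b's must match the d's, crossing dependencies that
    no context-free grammar can enforce."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return bool(m) and (len(m.group(1)) == len(m.group(3))
                        and len(m.group(2)) == len(m.group(4)))

print(is_cross_serial("aabbccdd"))   # True:  2 a's ~ 2 c's, 2 b's ~ 2 d's
print(is_cross_serial("aabbcccdd"))  # False: 2 a's but 3 c's
```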

    Phrase structure grammars as indicative of uniquely human thoughts

    I argue that the ability to compute phrase structure grammars is indicative of a particular kind of thought. This type of thought is available only to cognitive systems that have access to the computations that allow the generation and interpretation of the structural descriptions of phrase structure grammars. The study of phrase structure grammars, and of formal language theory in general, is thus indispensable to studies of human cognition, for it makes explicit both the unique type of human thought and the underlying mechanisms in virtue of which this thought is made possible.
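    To make "structural descriptions of phrase structure grammars" concrete, here is a minimal, purely illustrative Python sketch: a toy grammar whose top-down derivations yield labeled bracketings of the kind the argument refers to. The grammar and lexicon are invented for illustration.

```python
import random

# A toy phrase structure grammar: each nonterminal maps to its
# possible expansions; strings absent from the table are terminals.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["near", "NP"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["saw"], ["chased"]],
}

def derive(symbol: str = "S") -> str:
    """Expand `symbol` top-down into a labeled bracketing, i.e. a
    structural description generated by the grammar."""
    if symbol not in GRAMMAR:  # terminal word
        return symbol
    expansion = random.choice(GRAMMAR[symbol])
    return "[" + symbol + " " + " ".join(derive(s) for s in expansion) + "]"

print(derive())
# e.g. [S [NP the [N cat]] [VP [V chased] [NP the [N dog]]]]
```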