
    Usage Effects on the Cognitive Routinization of Chinese Resultative Verbs

    The present study adopts a corpus-oriented usage-based approach to the grammar of Chinese resultative verbs. Zooming in on a specific class of V-kai constructions, this paper aims to elucidate the effect of frequency in actual usage events on shaping the linguistic representations of resultative verbs. Specifically, it will be argued that while high token frequency results in more lexicalized V-kai complex verbs, high type frequency gives rise to more schematized V-kai constructions. The routinized patterns pertinent to V-kai resultative verbs, varying in their extent of specificity and generality, accordingly serve as a representative illustration of the continuum between lexicon and grammar that characterizes a usage-based conception of language.

    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes (the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process) that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
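
The adaptor stage described in this abstract can be illustrated with a minimal sketch of the two-parameter (Pitman-Yor) Chinese restaurant process. The function name `pitman_yor_crp`, the parameter names `a` (discount) and `b` (concentration), and the default values are assumptions made for this illustration, not taken from the paper. Setting `a=0` gives the ordinary one-parameter Chinese restaurant process.

```python
import random


def pitman_yor_crp(n_tokens, a=0.5, b=1.0, seed=0):
    """Seat n_tokens customers one at a time and return the table sizes.

    a: discount, assumed 0 <= a < 1 (a = 0 is the plain CRP).
    b: concentration, assumed b > -a.
    Each table stands for one draw from the generator; its size is the
    number of tokens the adaptor assigns to that draw.
    """
    rng = random.Random(seed)
    tables = []  # tables[j] = number of customers at table j
    for n in range(n_tokens):
        k = len(tables)
        # A new table opens with probability (b + a*k) / (n + b);
        # the first customer always opens a table.
        if k == 0 or rng.random() < (b + a * k) / (n + b):
            tables.append(1)
        else:
            # An existing table j is chosen with probability
            # proportional to (size_j - a).
            weights = [size - a for size in tables]
            j = rng.choices(range(k), weights=weights)[0]
            tables[j] += 1
    return tables


if __name__ == "__main__":
    sizes = sorted(pitman_yor_crp(5000), reverse=True)
    print("tables:", len(sizes), "largest:", sizes[:5])
```

Sorting the returned sizes and plotting them against rank on log-log axes is one way to inspect how heavy the tail is for a given discount.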

    Population size predicts lexical diversity, but so does the mean sea level - why it is important to correctly account for the structure of temporal data

    In order to demonstrate why it is important to correctly account for the (serially dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how a simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
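
The abstract does not say which transformation is meant. As a hedged illustration, the sketch below uses first differencing, one common choice for trending series, on two synthetic series that share nothing but an upward trend. The series names, slopes, seeds, and noise level are all hypothetical and are not data from the paper.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def first_difference(series):
    """Replace each value by its change from the previous time step."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def trending_series(n, slope, seed):
    """Synthetic series: a linear trend plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [slope * t + rng.gauss(0, 1) for t in range(n)]


# Two hypothetical, unrelated series that both happen to rise over time.
lexical_diversity = trending_series(200, slope=1.0, seed=1)
sea_level = trending_series(200, slope=2.0, seed=2)

r_levels = pearson(lexical_diversity, sea_level)
r_diffs = pearson(first_difference(lexical_diversity),
                  first_difference(sea_level))

if __name__ == "__main__":
    print(f"correlation of levels:      {r_levels:.3f}")
    print(f"correlation of differences: {r_diffs:.3f}")
```

The correlation of the raw levels is driven almost entirely by the shared trend, while the differenced series keep only the year-to-year changes, which here are independent by construction.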

    Log-log Convexity of Type-Token Growth in Zipf's Systems

    It is traditionally assumed that Zipf's law implies the power-law growth of the number of different elements with the total number of elements in a system - the so-called Heaps' law. We show that a careful definition of Zipf's law leads to the violation of Heaps' law in random systems, and obtain alternative growth curves. These curves fulfill universal data collapses that only depend on the value of Zipf's exponent. We observe that real books behave very much in the same way as random systems, despite the presence of burstiness in word occurrence. We advance an explanation for this unexpected correspondence.
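
A type-token growth curve of the kind discussed here can be simulated with a short sketch. It assumes that a "random system" means independent draws from a finite Zipf-distributed vocabulary, which may differ from the paper's exact construction. The function name `type_token_growth` and its parameters are hypothetical.

```python
import random
from itertools import accumulate


def type_token_growth(n_tokens, vocab_size=10_000, exponent=1.0, seed=0):
    """Return growth[i] = number of distinct types among the first i+1 tokens.

    Tokens are drawn independently; the type of rank r (1-based) has
    probability proportional to r ** -exponent.
    """
    rng = random.Random(seed)
    weights = [rank ** -exponent for rank in range(1, vocab_size + 1)]
    cumulative = list(accumulate(weights))
    draws = rng.choices(range(vocab_size), cum_weights=cumulative, k=n_tokens)

    seen = set()
    growth = []
    for word in draws:
        seen.add(word)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    curve = type_token_growth(100_000)
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, curve[n - 1])
```

Plotting the returned curve on log-log axes shows whether the growth is a straight line, as a strict Heaps' law would require, or bends.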

    Token-based typology and word order entropy: A study based on universal dependencies

    The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.
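
Measuring word order variation as entropy can be sketched as Shannon entropy over the observed orders of a construction. This is a minimal sketch: the function name `word_order_entropy` and the order labels are hypothetical, and the paper's own measures may be defined at other levels of granularity.

```python
import math
from collections import Counter


def word_order_entropy(orders):
    """Shannon entropy (in bits) of a sequence of observed order labels.

    0 bits means the order is fixed; 1 bit means two orders are equally
    frequent. An empty sequence is treated as having zero entropy.
    """
    counts = Counter(orders)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical token counts of verb-object vs. object-verb order.
    rigid = ["VO"] * 98 + ["OV"] * 2
    flexible = ["VO"] * 55 + ["OV"] * 45
    print(f"rigid:    {word_order_entropy(rigid):.3f} bits")
    print(f"flexible: {word_order_entropy(flexible):.3f} bits")
```

Because the measure is computed over tokens, two languages with the same dominant order can still differ in how strictly they keep to it.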

    Free Productive Ability and Lexical Text Analysis to Improve Student Writing

    The classroom is often an arena of Controlled Productive Ability. Within this system, the teacher issues communiques and makes deposits which the students patiently receive, memorize, and repeat. Further, this ‘banking’ concept of education extends the scope of action afforded to students only as far as receiving, filing, and storing the deposits. Education is thus seen as a process of depositing knowledge into passive students. Freire (1970) warns that ‘…the more completely they (the students) accept the passive role imposed on them, the more they tend simply to adapt to the world as it is and to the fragmented view of reality deposited on them’. This research paper will look at how a class of low-intermediate Japanese learners of English can become more attuned to Free Productive Ability, the active use of productive vocabulary, in their written English endeavors. Writing itself is a production skill, in that it requires learners to produce language, as with speaking activities. Written English can be used to produce a message that you want others to understand. However, at most stages of the writing process, from selecting themes and topics, brainstorming ideas, organizing ideas, and drafting a text to reviewing and editing before submission and finally grading and reflecting, the student is part of a passive process managed by the authority of the teacher. This inhibits student critical thinking and the ownership of their own productive abilities. An alternative is to develop and practice a free productive system, limiting the traditional teacher-centric learning system. At all times, students should be encouraged to think and to tackle problems presented to them on their own. This research builds on previous research of student self-affirmation (Deadman, 2015a, 2015b, 2016a and 2016b).

    Polish children's productivity with case marking: the role of regularity, type frequency, and phonological diversity

    Polish-speaking children aged from 2;4 to 4;8 and 16 adult controls participated in a nonce-word inflection experiment testing their ability to use the genitive, dative and accusative inflections productively. Results show that this ability develops early: the majority of two-year-olds were already productive with all inflections apart from dative neuter, and the overall performance of the four-year-olds was very similar to that of adults. All age groups were more productive with inflections that apply to large and/or phonologically diverse classes, although class size and token frequency appeared to be more important for younger children (two- and three-year-olds) and phonological diversity for older children and adults. Regularity, on the other hand, was a very poor predictor of productivity. The results support usage-based models of language acquisition and are problematic for the dual mechanism model.