    Speech tested for Zipfian fit using rigorous statistical techniques

    Zipf’s law describes the relationship between the frequencies of words in a corpus and their rank. Its most basic form is a simple series, indicating that the frequency of a word is inversely proportional to its rank: 1/2, 1/3, 1/4, ... The past two decades have seen the emergence of usage-based and cognitive approaches to language study. A key observation of these approaches, along with the importance of frequency, is that speech differs in substantial and structural ways from writing. Yet, except for a few older analyses performed on very small corpora, most studies of Zipf’s law have been done on written corpora. Further, much of this work judges Zipfianness by loose and informal criteria. Meanwhile, sophisticated statistical techniques for curve fitting have been developed in recent years in the mathematics and physics literature. These include the use of the Kolmogorov-Smirnov statistic, along with maximum likelihood estimation, to generate p-values, and the use of the complementary error function for normal distributions. The latter helps determine whether a corpus that fails a Zipfian fit might be better described by another distribution. In this paper, we will: (1) show, using rigorous statistical techniques, that three corpora of recorded speech (Buckeye, Santa Barbara, MiCase) follow a power law distribution; (2) describe preliminary results showing that the techniques outlined in this paper may be useful in the diagnosis of conditions that can include disordered speech; (3) explain how to do the analyses described in this paper; (4) explain how to download and use the R/Python code we have written and packaged as the Zipf Tool Ki
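    To make the fitting procedure named in the abstract concrete, the following is a minimal Python sketch of a maximum likelihood fit of a Zipf exponent to rank-frequency counts, followed by the Kolmogorov-Smirnov distance between the empirical and fitted distributions. It is not the authors' toolkit code: the function names, the toy token list, the search bounds for the exponent, and the choice of a truncated power law over ranks 1..N are all assumptions made for illustration. A p-value would additionally need a bootstrap over synthetic samples, which is left out here.

```python
import math
from collections import Counter

def rank_frequencies(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)

def log_likelihood(s, counts):
    """Log-likelihood of exponent s under P(rank = r) proportional to r**(-s), r = 1..N."""
    n = sum(counts)
    weighted_log_rank = sum(c * math.log(r) for r, c in enumerate(counts, start=1))
    normalizer = sum(r ** (-s) for r in range(1, len(counts) + 1))
    return -s * weighted_log_rank - n * math.log(normalizer)

def fit_exponent(counts, lo=0.01, hi=5.0, iterations=100):
    """Maximum likelihood estimate of s by ternary search.

    The log-likelihood is concave in s, so ternary search finds its maximum.
    The bounds lo and hi are illustrative assumptions.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, counts) < log_likelihood(m2, counts):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def ks_statistic(s, counts):
    """Largest absolute gap between the empirical and fitted CDFs over ranks."""
    n = sum(counts)
    normalizer = sum(r ** (-s) for r in range(1, len(counts) + 1))
    empirical_cdf = 0.0
    model_cdf = 0.0
    max_gap = 0.0
    for r, c in enumerate(counts, start=1):
        empirical_cdf += c / n
        model_cdf += r ** (-s) / normalizer
        max_gap = max(max_gap, abs(empirical_cdf - model_cdf))
    return max_gap

# Toy input standing in for a transcribed speech corpus (hypothetical data).
tokens = "the cat and the dog and the bird saw the cat".split()
counts = rank_frequencies(tokens)
s_hat = fit_exponent(counts)
ks = ks_statistic(s_hat, counts)
print(f"estimated exponent: {s_hat:.3f}, KS distance: {ks:.3f}")
```

    A smaller KS distance indicates a closer fit. On a real corpus the token list would come from the transcripts, and fitting ranks that were themselves derived from the same counts is a known source of bias that a fuller analysis would need to address.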