18 research outputs found

    Units and constituency in prosodic analysis: a quantitative assessment

    Drawing on methods from quantitative linguistics, this paper tests the hypothesis that the intonation unit is a valid language construct whose immediate constituent is the foot (and whose own immediate constituent is the syllable). If the hypothesis is true, then the lengths of intonation units, measured in feet, should abide by a regular and parsimonious discrete probability distribution, and the immediate constituency relationship between feet and intonation units should be further demonstrable by successfully fitting the Menzerath-Altmann equation with a negative exponent. However, out of sixteen texts from the Aix-MARSEC database, only six share a common probability distribution and only eight exhibit a tolerable fit of the Menzerath-Altmann equation. A failure rate of ≥ 50% in both cases casts doubt on the validity of the hypothesis.
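As a rough illustration of what "fitting the Menzerath-Altmann equation with a negative exponent" can look like, here is a minimal sketch. It assumes the simplified power-law form y = a * x^b (the abstract does not say which form the paper uses), fits it by least squares on log-transformed values, and runs on made-up numbers. The function name `fit_power_law` and the data are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + b*log(x); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept in log-log space = log(a)
    return a, b

# Hypothetical data (not from Aix-MARSEC): intonation-unit length in feet
# versus mean foot length in syllables, generated from y = 3 * x**-0.25.
unit_lengths = [1, 2, 3, 4, 5, 6]
mean_foot_sizes = [3.0 * x ** -0.25 for x in unit_lengths]

a, b = fit_power_law(unit_lengths, mean_foot_sizes)
print(f"a = {a:.3f}, b = {b:.3f}")  # a negative b is what the law predicts
```

On real data the fit would also need a goodness-of-fit measure to decide whether it is "tolerable"; that criterion is not given in the abstract and is left out here.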

    On the physical origin of linguistic laws and lognormality in speech

    Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyze come from a phonetically transcribed database of acoustic recordings of spontaneous speech known as the Buckeye Speech corpus. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this ‘lognormality law’ using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf’s law, Herdan’s law, Brevity law, and Menzerath-Altmann’s law) in oral communication, both in physical units and in symbolic units measured in the speech transcriptions, and find that the validity of these laws is typically stronger when using physical units than in their symbolic counterpart. Additional results include (i) coining a Herdan’s law in physical units, (ii) a precise mathematical formulation of Brevity law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, which we call the size-rank law, and (iii) a mathematical derivation of Menzerath-Altmann’s law which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin.
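A variable is lognormal when its logarithm is normally distributed, so a first check of a "lognormality law" is to estimate the mean and standard deviation of the log-durations. The sketch below does only that. It is not the authors' procedure, and the durations are invented rather than taken from the Buckeye corpus.

```python
import math
import statistics

def fit_lognormal(durations):
    """Return (mu, sigma): mean and sample standard deviation of log-durations."""
    logs = [math.log(d) for d in durations]
    return statistics.mean(logs), statistics.stdev(logs)

# Hypothetical durations in seconds, chosen so the logs are -1, 0 and 1.
durations = [math.exp(v) for v in (-1.0, 0.0, 1.0)]

mu, sigma = fit_lognormal(durations)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

A real assessment would go on to compare the empirical distribution of the logs with a normal distribution using a formal goodness-of-fit test.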

    On Hilberg's Law and Its Links with Guiraud's Law

    Hilberg (1990) supposed that finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is greater than proportional to the square root of the text length. Our derivation is based on some mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text. Such operational definition of words can be applied even to texts deprived of spaces, which do not allow for Mandelbrot's "intermittent silence" explanation of Zipf's and Guiraud's laws. In contrast to Mandelbrot's, our model assumes some probabilistic long-memory effects in human narration and might be capable of explaining Menzerath's law. Comment: To appear in Journal of Quantitative Linguistics.

    Modelowanie matematyczne w językoznawstwie. Na przykładzie statystycznych praw językowych (Mathematical modelling in linguistics: the example of statistical linguistic laws)

    In this publication the author addresses the application of mathematical modelling in linguistics, using statistical linguistic laws as an example. She first discusses methods of constructing a law in which mathematical modelling is used, then moves on to the research area of synergetics, describes language as an open system, and concludes with a discussion of the Menzerath-Altmann law as a case study in the application of mathematical modelling. Open access to this publication of Wydawnictwo Uniwersytetu Łódzkiego (Łódź University Press) was financed under the project "Scientific excellence as the key to excellence in education". The project is carried out with funds from the European Social Fund under the Operational Programme Knowledge Education Development; agreement no. POWER.03.05.00-00-Z092/17-00

    Optimal coding and the origins of Zipfian laws

    The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding -- under an arbitrary coding scheme -- and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws. Comment: in press in the Journal of Quantitative Linguistics; definition of concordant pair corrected, proofs polished, references updated.
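The claim that word length grows roughly as the logarithm of frequency rank can be made concrete with a small sketch. It assumes that optimal non-singular coding hands out distinct non-empty strings in order of increasing length, so that the most frequent items get the shortest strings. The function name `nonsingular_length` and the choice to exclude the empty string are assumptions made here, not details taken from the abstract.

```python
def nonsingular_length(rank, alphabet_size):
    """Length of the string given to the item of this frequency rank (1 = most
    frequent) when distinct non-empty strings are assigned shortest first."""
    length = 1
    block = alphabet_size      # number of strings of exactly `length` symbols
    capacity = block           # number of strings of at most `length` symbols
    while rank > capacity:
        length += 1
        block *= alphabet_size
        capacity += block
    return length

# With a binary alphabet: 2 strings of length 1, 4 of length 2, 8 of length 3.
lengths = [nonsingular_length(r, 2) for r in range(1, 8)]
print(lengths)  # [1, 1, 2, 2, 2, 2, 3]
```

Because the number of available strings multiplies by the alphabet size with every extra symbol, the assigned length increases by one only when the rank grows by roughly that factor, which is the logarithmic growth the abstract describes.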

    Verse diversification: Frequencies and variations of verse types in Vana kannel and Kalevipoeg

    The present study concentrates on specific linguistic aspects of traditional Estonian poetic texts. Focusing on the verse structure of the traditional folk song of Vana kannel and the individually edited and authored epic poem Kalevipoeg, it analyzes the length of verse lines, the length of the words in these verses, and the relation between verse and word length, with the aim of studying verse variability in detail. Given that there are specific rules of verse and word length organization, as well as regular relations between them, the study focuses on sequences of words of different lengths, which result in different verse types. Theoretical and empirical evidence is provided that, in addition to existing regularities, verse variability, too, follows specific rules which can be modelled in terms of a diversificational process.

    The Phylogeny and Function of Vocal Complexity in Geladas

    The complexity of vocal communication varies widely across taxa – from humans who can create an infinite repertoire of sound combinations to some non-human species that produce only a few discrete sounds. A growing body of research is aimed at understanding the origins of ‘vocal complexity’. And yet, we still understand little about the evolutionary processes that led to, and the selective advantages of engaging in, complex vocal behaviors. I contribute to this body of research by examining the phylogeny and function of vocal complexity in wild geladas (Theropithecus gelada), a primate known for its capacity to combine a suite of discrete sound types into varied sequences. First, I investigate the phylogeny of vocal complexity by comparing gelada vocal communication with that of their close baboon relatives and with humans. Comparisons of vocal repertoires reveal that geladas – specifically the males – produce a suite of unique or ‘derived’ call types that results in a more diversified vocal repertoire than baboons. Also, comparisons of acoustic properties reveal that geladas produce vocalizations with greater spectro-temporal modulation, a feature shared with human speech, than baboons. Additionally, I show that the same organizational principle – Menzerath’s law – underpins the structure of gelada vocal sequences (i.e., combinations of derived and homologous call types) and human sentences. Second, I investigate the function of vocal complexity by examining the perception of male complex vocal sequences (i.e., those with more derived call types), the contexts in which they are produced, and how their production differs across individuals. A playback experiment shows that female geladas perceive ‘complex’ and ‘simple’ vocal sequences as being different. Then, two observational studies show that male production of complex vocal sequences mediates their affiliative interactions with females, both during neutral periods and periods of uncertainty (e.g., following conflicts). Finally, I find evidence that vocal complexity can act as a signal of male ‘quality’, in that more dominant males exhibit higher levels of vocal complexity than their subordinate counterparts. Collectively, the work in this dissertation presents an integrative investigation of the ultimate origins of complex communication systems, and in the process, it highlights the critical importance of approaching the study of complexity from several scientific perspectives. PhD, Psychology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138479/1/gustison_1.pd