The challenges of statistical patterns of language: the case of Menzerath's law in genomes
The importance of statistical patterns of language has been debated for
decades. Although Zipf's law is perhaps the most popular case, Menzerath's law
has recently entered the debate. Menzerath's law manifests in
language, music and genomes as a tendency of the mean size of the parts to
decrease as the number of parts increases in many situations. This statistical
regularity emerges also in the context of genomes, for instance, as a tendency
of species with more chromosomes to have a smaller mean chromosome size. It has
been argued that the instantiation of this law in genomes is not indicative of
any parallel between language and genomes because (a) the law is inevitable and
(b) non-coding DNA dominates genomes. Here mathematical, statistical and
conceptual challenges of these criticisms are discussed. Two major conclusions
are drawn: the law is not inevitable and languages also have a correlate of
non-coding DNA. However, the wide range of manifestations of the law in and
outside genomes suggests that the striking similarities between non-coding DNA
and certain linguistic units could be anecdotal for understanding the
recurrence of that statistical law.
When is Menzerath-Altmann law mathematically trivial? A new approach
Menzerath’s law, the tendency of Z (the mean size of the parts) to decrease as X (the number of parts) increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z~1/X. That scaling is a very particular case of the Menzerath-Altmann law, and it has been rejected by means of a correlation test between X and Y in genomes, where X is the number of chromosomes of a species, Y its genome size in bases and Z the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate whether Z~1/X in genomes. The most powerful test is a new non-parametric one based upon the correlation ratio, which is able to reject Z~1/X in nine out of 11 taxonomic groups and detect a borderline group. Rather than a fact, Z~1/X is a baseline that real genomes do not meet. The view of the Menzerath-Altmann law as inevitable is seriously flawed.
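The logic of a correlation test between X and Y can be illustrated with a minimal sketch. If Z = Y/X scaled exactly as 1/X, then Y would not vary systematically with X, so a significant association between X and Y speaks against that scaling. The code below is not the authors' procedure: the Pearson statistic, the permutation scheme, the function names and the toy numbers are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the X-Y correlation.

    Shuffling ys breaks any association with xs, which mimics the
    baseline in which Y does not depend on X (i.e. Z ~ 1/X).
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data, NOT taken from any real genome database.
chromosome_counts = [4, 6, 8, 10, 12, 16, 20, 24, 30, 40]          # X
genome_sizes = [110, 150, 240, 260, 330, 400, 520, 610, 700, 980]  # Y

r = pearson(chromosome_counts, genome_sizes)
p = permutation_p_value(chromosome_counts, genome_sizes)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A small p-value here would indicate that Y depends on X in the toy data, which is incompatible with the Z~1/X baseline. The abstract's most powerful test uses the correlation ratio rather than Pearson's coefficient; swapping the statistic would only require replacing `pearson` with another function of the same shape.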
Tackling the Toolkit. Plotting Poetry through Computational Literary Studies
In Tackling the Toolkit, we focus on the methodological innovations, challenges, obstacles and even shortcomings associated with applying quantitative methods to poetry specifically and poetics more broadly. Using tools including natural language processing, web ontologies, similarity detection devices and machine learning, our contributors explore not only metres, stanzas, stresses and rhythms but also genres, subgenres, lexical material and cognitive processes. Whether they are testing old theories and laws, making complex concepts machine-readable or developing new lines of textual analysis, their works challenge standard descriptions of norms and variations.
Quantifying Interpreting Types: Language Sequence Mirrors Cognitive Load Minimization in Interpreting Tasks
Most interpreting theories claim that different interpreting types should involve varied processing mechanisms and procedures. However, few studies have examined their underlying differences. Even though some previous results based on quantitative approaches show that different interpreting types yield outputs of varying lexical and syntactic features, the grammatical parsing approach is limited. Language sequences that form without relying on parsing or processing with a specific linguistic approach or grammar outperform other quantitative approaches in revealing the sequential behavior of language production. As a non-grammatically-bound unit of language sequences, the frequency motif can visualize the local distribution of content and function words, and can also statistically classify languages and identify text types. Thus, the current research investigates the distribution, length and position-dependent properties of frequency motifs across different interpreting outputs in pursuit of the sequential generation behaviors. It is found that the distribution, the length and certain position-dependent properties of these language sequences differ significantly between simultaneous and consecutive interpreting output. The features of frequency motifs show that both types of interpreting output are produced in a manner that abides by the least effort principle. The current research suggests that interpreting types can be differentiated through this type of sequential language unit and offers evidence for how the different task features mediate the sequential organization of interpreting output under different demands to achieve cognitive load minimization.
Lengths and L-motifs of rhythmical units in formal British speech
The lengths of rhythmical units (as defined by Karl Marbe in 1904) were identified, and their frequencies counted, in twelve complete texts from the Aix-MARSEC database of formal spoken British English. The texts all belonged to the genre of current affairs commentary. L-motifs (i.e. maximal monotone non-decreasing sequences) of the rhythmical unit lengths were also identified, and the frequencies of the different L-motif lengths were counted. The frequencies of both rhythmical unit lengths and L-motif lengths were modelled using a continuous approach with the Zipf-Alekseev function. Good qualities of fit were obtained for both kinds of unit on all texts. The parameters a and b of the Zipf-Alekseev function for the rhythmical unit lengths (though not for the L-motif lengths) were also found to be related in the form of a further Zipf-Alekseev function. Further research should aim to extend the application of the motif approach to rhythmical units.
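The definition of an L-motif given above (a maximal monotone non-decreasing run of lengths) can be sketched in a few lines. This is an illustrative reading of that definition, not the study's own code; the function name `l_motifs` and the sample sequence are hypothetical.

```python
from collections import Counter


def l_motifs(lengths):
    """Split a sequence of unit lengths into maximal non-decreasing runs."""
    motifs = []
    current = []
    for value in lengths:
        # A drop in length closes the current motif and starts a new one.
        if current and value < current[-1]:
            motifs.append(current)
            current = []
        current.append(value)
    if current:
        motifs.append(current)
    return motifs


# Hypothetical sequence of rhythmical unit lengths, for illustration only.
unit_lengths = [2, 3, 3, 1, 4, 2, 2, 5, 1]

motifs = l_motifs(unit_lengths)
motif_length_counts = Counter(len(m) for m in motifs)

print(motifs)               # [[2, 3, 3], [1, 4], [2, 2, 5], [1]]
print(motif_length_counts)  # two motifs of length 3, one of length 2, one of length 1
```

The resulting frequency table of motif lengths is the kind of distribution to which a model such as the Zipf-Alekseev function would then be fitted.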
Melodic segmentation: structure, cognition, algorithms
Segmentation of melodies into smaller units (phrases, themes, motifs, etc.) is an important process in both music analysis and music cognition. Segmentation is also a necessary preprocessing step for various tasks in music information retrieval. Several algorithms for automatic segmentation have been proposed, based on different music-theoretical backgrounds and computing approaches. Rule-based models operate on a given set of logical conditions. Learning-based models, originating in linguistics, compute segmentation criteria on the basis of statistical parameters of a training corpus and/or of the given composition. The aim of this preliminary study is to propose and describe a new segmentation algorithm that is rule-based, parsimonious, and unambiguous.
On Musical Self-Similarity : Intersemiosis as Synecdoche and Analogy
Self-similarity, a concept borrowed from mathematics, is gradually becoming a keyword in musicology. Although a polysemic term, self-similarity often refers to the multi-scalar feature repetition in a set of relationships, and it is commonly valued as an indication of musical ‘coherence’ and ‘consistency’. In this study, Gabriel Pareyon presents a theory of musical meaning formation in the context of intersemiosis, that is, the translation of meaning from one cognitive domain to another cognitive domain (e.g. from mathematics to music, or to speech or graphic forms). From this perspective, the degree of coherence of a musical system relies on a synecdochic intersemiosis: a system of related signs within other comparable and correlated systems. The author analyzes the modalities of such correlations, exploring their general and particular traits, and their operational bounds. Accordingly, the notion of analogy is used as a rich concept through its two definitions quoted by the Classical literature, proportion and paradigm, which are enormously valuable in establishing measurement, likeness and affinity criteria. At the same time, original arguments by Benoît B. Mandelbrot (1924–2010) are revised, alongside a systematic critique of the literature on the subject. In fact, connecting Charles S. Peirce’s ‘synechism’ with Mandelbrot’s ‘fractality’ is one of the main developments of the present study.