1,670 research outputs found

    Data-driven Extraction of Intonation Contour Classes

    In this paper we introduce the first steps towards a new data-driven method for the extraction of intonation events that does not require any prior prosodic labelling. Provided with data segmented at the syllable constituent level, it derives local and global contour classes by stylisation and subsequent clustering of the stylisation parameter vectors. Local contour classes correspond to pitch movements connected to one or several syllables and determine the local f0 shape. Global classes are connected to intonation phrases and determine the f0 register. Local classes are initially derived for syllabic segments, which are then concatenated incrementally by means of statistical language modelling of co-occurrence patterns. Due to its generality, the method is in principle language-independent and potentially capable of dealing with aspects of prosody other than intonation.

    Automatisation of intonation modelling and its linguistic anchoring

    This paper presents a fully machine-driven approach for intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced: the CoPaSul model, which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text which had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation, a perception experiment was carried out, which confirmed 80% of the findings.

    Self-Supervised Representation Learning for Vocal Music Context

    In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both from a short sound token and from sonic patterns over a longer temporal window such as an entire recording. In this paper we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency (F0) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen vocal contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15% compared to using traditional statistical contour features.

    New Perspectives in Teaching Pronunciation

    pp.165-18

    Big words, small phrases: Mismatches between pause units and the polysynthetic word in Dalabon

    This article uses instrumental data from natural speech to examine the phenomenon of pause placement within the verbal word in Dalabon, a polysynthetic Australian language of Arnhem Land. Though the phenomenon is incipient and in two sample texts occurs in only around 4% of verbs, there are clear possibilities for interrupting the grammatical word by a pause after the pronominal prefix and some associated material at the left edge, though these within-word pauses are significantly shorter, on average, than those between words. Within-word pause placement is not random, but is restricted to certain affix boundaries; it requires that the paused-after material be at least dimoraic, and that the remaining material in the verbal word be at least disyllabic. Bininj Gun-wok, another polysynthetic language closely related to Dalabon, does not allow pauses to interrupt the verbal word, and the Dalabon development appears to be tied up with certain morphological innovations that have increased the proportion of closed syllables in the pronominal prefix zone of the verb. Though only incipient and not yet phonologized, pause placement in Dalabon verbs suggests a phonology-driven route by which polysynthetic languages may ultimately become less morphologically complex by fracturing into smaller units.