2,042 research outputs found

    Ontology of music performance variation

    Performance variation in rhythm determines the extent to which humans perceive and feel the effect of rhythmic pulsation and of music in general. In many cases, these rhythmic variations can be linked to percussive performance, yet such performance variations are often absent from current percussive rhythmic models. The purpose of this thesis is to present an interactive computer model, called the PD-103, that simulates the micro-variations in human percussive performance. This thesis makes three main contributions to existing knowledge: firstly, by formalising a new method for modelling percussive performance; secondly, by developing a new compositional software tool, the PD-103, that models human percussive performance; and finally, by creating a portfolio of different musical styles to demonstrate the capabilities of the software. To model timbral variation in human percussive performance, a large database of recorded samples is classified into zones based upon the vibrational characteristics of the instruments. The degree of timbral variation is governed by principles of biomechanics and human percussive performance. A fuzzy logic algorithm is applied to analyse current and first-order sample selection in order to formulate an ontological description of music performance variation. Asynchrony values were extracted from recorded performances at three different skill levels to create "timing fingerprints" which characterise features unique to each percussionist. The PD-103 uses real performance timing data to determine asynchrony values for each synthesised note. The spectral content of the sample database forms a three-dimensional loudness/timbre space, intersecting instrumental behaviour with music composition.
The reparameterisation of the sample database, following analysis of loudness, spectral flatness, and spectral centroid, provides an opportunity to explore the timbral variations inherent in percussion instruments and to treat timbre creatively as a compositional dimension. The PD-103 was used to create a music portfolio exploring different rhythmic possibilities, with a focus on meso-periodic rhythms common to parts of West Africa, jazz drumming, and electroacoustic music. The portfolio also includes new timbral percussive works based on spectral features, and demonstrates the central aim of this thesis: the creation of a new compositional software tool that integrates human percussive performance and extends this model to different genres of music.
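    The three spectral descriptors named above (loudness, spectral flatness, spectral centroid) can each be computed from a short audio frame. The sketch below is a minimal NumPy illustration of those descriptors, not the PD-103's actual analysis code; the frame size and test signals are arbitrary choices.

```python
import numpy as np

def spectral_features(frame, sr):
    """Loudness (RMS), spectral centroid, and spectral flatness for one
    audio frame -- the three axes of the loudness/timbre space above."""
    windowed = frame * np.hanning(len(frame))        # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed)) + 1e-12 # avoid log of zero
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    rms = np.sqrt(np.mean(frame ** 2))                       # loudness proxy
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)   # "brightness"
    flatness = np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum)  # noisiness
    return rms, centroid, flatness

sr = 44100
t = np.arange(2048) / sr
sine = np.sin(2 * np.pi * 440 * t)                     # tonal: low flatness
noise = np.random.default_rng(0).uniform(-1, 1, 2048)  # noisy: high flatness
_, c_sine, f_sine = spectral_features(sine, sr)
_, _, f_noise = spectral_features(noise, sr)
```

    A pure tone yields a centroid near its frequency and flatness close to zero, while white noise yields flatness much closer to one, so the two occupy distant points in the space.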

    Making music through real-time voice timbre analysis: machine learning and timbral control

    People can achieve rich musical expression through vocal sound (see, for example, human beatboxing, which achieves a wide timbral variety through a range of extended techniques). Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy, and a qualitative method for evaluating musical interfaces based on discourse analysis.
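    The supervised timbre-to-control mapping idea can be illustrated with a toy example. The sketch below uses a k-nearest-neighbour average rather than the regression-tree method the thesis develops, and every feature value and synthesiser parameter in it is invented for illustration only.

```python
import numpy as np

# Hypothetical training set: rows pair (spectral centroid, flatness) features
# from vocal takes with synth controls (filter cutoff in Hz, resonance 0..1).
X_train = np.array([[300.0, 0.05], [800.0, 0.10], [2000.0, 0.40], [5000.0, 0.80]])
y_train = np.array([[200.0, 0.2], [600.0, 0.3], [2500.0, 0.6], [8000.0, 0.9]])

def map_timbre(x, k=2):
    """Map one vocal timbre vector to synth controls by averaging the
    targets of its k nearest training examples (features z-scored first)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    d = np.linalg.norm((X_train - mu) / sd - (x - mu) / sd, axis=1)
    nearest = np.argsort(d)[:k]
    return y_train[nearest].mean(axis=0)

cutoff, resonance = map_timbre(np.array([900.0, 0.12]))
```

    Because the query lies between the two darkest training examples, the mapped controls are the average of their targets; a regression tree would instead learn axis-aligned splits over the same feature space.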

    Vocal imitation for query by vocalisation

    The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare).
The results show that drum sounds received the highest similarity ratings when rated against their own imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified from the imitations at above-chance accuracy, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds. This work was funded by the Engineering and Physical Sciences Research Council (EP/G03723X/1).
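    The point about temporal information can be made concrete with a toy example: a decaying tone and its time-reversal have identical long-term magnitude spectra, so only a feature that preserves the envelope over time can tell them apart. This NumPy sketch is an illustration of that principle, not a reconstruction of the features evaluated in the thesis.

```python
import numpy as np

def rms_envelope(x, frame=256):
    """Frame-wise RMS: a simple feature that keeps temporal information."""
    n = len(x) // frame
    return np.sqrt((x[: n * frame].reshape(n, frame) ** 2).mean(axis=1))

sr = 16000
t = np.arange(sr // 4) / sr
decay = np.exp(-20 * t) * np.sin(2 * np.pi * 200 * t)  # kick-like decaying tone
reverse = decay[::-1]            # identical spectrum, opposite envelope

# Long-term magnitude spectra are identical...
spec_a = np.abs(np.fft.rfft(decay))
spec_b = np.abs(np.fft.rfft(reverse))
# ...but the temporal envelope separates the two sounds clearly.
env_a, env_b = rms_envelope(decay), rms_envelope(reverse)
```

    No amount of extra spectral resolution distinguishes `decay` from `reverse`, while even a coarse frame-wise envelope does, which is the intuition behind favouring temporal preservation for drum-imitation matching.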

    Listeners are sensitive to the speech breathing time series: Evidence from a gap detection task

    The effect of non-speech sounds, such as breathing noise, on the perception of speech timing is currently unclear. In this paper we report the results of three studies investigating participants' ability to detect a silent gap located adjacent to breath sounds during naturalistic speech. Experiment 1 (n = 24, in person) asked whether participants could either detect or locate a silent gap that was added adjacent to breath sounds during speech. In Experiment 2 (n = 182, online), we investigated whether different placements within an utterance were more likely to elicit successful detection of gaps. In Experiment 3 (n = 102, online), we manipulated the breath sounds themselves to examine the effect of breath-specific characteristics on gap identification. Across the studies, we document consistent effects of gap duration as well as gap placement. Moreover, in Experiment 2, whether a gap was positioned before or after an interjected breath significantly predicted accuracy, as well as the duration threshold at which gaps were detected, suggesting that non-verbal aspects of audible speech production specifically shape listeners' temporal expectations. We also describe influences of the breath sounds themselves, as well as the surrounding speech context, that can disrupt objective gap detection performance. We conclude by contextualising our findings within the literature, arguing that the verbal acoustic signal is not "speech itself" per se, but rather one part of an integrated percept that includes speech-related respiration, which could be more fully explored in speech perception studies.
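    The core stimulus manipulation, inserting a silent gap into an audio signal at a chosen point, is simple to sketch. The code below is a simplified stand-in using noise for the speech signal; the actual stimuli, gap positions, and durations used in the experiments may differ.

```python
import numpy as np

def insert_gap(signal, sr, at_s, gap_ms):
    """Insert a silent gap of gap_ms milliseconds at time at_s (seconds),
    lengthening the signal rather than overwriting it."""
    i = int(at_s * sr)
    gap = np.zeros(int(sr * gap_ms / 1000))
    return np.concatenate([signal[:i], gap, signal[i:]])

sr = 16000
speech = np.random.default_rng(1).standard_normal(sr)  # 1 s noise stand-in
with_gap = insert_gap(speech, sr, at_s=0.5, gap_ms=250)
```

    Varying `gap_ms` per trial is what lets a study estimate the duration threshold at which listeners begin to detect the gap.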

    Musical Echolalia and Non-Verbal Children with Autism

    Typical imitation skills that are integral to language and social learning do not readily develop in children with autism. Echolalia, an echoing or imitation of speech sounds, has historically been considered a non-meaningful form of verbal imitation. Since music is intrinsically more meaningful than language for children with autism, musical echolalia may offer a path to communication for non-verbal children with autism. This research study sought to identify the potential existence of musical echolalia among non-verbal children with autism. Twelve non-verbal children diagnosed with classic autism, six boys and six girls, aged four to eight, who had no formal musical training or music therapy experience, participated in this study. Participants took part in a single, one-on-one, videotaped music therapy session. For this study, the term musical echolalia was defined as the demonstration of the immediate, relative imitation of a pitch, melody, or rhythm sequence of a musical phrase, performed through vocal, instrumental, or physical expression. Non-musical utterances or noises, such as echoic or imitative speech sounds and unrelated motor movements, were not counted as musical echolalia. Each child's immediate imitation of discrete musical elements was deemed musical echolalia; thus, elements of pitch, rhythm, voice, musical instrument, and physical expression were included in this study. Based on these criteria, seven different sub-types of musical echolalia were identified. Inferential statistics and single-factor ANOVA were used to compare the frequency of musical stimuli and musical echolalia, the social responses that occurred after musical echolalia, and the potential associations across gender and age. A statistically significant difference was found for the musical echolalia type RIO (rhythm with a musical instrument) when compared with the frequencies of the 11 other sub-types of musical echolalia.
The identification of musical echolalia sub-types may offer insight into the musical elements that each child attunes to. Furthermore, the identification of musical echolalia abilities may aid in diagnostic assessment and in the development of treatment protocols for these children. Additional research is needed, however, to further determine musical echolalia's potential as a tool for developing social and communicative reciprocity in non-verbal children with autism.
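    The frequency comparison above rests on a one-way (single-factor) ANOVA, whose F statistic can be written out directly. The per-child counts below are invented placeholders for illustration; the study's actual data are not reproduced here.

```python
import numpy as np

def one_way_anova_F(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square."""
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented per-child frequency counts for three echolalia sub-types:
rio   = np.array([9.0, 11.0, 10.0, 12.0])
pitch = np.array([2.0, 3.0, 1.0, 2.0])
voice = np.array([3.0, 2.0, 2.0, 3.0])
F = one_way_anova_F([rio, pitch, voice])
```

    A large F, as here, indicates that the variation between sub-type means is far greater than the variation within them, which is the pattern a significant RIO effect would produce.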

    Temporal adaptation and anticipation mechanisms in sensorimotor synchronization

    No full text

    Mapping Acoustic and Semantic Dimensions of Auditory Perception

    Auditory categorisation is a function of sensory perception which allows humans to generalise across the many different sounds present in the environment and classify them into behaviourally relevant categories. These categories cover not only the variance of the acoustic properties of the signal but also a wide variety of sound sources. However, it is unclear to what extent the acoustic structure of sound is associated with, and conveys, different facets of semantic category information. Whether people use such information, and what drives their decisions when both acoustic and semantic information about a sound is available, also remains unknown. To answer these questions, we used existing methods broadly practised in linguistics, acoustics, and cognitive science, and bridged these domains by delineating their shared space. Firstly, we took a model-free exploratory approach to examine the underlying structure and inherent patterns in our dataset, running principal component, clustering, and multidimensional scaling analyses. At the same time, we mapped the topography of the sound labels' semantic space based on corpus-based word embedding vectors. We then built an LDA model predicting class membership and compared the model-free approach and model predictions with the actual taxonomy. Finally, by conducting a series of web-based behavioural experiments, we investigated whether acoustic and semantic topographies relate to perceptual judgements. This analysis pipeline showed that natural sound categories can be successfully predicted from acoustic information alone, and that perception of natural sound categories has some acoustic grounding. Results from our studies help to recognise the role of physical sound characteristics, and of their meaning, in the process of sound perception, and give invaluable insight into the mechanisms governing machine-based and human classifications.
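    The first, model-free step of the pipeline above (principal component analysis) can be sketched in a few lines via the SVD. The feature matrix here is synthetic, standing in for an acoustic feature matrix; it is not the sound dataset used in the work.

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA: centre the feature matrix, take the SVD, and project
    onto the top principal components. Returns scores and the variance
    explained by each component."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, S ** 2 / (len(X) - 1)

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 sounds x 5 features, with most variance
# concentrated along one latent direction plus a little noise.
latent = rng.standard_normal((100, 1))
X = latent @ rng.standard_normal((1, 5)) + 0.1 * rng.standard_normal((100, 5))
scores, variances = pca(X)
```

    When one latent factor dominates, the first component's variance dwarfs the rest, which is exactly the kind of low-dimensional structure an exploratory analysis of acoustic features hopes to reveal.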

    Enactive Sound Machines: Theatrical Strategies for Sonic Interaction Design

    Embodied interaction with digital sound has been the subject of much prior research, but a method of coupling simple and intuitive hand actions to the vast potential of digital soundmaking in a perceptually meaningful way remains elusive. At the same time, artistic practices centred on performative soundmaking with objects remain overlooked by researchers. This thesis explores the design and performance of theatre sound effects in Europe and the U.S. in the late nineteenth and early twentieth centuries, in order to bring the embodied knowledge of soundmaking at the heart of this historical practice together with present-day design and evaluation strategies from Sonic Interaction Design and Digital Musical Instrument design. An acoustic theatre wind machine is remade and explored as an interactive sounding object facilitating a continuous sonic interaction with a wind-like sound. Its main soundmaking components are digitally modelled in Max/MSP. A prototype digital wind machine is created by fitting the acoustic wind machine with a rotary encoder to activate the digital wind-like sound in performance. Both wind machines are then evaluated in an experiment with participants. The results show that the timbral qualities of the wind-like sounds are the most important factor in how they are rated for similarity, that the rotational speed of both wind machines is not clearly perceivable from their sounds, and that the enactive properties of the acoustic wind machine have not yet been fully captured in the digital prototype. The wind machine's flywheel mechanism is also found to be influential in guiding participants in their performances. The findings confirm the acoustic wind machine's ability to facilitate enactive learning, and a more complete picture of its soundmaking components emerges. The work presented in this thesis opens up the potential of mechanisms to couple simple hand actions to complex soundmaking, whether acoustic or digital, in an intuitive way.
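    The basic mapping the digital wind machine implements, from rotation speed to a wind-like sound, can be caricatured as filtered noise whose brightness and level rise with the speed of the crank. The toy below is a rough Python illustration under that assumption, not a reconstruction of the Max/MSP model described in the thesis.

```python
import numpy as np

def wind_frame(speed, n=1024, rng=None):
    """One frame of a wind-like sound: white noise through a one-pole
    low-pass whose cutoff and gain rise with rotation speed in 0..1."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.uniform(-1, 1, n)
    a = 0.99 - 0.9 * speed        # faster crank -> higher cutoff (brighter)
    out = np.empty(n)
    acc = 0.0
    for i, x in enumerate(noise):
        acc = a * acc + (1 - a) * x        # one-pole low-pass filter
        out[i] = acc * (0.2 + 0.8 * speed) # faster crank -> louder
    return out

slow, fast = wind_frame(0.1), wind_frame(0.9)
```

    Driving `speed` from a rotary encoder reading per audio frame would give the continuous gesture-to-sound coupling the prototype aims for; the experimental finding above suggests such speed changes are not always clearly audible.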
