Ontology of music performance variation
Performance variation in rhythm determines the extent to which humans perceive and feel the effect of rhythmic pulsation and music in general. In many cases, these rhythmic variations can be linked to percussive performance. Such percussive performance variations are often absent in current percussive rhythmic models. The purpose of this thesis is to present an interactive computer model, called the PD-103, that simulates the micro-variations in human percussive performance. This thesis makes three main contributions to existing knowledge: firstly, by formalising a new method for modelling percussive performance; secondly, by developing a new compositional software tool called the PD-103 that models human percussive performance; and finally, by creating a portfolio of different musical styles to demonstrate the capabilities of the software. A large database of recorded samples is classified into zones based upon the vibrational characteristics of the instruments, to model timbral variation in human percussive performance. The degree of timbral variation is governed by principles of biomechanics and human percussive performance. A fuzzy logic algorithm is applied to analyse current and first-order sample selection in order to formulate an ontological description of music performance variation. Asynchrony values were extracted from recorded performances at three different performance skill levels to create "timing fingerprints" which characterise features unique to each percussionist. The PD-103 uses real performance timing data to determine asynchrony values for each synthesised note. The spectral content of the sample database forms a three-dimensional loudness/timbre space, intersecting instrumental behaviour with music composition.
The reparameterisation of the sample database, following analysis of loudness, spectral flatness, and spectral centroid, opens up the timbral variation inherent in percussion instruments as a creatively explorable dimension. The PD-103 was used to create a music portfolio exploring different rhythmic possibilities, with a focus on the meso-periodic rhythms common to parts of West Africa, jazz drumming, and electroacoustic music. The portfolio also includes new timbral percussive works based on spectral features and demonstrates the central aim of this thesis: the creation of a new compositional software tool that integrates human percussive performance and extends this model to different genres of music
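The three descriptors used for the reparameterisation (loudness, spectral flatness, spectral centroid) are standard audio features. A minimal sketch of how they can be computed for a single frame follows; the window and normalisation choices are illustrative assumptions, not the PD-103's actual implementation:

```python
import numpy as np

def spectral_features(signal, sr):
    """Loudness (RMS proxy), spectral flatness and spectral centroid
    for one audio frame; windowing choices here are illustrative."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = spectrum ** 2 + 1e-12                       # floor avoids log(0)
    loudness = np.sqrt(np.mean(signal ** 2))            # RMS as a loudness proxy
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)  # geometric/arithmetic mean
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)      # magnitude-weighted mean freq
    return loudness, flatness, centroid

# White noise is spectrally flat; a pure tone concentrates its energy.
sr = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(2048)
tone = np.sin(2 * np.pi * 440 * np.arange(2048) / sr)
l_n, f_n, c_n = spectral_features(noise, sr)
l_t, f_t, c_t = spectral_features(tone, sr)
```

Sorting a sample database along these three axes yields the kind of loudness/timbre space the abstract describes.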
Making music through real-time voice timbre analysis: machine learning and timbral control
PhD
People can achieve rich musical expression through vocal sound – see for example
human beatboxing, which achieves a wide timbral variety through a range of
extended techniques. Yet the vocal modality is under-exploited as a controller
for music systems. If we can analyse a vocal performance suitably in real time,
then this information could be used to create voice-based interfaces with the
potential for intuitive and fulfilling levels of expressive control.
Conversely, many modern techniques for music synthesis do not imply any
particular interface. Should a given parameter be controlled via a MIDI keyboard,
or a slider/fader, or a rotary dial? Automatic vocal analysis could provide
a fruitful basis for expressive interfaces to such electronic musical instruments.
The principal questions in applying vocal-based control are how to extract
musically meaningful information from the voice signal in real time, and how
to convert that information suitably into control data. In this thesis we address
these questions, with a focus on timbral control, and in particular we
develop approaches that can be used with a wide variety of musical instruments
by applying machine learning techniques to automatically derive the mappings
between expressive audio input and control output. The vocal audio signal is
construed to include a broad range of expression, in particular encompassing
the extended techniques used in human beatboxing.
The central contribution of this work is the application of supervised and
unsupervised machine learning techniques to automatically map vocal timbre
to synthesiser timbre and controls. Component contributions include a delayed
decision-making strategy for low-latency sound classification, a regression-tree
method to learn associations between regions of two unlabelled datasets, a fast
estimator of multidimensional differential entropy, and a qualitative method for
evaluating musical interfaces based on discourse analysis
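The "fast estimator of multidimensional differential entropy" mentioned above is a standard ingredient in such mapping pipelines. For illustration, the classic Kozachenko–Leonenko nearest-neighbour estimator is sketched below; this is a textbook construction, not necessarily the estimator developed in the thesis:

```python
import math
import numpy as np

def kl_entropy(x):
    """Kozachenko-Leonenko nearest-neighbour estimate of differential
    entropy (in nats) for an (N, d) sample array; brute-force O(N^2)."""
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))     # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)              # ignore self-distances
    rho = dist.min(axis=1)                      # nearest-neighbour distance per point
    # log-volume of the d-dimensional unit ball
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return d * float(np.mean(np.log(rho))) + log_vd + math.log(n - 1) + np.euler_gamma

# Sanity check: a d-dim standard Gaussian has H = (d/2) * ln(2*pi*e).
rng = np.random.default_rng(1)
est = kl_entropy(rng.standard_normal((1500, 2)))
true_h = math.log(2 * math.pi * math.e)         # ~2.838 nats for d = 2
```

A k-d tree replaces the brute-force distance matrix when N grows large; the estimator itself is consistent as N increases.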
Vocal imitation for query by vocalisation
PhD Thesis
The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate
musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to
vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal
imitations and imitated sounds.
In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of
expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task,
musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category
(e.g. kick, snare etc.). The results show that drum sounds received the highest similarity ratings when rated against their imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified with above chance accuracy from the imitations, although
this varied considerably between drum categories.
The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
Engineering and Physical Sciences Research Council (EP/G03723X/1)
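Feature-based similarity prediction of this kind can be illustrated with a simple heuristic feature (rather than the learned auto-encoder features the thesis favours): compare a spectral-centroid envelope of an "imitation" against that of the imitated sound. Everything below, including the chirp stimuli, is an illustrative assumption:

```python
import numpy as np

def centroid_envelope(signal, sr, frame=1024, hop=512):
    """Frame-wise spectral centroid: a crude 'brightness' time series."""
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    window = np.hanning(frame)
    env = []
    for start in range(0, len(signal) - frame, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        env.append(float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12)))
    return np.array(env)

def similarity(a, b):
    """Pearson correlation of two equal-length envelopes as a similarity proxy."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy stimuli: the 'imitation' of a rising chirp is a similar rising
# chirp; the mismatched comparison is a falling chirp.
sr = 16000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * (200 + 600 * t) * t)      # rises ~200 -> 1400 Hz
imitation = np.sin(2 * np.pi * (250 + 550 * t) * t)   # rises ~250 -> 1350 Hz
mismatch = np.sin(2 * np.pi * (800 - 300 * t) * t)    # falls ~800 -> 200 Hz
s_match = similarity(centroid_envelope(target, sr), centroid_envelope(imitation, sr))
s_mismatch = similarity(centroid_envelope(target, sr), centroid_envelope(mismatch, sr))
```

The preservation of temporal information the abstract emphasises is exactly what the envelope (as opposed to a single pooled feature value) retains.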
Feeling the groove: shared time and its meanings for three jazz trios
The notion of groove is fundamental to jazz culture and the term yields a rich set of understandings for jazz musicians. Within the literature, no single perspective on groove exists and many questions remain about the relationship between timing processes, phenomenal experience and musical structures in making sense of groove.
In this account, the experience and meaning of groove is theorised as emerging from two forms of sharedness. Firstly, a primary intersubjectivity that arises through the timing behaviours of the players; this could be likened to the 'mutual tuning-in' described in social phenomenology. It is proposed that this tuning-in is accomplished through the mechanism of entrainment. The second form of sharedness is understood as the shared temporal models, the cultural knowledge, that musicians make use of in their playing together.
Methodologically, this study makes use of detailed investigation of timing data from live performances by three jazz trios, framed by in-depth, semi-structured interview material and steers a new course between existing ethnographic work on jazz and more psychologically informed studies of timing.
The findings of the study point towards significant social and structural effects on the groove between players. The impact of musical role on groove and timing is demonstrated and significant temporal models, whose syntactic relations suggest musical proximity or distance, are shown to have a corresponding effect on timing within the trios. The musician's experience of groove is discussed as it relates to the objective timing data and reveals a complex set of understandings involving temporality, consciousness and communication.
In the light of these findings, groove is summarised as the feeling of entrainment, inflected through cultural models and expressed through the cultural norms of jazz
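Timing analyses of this kind typically rest on asynchronies between players' matched note onsets. A minimal sketch follows; the index-based beat matching is an assumption for illustration, not the study's method:

```python
import numpy as np

def asynchrony_stats(onsets_a, onsets_b):
    """Signed asynchronies (seconds) between two players' matched onsets:
    positive values mean player A sounds after player B. Assumes the two
    onset lists are already aligned beat-for-beat."""
    d = np.asarray(onsets_a, dtype=float) - np.asarray(onsets_b, dtype=float)
    return float(d.mean()), float(d.std())

# Toy trio excerpt: the drummer plays ~10 ms 'behind' the bassist's pulse.
bass = np.arange(0.0, 4.0, 0.5)     # metronomic half-second pulse
drums = bass + 0.010                # consistently 10 ms late on every beat
mean_async, sd_async = asynchrony_stats(drums, bass)
```

The mean captures a systematic lead or lag between players (a common marker of musical role), while the standard deviation reflects how tightly the pair is entrained.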
Listeners are sensitive to the speech breathing time series: Evidence from a gap detection task
The effect of non-speech sounds, such as breathing noise, on the perception of speech timing is currently unclear. In this paper we report the results of three studies investigating participants' ability to detect a silent gap located adjacent to breath sounds during naturalistic speech. Experiment 1 (n = 24, in-person) asked whether participants could either detect or locate a silent gap that was added adjacent to breath sounds during speech. In Experiment 2 (n = 182; online), we investigated whether different placements within an utterance were more likely to elicit successful detection of gaps. In Experiment 3 (n = 102; online), we manipulated the breath sounds themselves to examine the effect of breath-specific characteristics on gap identification. Across the study, we document consistent effects of gap duration, as well as gap placement. Moreover, in Experiment 2, whether a gap was positioned before or after an interjected breath significantly predicted accuracy as well as the duration threshold at which gaps were detected, suggesting that nonverbal aspects of audible speech production specifically shape listeners' temporal expectations. We also describe the influences of the breath sounds themselves, as well as the surrounding speech context, that can disrupt objective gap detection performance. We conclude by contextualising our findings within the literature, arguing that the verbal acoustic signal is not "speech itself" per se, but rather one part of an integrated percept that includes speech-related respiration, which could be more fully explored in speech perception studies
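The core stimulus manipulation, splicing a silent gap into a recording at a chosen point, can be sketched as follows; the sample rate, gap position, and duration here are illustrative, not the experiments' parameters:

```python
import numpy as np

def insert_gap(signal, sr, position_s, gap_ms):
    """Splice gap_ms of silence into `signal` at position_s seconds."""
    idx = int(round(position_s * sr))
    gap = np.zeros(int(round(sr * gap_ms / 1000.0)), dtype=signal.dtype)
    return np.concatenate([signal[:idx], gap, signal[idx:]])

# Placeholder 'speech': one second of a 120 Hz tone standing in for voicing.
sr = 16000
speech = np.sin(2 * np.pi * 120 * np.arange(sr) / sr)
stim = insert_gap(speech, sr, position_s=0.5, gap_ms=80)
```

In the experiments described, the position parameter would be set adjacent to (before or after) an audible breath sound rather than at an arbitrary point.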
Musical Echolalia and Non-Verbal Children with Autism
Typical imitation skills that are integral to language and social learning do not readily develop in children with autism. Echolalia, an echoing or imitation of speech sounds, has historically been considered a non-meaningful form of verbal imitation. Since music is intrinsically more meaningful than language for children with autism, musical echolalia may offer a path to communication for non-verbal children with autism. This research study sought to identify the potential existence of musical echolalia among non-verbal children with autism. Twelve non-verbal children diagnosed with classic autism, six boys and six girls, aged four to eight, who had no formal musical training or music therapy experience, participated in this study. Participants took part in a single, one-on-one, videotaped music therapy session. For this study the term musical echolalia was defined as the demonstration of the immediate, relative imitation of a pitch, melody, or rhythm sequence of a musical phrase performed through vocal, instrumental, or physical expression. Non-musical utterances or noises, such as echoic or imitative speech sounds and unrelated motor movements, were not included in this study as musical echolalia. Each child's immediate imitation of discrete musical elements was deemed musical echolalia; thus, elements of pitch, rhythm, voice, musical instrument, and physical expression were included in this study. Based on these criteria, seven different sub-types of musical echolalia were identified. Inferential statistics and single-factor ANOVA were used to compare the frequency of musical stimuli and musical echolalia, the social responses that occurred after musical echolalia, and potential associations across gender and age. A statistically significant difference was found for the musical echolalia type RIO (rhythm with a musical instrument) when compared to the frequencies for the 11 other sub-types of musical echolalia.
The identification of musical echolalia sub-types may offer insight into understanding the musical elements that each child attunes to. Furthermore, the identification of musical echolalia abilities may aid in diagnostic assessment and in the development of treatment protocols for these children. Additional research is needed, however, to further determine musical echolalia's potential for use as a tool in developing social and communicative reciprocity for non-verbal children with autism
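A single-factor ANOVA of the kind used here compares between-group to within-group variance. The sketch below computes the F statistic directly, on hypothetical per-session counts rather than the study's data:

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic for a single-factor ANOVA over the given groups."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_x = np.concatenate(arrays)
    grand = all_x.mean()
    k, n = len(arrays), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in arrays)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in arrays)
    # ratio of between-group to within-group mean squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical counts: one sub-type (e.g. rhythm-with-instrument)
# occurring far more often than two others across four sessions.
rio = [9, 11, 10, 12]
other_a = [2, 3, 1, 2]
other_b = [1, 2, 2, 1]
f_stat = one_way_anova_f(rio, other_a, other_b)
```

A large F relative to the F distribution's critical value at the chosen alpha indicates a significant difference in frequencies between sub-types, as the study reports for RIO.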
Mapping Acoustic and Semantic Dimensions of Auditory Perception
Auditory categorisation is a function of sensory perception which allows humans to generalise across many different sounds present in the environment and classify them into behaviourally relevant categories. These categories cover not only the variance of acoustic properties of the signal but also a wide variety of sound sources. However, it is unclear to what extent the acoustic structure of sound is associated with, and conveys, different facets of semantic category information. Whether people use such information, and what drives their decisions when both acoustic and semantic information about a sound is available, also remains unknown. To answer these questions, we used existing methods broadly practised in linguistics, acoustics, and cognitive science, and bridged these domains by delineating their shared space. Firstly, we took a model-free exploratory approach to examine the underlying structure and inherent patterns in our dataset. To this end, we ran principal component, clustering, and multidimensional scaling analyses. At the same time, we mapped the topography of the sound labels' semantic space based on corpus-based word-embedding vectors. We then built an LDA model predicting class membership and compared the model-free approach and model predictions with the actual taxonomy. Finally, by conducting a series of web-based behavioural experiments, we investigated whether acoustic and semantic topographies relate to perceptual judgements. This analysis pipeline showed that natural sound categories can be successfully predicted from acoustic information alone and that perception of natural sound categories has some acoustic grounding. Results from our studies help to recognise the role of physical sound characteristics and their meaning in the process of sound perception and give invaluable insight into the mechanisms governing machine-based and human classifications
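The model-free stage of such a pipeline can be sketched with a numpy-only PCA. The classifier below is a nearest-centroid stand-in for the LDA model described, and the two "sound categories" are synthetic feature clusters, not the study's data:

```python
import numpy as np

def pca(x, n_components):
    """Project rows of x onto the top principal components via SVD."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row to the class with the closest training centroid."""
    labels = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in labels])
    d = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Two synthetic 'sound categories' separated in a 10-D acoustic feature space.
rng = np.random.default_rng(2)
x = np.vstack([rng.normal(0.0, 1.0, (100, 10)),
               rng.normal(3.0, 1.0, (100, 10))])
y = np.array([0] * 100 + [1] * 100)
z = pca(x, 2)                                   # 2-D acoustic 'map'
pred = nearest_centroid_predict(z[::2], y[::2], z[1::2])   # train even, test odd
acc = float((pred == y[1::2]).mean())
```

Above-chance accuracy in the low-dimensional projection is the sense in which "natural sound categories could be predicted based on the acoustic information alone".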
Enactive Sound Machines: Theatrical Strategies for Sonic Interaction Design
Embodied interaction with digital sound has been subject to much prior research, but a method of coupling simple and intuitive hand actions to the vast potential of digital soundmaking in a perceptually meaningful way remains elusive. At the same time, artistic practices centred on performative soundmaking with objects remain overlooked by researchers. This thesis explores the design and performance of theatre sound effects in Europe and the U.S. in the late nineteenth and early twentieth century in order to converge the embodied knowledge of soundmaking at the heart of this historical practice with present-day design and evaluation strategies from Sonic Interaction Design and Digital Musical Instrument design.
An acoustic theatre wind machine is remade and explored as an interactive sounding object facilitating a continuous sonic interaction with a wind-like sound. Its main soundmaking components are digitally modelled in Max/MSP. A prototype digital wind machine is created by fitting the acoustic wind machine with a rotary encoder to activate the digital wind-like sound in performance. Both wind machines are then evaluated in an experiment with participants. The results show that the timbral qualities of the wind-like sounds are the most important factor in how they are rated for similarity, that the rotational speed of both wind machines is not clearly perceivable from their sounds, and that the enactive properties of the acoustic wind machine have not yet been fully captured in the digital prototype. The wind machine's flywheel mechanism is also found to be influential in guiding participants in their performances. The findings confirm the acoustic wind machine's ability to facilitate enactive learning, and a more complete picture of its soundmaking components emerges. The work presented in this thesis opens up the potential of mechanisms to couple simple hand actions to complex soundmaking, whether acoustic or digital, in an intuitive way
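The mapping from rotation speed to a wind-like sound, implemented in the thesis in Max/MSP, can be caricatured in a few lines of Python: filtered noise whose loudness and brightness both rise with speed. The parameter ranges here are illustrative assumptions, not the prototype's values:

```python
import numpy as np

def wind_sound(speed, sr=22050, dur=0.5, seed=0):
    """Toy wind synthesis: one-pole lowpassed noise whose loudness and
    brightness both rise with rotation speed in [0, 1]."""
    noise = np.random.default_rng(seed).standard_normal(int(sr * dur))
    alpha = 0.02 + 0.3 * speed        # higher speed -> higher cutoff (brighter)
    out = np.empty_like(noise)
    state = 0.0
    for i, s in enumerate(noise):
        state += alpha * (s - state)  # one-pole lowpass filter
        out[i] = state
    return speed * out                # amplitude also tracks speed

slow = wind_sound(0.2)
fast = wind_sound(0.9)
rms_slow = float(np.sqrt(np.mean(slow ** 2)))
rms_fast = float(np.sqrt(np.mean(fast ** 2)))
```

In the physical device the speed value would come from the rotary encoder on the crank; coupling a single gesture dimension to several perceptual dimensions at once is what gives the interaction its wind-like plausibility.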