6 research outputs found

    Spatial auditory display for acoustics and music collections

    PhD Thesis. This thesis explores how audio can be better incorporated into how people access information, and does so by developing approaches for creating three-dimensional audio environments with low processing demands. It investigates three research questions.

    Mobile applications have processor and memory constraints that restrict the number of concurrent static or moving sound sources that can be rendered with binaural audio. Is there a more efficient approach that is as perceptually accurate as the traditional method? This thesis concludes that virtual Ambisonics is an efficient and accurate means to render a binaural auditory display consisting of noise signals placed on the horizontal plane without head tracking. Virtual Ambisonics is thus more efficient than convolution of HRTFs if more than two sound sources are rendered concurrently, or if movement of the sources or head tracking is implemented.

    Complex acoustics models require significant amounts of memory and processing. If the memory and processor loads for a model are too large for a particular device, that model cannot be interactive in real time. What steps can be taken to allow a complex room model to be interactive by using less memory and decreasing the computational load? This thesis presents a new reverberation model based on hybrid reverberation which uses a collection of B-format IRs. A new metric for determining the mixing time of a room is developed, and interpolation between early reflections is investigated. Though hybrid reverberation typically uses a recursive filter such as an FDN for the late reverberation, an average late reverberation tail is instead synthesised for convolution reverberation.

    Commercial interfaces for music search and discovery use little aural information even though the information being sought is audio. How can audio be used in interfaces for music search and discovery? This thesis surveys 20 interfaces and finds that several themes emerge from past work: using a two- or three-dimensional space to explore a music collection, allowing concurrent playback of multiple sources, and tools such as auras to control how much information is presented. A new interface, the amblr, is developed because virtual two-dimensional spaces populated by music have been a common approach, but not yet a perfected one. The amblr is also interpreted as an art installation, visited by approximately 1000 people over 5 days, which maps the virtual space created by the amblr to a physical space.
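
    The efficiency argument is easiest to see in code. Below is a minimal Python/NumPy sketch of first-order virtual Ambisonics on the horizontal plane; all names, and the hrirs lookup table in particular, are hypothetical illustrations rather than code from the thesis. Each source is encoded into B-format with three gain multiplies, and only the fixed ring of virtual loudspeakers is convolved with HRIRs, so the convolution cost stays constant no matter how many sources are added or moved.

        import numpy as np

        def encode_first_order(signal, azimuth):
            """Encode a mono signal at `azimuth` (radians, horizontal
            plane) into horizontal B-format channels (W, X, Y)."""
            return np.stack([
                signal / np.sqrt(2.0),     # W: omnidirectional
                signal * np.cos(azimuth),  # X: front-back figure-of-eight
                signal * np.sin(azimuth),  # Y: left-right figure-of-eight
            ])

        def render_binaural(sources, azimuths, hrirs, speaker_azimuths):
            """sources: equal-length mono arrays; hrirs: dict mapping a
            virtual-speaker azimuth to an equal-length (left, right)
            HRIR pair. Returns a (2, n) binaural signal."""
            # Mixing sources into one B-format bus costs only gains,
            # so adding or moving a source is essentially free.
            bus = sum(encode_first_order(s, az)
                      for s, az in zip(sources, azimuths))
            ir_len = len(next(iter(hrirs.values()))[0])
            out = np.zeros((2, bus.shape[1] + ir_len - 1))
            for az in speaker_azimuths:
                # Basic first-order decode for a regular horizontal ring.
                feed = 0.5 * (np.sqrt(2.0) * bus[0]
                              + np.cos(az) * bus[1]
                              + np.sin(az) * bus[2])
                out[0] += np.convolve(feed, hrirs[az][0])
                out[1] += np.convolve(feed, hrirs[az][1])
            return out

    The HRTF convolution count is fixed at 2 * len(speaker_azimuths), versus 2 * len(sources) for direct per-source HRTF rendering, which is where the break-even beyond two concurrent sources comes from.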
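
    To make the hybrid-reverberation idea concrete, here is a small sketch of splicing the measured early part of an impulse-response channel onto a synthesised average late tail with a short crossfade at the estimated mixing time; the function name and parameters are our own illustrative choices, not the thesis implementation.

        import numpy as np

        def hybrid_ir(early_ir, late_tail, mixing_time, sr, fade_s=0.01):
            """Splice the measured early part of an impulse response
            onto a synthesised late tail, crossfading over `fade_s`
            seconds starting at the mixing time (seconds)."""
            t_mix = int(mixing_time * sr)
            n_fade = int(fade_s * sr)
            assert t_mix + n_fade <= len(early_ir), "early IR too short"
            assert n_fade <= len(late_tail), "late tail too short"
            out = np.zeros(t_mix + len(late_tail))
            # Keep the measured IR up to the mixing time, then fade out.
            head = early_ir[:t_mix + n_fade].copy()
            head[t_mix:] *= np.linspace(1.0, 0.0, n_fade)
            out[:t_mix + n_fade] += head
            # Fade the synthesised tail in over the same region.
            tail = late_tail.copy()
            tail[:n_fade] *= np.linspace(0.0, 1.0, n_fade)
            out[t_mix:] += tail
            return out

    Because the late field is a single averaged tail rather than a recursive FDN, one convolution engine (e.g. partitioned FFT convolution) can serve the entire response, which is what lets the model trade processor load for a predictable, data-driven memory footprint.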

    Vocal imitation for query by vocalisation

    PhD Thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search.

    In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and by evaluating the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously.

    The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare, etc.). The results show that drum sounds received the highest similarity ratings when rated against their own imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified from the imitations with above-chance accuracy, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, as well as some limitations of non-verbal vocal expression.

    Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
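
    As an illustration of the kind of time-varying feature envelopes imitated in the first experiment, the Python/NumPy sketch below computes frame-wise RMS loudness and spectral centroid trajectories that could be compared between an imitation and its target; the frame sizes and names are our own choices, not the thesis implementation.

        import numpy as np

        def feature_envelopes(signal, sr, frame=2048, hop=512):
            """Frame-wise RMS loudness and spectral centroid (Hz)
            trajectories of a mono signal sampled at `sr` Hz."""
            freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
            window = np.hanning(frame)
            rms, centroid = [], []
            for start in range(0, len(signal) - frame + 1, hop):
                chunk = window * signal[start:start + frame]
                rms.append(np.sqrt(np.mean(chunk ** 2)))
                mag = np.abs(np.fft.rfft(chunk))
                total = mag.sum()
                # Centroid: amplitude-weighted mean frequency of the frame.
                centroid.append((freqs * mag).sum() / total
                                if total > 1e-12 else 0.0)
            return np.asarray(rms), np.asarray(centroid)

    Correlating such trajectories between an imitation and the imitated sound gives a simple heuristic similarity measure; the thesis's finding is that representations learned by convolutional auto-encoders, which preserve this temporal structure, predict perceptual similarity better than heuristic features of this kind.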

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of SMC2010, the 7th Sound and Music Computing Conference, 21-24 July 2010.