6 research outputs found
Spatial auditory display for acoustics and music collections
PhD thesis. This thesis explores how audio can be better incorporated into the way
people access information, developing approaches for creating three-dimensional audio
environments with low processing demands. This is done by investigating three research
questions.
Mobile applications have processor and memory constraints that restrict the
number of concurrent static or moving sound sources that can be rendered with binaural
audio. Is there a more efficient approach that is as perceptually accurate as the traditional
method? This thesis concludes that virtual Ambisonics is an efficient and accurate means
to render a binaural auditory display consisting of noise signals placed on the horizontal
plane without head tracking. Virtual Ambisonics is then more efficient than convolution
of HRTFs if more than two sound sources are rendered concurrently or if movement of
the sources or head tracking is implemented.
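The efficiency argument can be sketched in code: with virtual Ambisonics, each additional source only adds a few encoding gains, while the HRTF convolutions are fixed to the virtual loudspeaker array. The sketch below is a minimal first-order, horizontal-plane illustration in NumPy, using a basic sampling decoder over a regular array; the function names and decoder gains are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def encode_bformat(sources, azimuths_rad):
    """Encode mono sources into horizontal first-order B-format (W, X, Y).

    Each source costs only three multiply-accumulates per sample,
    independent of how many HRTF convolutions are performed later.
    sources: (n_sources, n_samples); azimuths_rad: (n_sources,).
    """
    W = np.sum(sources, axis=0) / np.sqrt(2.0)
    X = np.sum(np.cos(azimuths_rad)[:, None] * sources, axis=0)
    Y = np.sum(np.sin(azimuths_rad)[:, None] * sources, axis=0)
    return np.stack([W, X, Y])

def decode_binaural(bformat, speaker_az, hrirs_left, hrirs_right):
    """Decode B-format to virtual loudspeakers, then convolve each feed
    with that loudspeaker's fixed HRIR pair. The number of convolutions
    depends only on the virtual array size, not on the source count."""
    W, X, Y = bformat
    n_out = W.shape[0] + hrirs_left.shape[1] - 1
    left = np.zeros(n_out)
    right = np.zeros(n_out)
    n_spk = len(speaker_az)
    for az, hl, hr in zip(speaker_az, hrirs_left, hrirs_right):
        # Simple sampling (projection) decoder for a regular array.
        feed = (np.sqrt(2.0) * W + np.cos(az) * X + np.sin(az) * Y) / n_spk
        left += np.convolve(feed, hl)
        right += np.convolve(feed, hr)
    return left, right
```

With N virtual loudspeakers the binaural stage always costs 2N convolutions, so above a handful of sources, or once sources move, updating encoding gains is cheaper than re-convolving per-source HRTFs.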
Complex acoustics models require significant amounts of memory and processing. If
the memory and processor loads for a model are too large for a particular device, that
model cannot be interactive in real time. What steps can be taken to allow a complex
room model to be interactive while using less memory and a lower computational
load? This thesis presents a new reverberation model based on hybrid reverberation
which uses a collection of B-format impulse responses (IRs). A new metric for
determining the mixing time of a room is developed, and interpolation between early
reflections is investigated. Though hybrid reverberation typically uses a recursive
filter such as a feedback delay network (FDN) for the late reverberation, an average
late reverberation tail is instead synthesised for convolution reverberation.
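The basic shape of a hybrid reverberation IR can be sketched as follows: keep the measured early reflections up to the mixing time, then append a synthetic late tail for convolution reverberation. This is a minimal illustrative sketch, not the thesis's model; the level-matching heuristic and a simple exponential-decay noise tail (targeting a given RT60) are assumptions:

```python
import numpy as np

def hybrid_reverb_ir(measured_ir, fs, mixing_time_s, rt60_s, tail_len_s, seed=0):
    """Build a hybrid IR: measured head up to the mixing time, then a
    synthetic late tail of exponentially decaying Gaussian noise whose
    decay reaches -60 dB at t = rt60_s (illustrative sketch only)."""
    split = int(round(mixing_time_s * fs))
    head = measured_ir[:split]
    n_tail = int(round(tail_len_s * fs))
    t = np.arange(n_tail) / fs
    envelope = 10.0 ** (-3.0 * t / rt60_s)  # -60 dB when t == rt60_s
    noise = np.random.default_rng(seed).standard_normal(n_tail)
    # Crude level match: scale the tail to the RMS of the head's last
    # few milliseconds so the splice is roughly continuous in energy.
    head_rms = np.sqrt(np.mean(head[-int(0.005 * fs):] ** 2)) if split else 1.0
    tail = noise * envelope * head_rms
    return np.concatenate([head, tail])
```

Because the tail is an average synthetic texture rather than a stored per-position recording, only the early portion of each B-format IR needs to be kept in memory, which is where the savings over full-length convolution come from.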
Commercial interfaces for music search and discovery use little aural information
even though the information being sought is audio. How can audio be used in
interfaces for music search and discovery? This thesis examines 20 interfaces and
identifies several recurring themes: using a two- or three-dimensional space to
explore a music collection, allowing concurrent playback of multiple sources, and
tools such as auras to control how much information is presented. A new interface,
the amblr, is developed because virtual two-dimensional spaces populated by music
have been a common approach, but not yet a perfected one. The amblr is also
interpreted as an art installation, visited by approximately 1000 people over 5 days,
which maps the virtual space created by the amblr to a physical space.
Vocal imitation for query by vocalisation
PhD thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, as evidenced for example in the art form of beatboxing and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate
musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to
vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal
imitations and imitated sounds.
In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results
show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved
when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of
expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task,
musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category
(e.g. kick, snare, etc.). The results show that drum sounds received the highest similarity ratings when rated against their own imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified from the imitations with above-chance accuracy, although
this varied considerably between drum categories.
The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal
vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and
imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic
features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
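The final point, that preserving temporal information matters more than spectral resolution, can be illustrated with a toy example. This is illustrative only: the thesis compares learned convolutional auto-encoder features against heuristic audio features, which the sketch below does not reproduce. Two sounds whose spectrograms are time-reversed copies of each other, say a decaying versus a rising envelope, have identical time-averaged spectra, so any feature that averages over time cannot distinguish them, while a frame-sequence representation can:

```python
import numpy as np

def averaged_spectral_distance(spec_a, spec_b):
    # Distance between time-averaged spectra (temporal detail discarded).
    return float(np.linalg.norm(spec_a.mean(axis=1) - spec_b.mean(axis=1)))

def framewise_spectral_distance(spec_a, spec_b):
    # Distance between full spectrograms (temporal detail preserved).
    return float(np.linalg.norm(spec_a - spec_b))

rng = np.random.default_rng(0)
base = rng.random((40, 100))                 # 40 bands x 100 frames
decaying = base * np.linspace(1.0, 0.0, 100)  # fades out, drum-like
rising = decaying[:, ::-1]                    # same frames, reversed order
```

The averaged distance between `decaying` and `rising` is (numerically) zero, while the frame-wise distance is large, mirroring why temporally blind features struggle to separate a vocal imitation from same-category drum sounds with different temporal evolution.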
Engineering and Physical Sciences Research Council (EP/G03723X/1)
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of SMC2010, the 7th Sound and Music Computing Conference, 21st–24th July 2010