2 research outputs found

    Sound Object Recognition

    Humans are constantly exposed to a variety of acoustic stimuli, ranging from music and speech to more complex acoustic scenes like a noisy marketplace. The human auditory perception mechanism is able to analyze these different kinds of sounds and extract meaningful information, suggesting that the same processing mechanism is capable of representing different sound classes. In this thesis, we test this hypothesis by proposing a high-dimensional sound object representation framework that captures the various modulations of sound by performing a multi-resolution mapping. We then show that this model is able to capture a wide variety of sound classes (speech, music, soundscapes) by applying it to the tasks of speech recognition, speaker verification, musical instrument recognition and acoustic soundscape recognition. We propose a multi-resolution analysis approach that captures the detailed variations in the spectral characteristics as a basis for recognizing sound objects. We then show how such a system can be fine-tuned to capture both the message information (speech content) and the messenger information (speaker identity). This system is shown to outperform state-of-the-art systems in noise robustness on both automatic speech recognition and speaker verification tasks. The proposed analysis scheme, with its added ability to analyze temporal modulations, was used to capture musical sound objects. We show that, using a model of cortical processing, we are able to accurately replicate human perceptual similarity judgments and to achieve good classification performance on a large set of musical instruments. We also show that neither the spectral features alone nor the marginals of the proposed model are sufficient to capture human perception. Moreover, we extend this model to continuous musical recordings by proposing a new method to extract notes from the recordings.
Complex acoustic scenes like a sports stadium have multiple sources producing sounds at the same time. We show that the proposed representation scheme not only captures these complex acoustic scenes, but also provides a flexible mechanism to adapt to target sources of interest. The human auditory perception system is known to be a complex system with both bottom-up analysis pathways and top-down feedback mechanisms. The top-down feedback enhances the output of the bottom-up system to better realize the target sounds. In this thesis we propose an implementation of a top-down attention module that is complementary to the high-dimensional acoustic feature extraction mechanism. This attention module is a distributed system operating at multiple stages of representation, effectively acting as a retuning mechanism that adapts the same system to different tasks. We show that such an adaptation mechanism is able to tremendously improve the performance of the system at detecting the target source in the presence of various distracting background sources.

    A primate model of human cortical analysis of auditory objects

    PhD Thesis. The anatomical organization of the auditory cortex in Old World monkeys is similar to that in humans. But how good are monkeys as a model of human cortical analysis of auditory objects? To address this question I explore two aspects of auditory object processing: segregation and timbre. Auditory segregation concerns the ability of animals to extract an auditory object of relevance from a background of competing sounds. Timbre is an aspect of object identity distinct from pitch. In this work, I study these phenomena in rhesus macaques using behaviour and functional magnetic resonance imaging (fMRI). I specifically manipulate one dimension of timbre, spectral flux: the rate of change of spectral energy. I present this thesis in five chapters. Chapter 1 presents background on auditory processing, macaque auditory cortex, models of auditory segregation, and dimensions of timbre. Chapter 2 presents an introduction to fMRI, the design of the fMRI experiments and analysis of fMRI data, and the macaque behavioural training techniques employed. Chapter 3 presents results from the fMRI and behavioural experiments on macaques using a stochastic figure-ground stimulus. Chapter 4 presents the results from the fMRI experiment in macaques using a spectral flux stimulus. Chapter 5 concludes with a general discussion of the results from both studies and some future directions for research. In summary, I show that there is a functional homology between macaques and humans in the cortical processing of auditory figure-ground segregation. However, there is no clear functional homology in the processing of spectral flux between these species. So I conclude that, despite clear similarities in the organization of the auditory cortex and processing of auditory object segregation, there are important differences in how complex cues associated with auditory object identity are processed in the macaque and human auditory brains. Wellcome Trust U