
    The role of sound offsets in auditory temporal processing and perception

    Recent neurobiological studies show that sound-offset responses are distinct from sound-onset responses in their underlying neural mechanisms, temporal processing pathways, and roles in auditory perception. In this work, I investigate the role of sound offsets, and the effect of reduced sensitivity to offsets, on auditory perception in humans. The implications of a 'sound-offset deficit' for speech-in-noise perception are investigated using a biologically grounded mathematical model with independent channels for onset and offset detection. Sound offsets are important for recognising, distinguishing, and grouping sounds, and they are also likely to play a role in perceiving consonants that lie in the troughs of amplitude fluctuations in speech. The influence of offsets on the discriminability of model outputs was assessed for 48 nonsense vowel-consonant-vowel (VCV) speech stimuli in varying levels of multi-talker babble noise (-12, -6, 0, 6, 12 dB SNR), yielding predictions that correspond to known phonetic categories. This work therefore suggests that variability in offset salience alone can explain the rank order of the consonants most affected in noisy situations. A novel psychophysical test battery for offset sensitivity was devised and assessed, followed by a study to find an electrophysiological correlate. The findings suggest that individual differences in sound-offset sensitivity may contribute to inter-subject variation in speech-in-noise discrimination ability. The resulting measures can be used to test between-population differences in offset sensitivity, with more support for objective than for psychophysical measures. In the electrophysiological study, offset responses in a duration discrimination paradigm were found to be more strongly modulated by attention than onset responses.
    Overall, this thesis shows for the first time that the onset-offset dichotomy in the auditory system, previously explored in physiological studies, is also evident in human studies, for both simple sounds and complex speech sounds.
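The independent onset/offset channels mentioned above can be illustrated with a minimal sketch; this is my own illustrative construction, not the thesis's actual model, whose details are not given in the abstract. Half-wave rectifying the derivative of an amplitude envelope yields two channels that respond only to rising or only to falling amplitude:

```python
import numpy as np

def onset_offset_channels(envelope):
    """Split an amplitude envelope into independent onset and offset channels.

    Onsets are positive-going envelope changes, offsets negative-going;
    half-wave rectification keeps the two channels independent.
    """
    diff = np.diff(envelope, prepend=envelope[0])
    onset_channel = np.maximum(diff, 0.0)    # responds only to rising amplitude
    offset_channel = np.maximum(-diff, 0.0)  # responds only to falling amplitude
    return onset_channel, offset_channel

# A tone burst: silence, steady tone, silence.
env = np.concatenate([np.zeros(5), np.ones(10), np.zeros(5)])
on, off = onset_offset_channels(env)
print(np.argmax(on), np.argmax(off))  # onset at sample 5, offset at sample 15
```

A reduced-sensitivity ('sound-offset deficit') listener could be simulated in this sketch simply by attenuating the offset channel before any downstream discrimination stage.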

    Text-Based Guidance for Improved Image Retrieval on Archival Image Dataset

    Digitised archival photo collections allow members of the public to view images relating to history and democracy. Recent advances in visual tasks such as Content-Based Image Retrieval, together with the development of deep neural networks, have provided modern methods to analyse digitised images and perform image queries for retrieval. We explore the image retrieval task using several publicly available datasets and a set of archival images from the National Archives of Australia, and propose a simple change to an existing pooling method that improves retrieval performance on the archival set. A related visual task, object localisation, considers the ability of a trained model to locate the positions of objects in an image, given English text phrases. With recent advances in large-scale text embedding models, pre-trained text models retain rich semantic structure. Whereas other object localisation methods train text pathways within their deep neural model, we explore the direct use of a large-scale text embedding for this task, and demonstrate its ability to localise objects, even for unseen words.
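The core of localisation with a frozen text embedding can be sketched as follows. This is an assumption-laden illustration (the abstract does not specify the architecture): I assume the image is represented as patch feature vectors projected into the same space as the text embedding, so that cosine similarity scores each patch against the query phrase.

```python
import numpy as np

def localise_by_text(patch_embeddings, text_embedding):
    """Score image patches against a text embedding by cosine similarity.

    patch_embeddings: (num_patches, dim) array of visual patch features,
                      assumed to share a space with the text embedding.
    text_embedding:   (dim,) vector from a pre-trained text embedding model.
    Returns the index of the best-matching patch and all similarity scores.
    """
    p = patch_embeddings / np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = p @ t
    return int(np.argmax(scores)), scores

# Toy example with random features: patch 2 is aligned with the query.
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 8))
query = patches[2] + 0.01 * rng.normal(size=8)  # nearly identical direction
best, scores = localise_by_text(patches, query)
print(best)  # 2
```

Because the text pathway is frozen, an unseen word still maps to a meaningful point in the embedding space, which is the property the abstract exploits.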

    Sound source segregation of multiple concurrent talkers via Short-Time Target Cancellation

    The Short-Time Target Cancellation (STTC) algorithm, developed as part of this dissertation research, is a “Cocktail Party Problem” processor that can boost speech intelligibility for a target talker from a specified “look” direction, while suppressing the intelligibility of competing talkers. The algorithm holds promise for both automatic speech recognition and assistive listening device applications. The STTC algorithm operates on a frame-by-frame basis, leverages the computational efficiency of the Fast Fourier Transform (FFT), and is designed to run in real time. Notably, performance in objective measures of speech intelligibility and sound source segregation is comparable to that of the Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM). Because the STTC algorithm computes a time-frequency mask that can be applied independently to both the left and right signals, binaural cues for spatial hearing, including Interaural Time Differences (ITDs), Interaural Level Differences (ILDs) and spectral cues, can be preserved in potential hearing aid applications. A minimalist design for a proposed STTC Assistive Listening Device (ALD), consisting of six microphones embedded in the frame of a pair of eyeglasses, is presented and evaluated using virtual room acoustics and both objective and behavioral measures. The results suggest that the proposed STTC ALD can provide a significant speech intelligibility benefit in complex auditory scenes composed of multiple spatially separated talkers.
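The frame-by-frame masking and the binaural-cue argument can be sketched in a few lines. The STTC mask computation itself is not reproduced here (the abstract does not give it); this sketch assumes a precomputed real-valued gain mask and shows why applying the same mask to the left and right STFT frames leaves ITDs and ILDs intact: a shared real gain scales both channels' magnitudes equally and changes neither channel's phase.

```python
import numpy as np

def apply_mask_binaural(left, right, mask, frame_len=256, hop=128):
    """Apply one time-frequency mask to both ears, frame by frame.

    Applying the SAME real-valued mask to the left and right per-frame FFTs
    preserves the interaural time (phase) and level (magnitude ratio)
    differences between the two channels.
    mask: (num_frames, frame_len) array of gains in [0, 1], assumed given.
    """
    win = np.hanning(frame_len)
    out_l = np.zeros_like(left)
    out_r = np.zeros_like(right)
    n_frames = (len(left) - frame_len) // hop + 1
    for i in range(n_frames):
        s = i * hop
        for sig, out in ((left, out_l), (right, out_r)):
            spec = np.fft.fft(win * sig[s:s + frame_len])  # per-frame FFT
            frame = np.real(np.fft.ifft(mask[i] * spec))   # masked resynthesis
            out[s:s + frame_len] += frame                  # overlap-add
    return out_l, out_r
```

With 50%-overlapping Hann windows the overlap-add resynthesis approximately reconstructs each (masked) channel, so an all-ones mask passes the binaural signal through essentially unchanged.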