10 research outputs found

    Speaker-independent Speech Inversion for Estimation of Nasalance

    The velopharyngeal (VP) valve regulates the opening between the nasal and oral cavities. This valve opens and closes through a coordinated motion of the velum and pharyngeal walls. Nasalance is an objective measure derived from the oral and nasal acoustic signals that correlates with nasality. In this work, we evaluate the degree to which the nasalance measure reflects fine-grained patterns of VP movement by comparison with simultaneously collected direct measures of VP opening using high-speed nasopharyngoscopy (HSN). We show that nasalance is significantly correlated with the HSN signal, and that both match expected patterns of nasality. We then train a temporal-convolution-based speech inversion system in a speaker-independent fashion to estimate VP movement for nasality, using nasalance as the ground truth. In further experiments, we also show the importance of incorporating source features (from glottal activity) to improve nasality prediction. Comment: Interspeech 202
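
    As a rough illustration of the speech-inversion step described above, the sketch below defines a small dilated temporal-convolution network that regresses a per-frame nasalance trajectory from frame-level acoustic features concatenated with glottal-source features. The feature dimensions, layer sizes, and the name NasalanceTCN are illustrative assumptions, not the architecture reported in the paper.

```python
# Hypothetical sketch: a dilated temporal-convolution regressor mapping
# frame-level acoustic features (e.g. 80 mel bands + a few glottal-source
# features) to a per-frame nasalance estimate. Sizes are illustrative.
import torch
import torch.nn as nn

class NasalanceTCN(nn.Module):
    def __init__(self, n_features=84, hidden=128, kernel=5, layers=3):
        super().__init__()
        blocks, in_ch = [], n_features
        for i in range(layers):
            dilation = 2 ** i
            blocks += [
                nn.Conv1d(in_ch, hidden, kernel,
                          padding=(kernel // 2) * dilation, dilation=dilation),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
            ]
            in_ch = hidden
        self.tcn = nn.Sequential(*blocks)
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)    # per-frame output

    def forward(self, x):                  # x: (batch, n_features, n_frames)
        return self.head(self.tcn(x)).squeeze(1)           # (batch, n_frames)

# Toy usage: 2 utterances, 84 feature dims, 300 frames; the output track
# would be regressed against the measured nasalance signal.
model = NasalanceTCN()
nasalance_track = model(torch.randn(2, 84, 300))
```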

    Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise


    A Physiologically Inspired Method for Audio Classification

    We explore the use of physiologically inspired auditory features with both physiologically motivated and statistical audio classification methods. We use features derived from a biophysically defensible model of the early auditory system for audio classification using a neural network classifier. We also use a Gaussian-mixture-model (GMM)-based classifier for comparison and show that the neural-network-based approach performs better. Further, we use features from a more advanced model of the auditory system and show that the features extracted from this model of the primary auditory cortex perform better than the features from the early auditory stage. The features give good classification performance with only one-second data segments used for training and testing.
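
    As a rough sketch of the back-end comparison described above, the code below trains a GMM likelihood classifier (one mixture per class) and a small neural network on the same feature vectors. Feature extraction from the auditory model is assumed to happen elsewhere; the synthetic data, mixture sizes and layer sizes are placeholders, not the paper's configuration.

```python
# Hedged sketch: GMM-per-class classifier vs. a small neural network on
# identical feature vectors (stand-ins for auditory-model features).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def train_gmm_classifier(X, y, n_components=4):
    """Fit one diagonal-covariance GMM per class."""
    return {c: GaussianMixture(n_components, covariance_type="diag").fit(X[y == c])
            for c in np.unique(y)}

def predict_gmm(models, X):
    """Classify by maximum per-class log-likelihood."""
    classes = sorted(models)
    scores = np.column_stack([models[c].score_samples(X) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Synthetic stand-in for features from one-second segments
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 32))
y = (X[:, 0] + rng.normal(size=400) > 0).astype(int)

gmms = train_gmm_classifier(X, y)
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)
print("GMM accuracy:", (predict_gmm(gmms, X) == y).mean())
print("NN  accuracy:", net.score(X, y))
```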

    Spectrotemporal Modulation Sensitivity in Hearing-Impaired Listeners

    Speech is characterized by temporal and spectral modulations. Hearing-impaired (HI) listeners may have reduced spectrotemporal modulation (STM) sensitivity, which could affect their speech understanding. This study examined the effects of hearing loss and absolute frequency on STM sensitivity and their relationship to speech intelligibility, frequency selectivity and temporal fine-structure (TFS) sensitivity. Sensitivity to STM applied to four-octave or one-octave noise carriers was measured for normal-hearing and HI listeners as a function of spectral modulation, temporal modulation and absolute frequency. Across-frequency variation in STM sensitivity suggests that broadband measurements do not sufficiently characterize performance. Results were simulated with a cortical STM-sensitivity model. No correlation was found between the reduced frequency selectivity required in the model to explain the HI STM data and more direct notched-noise estimates. Correlations between low-frequency and broadband STM performance, speech intelligibility and frequency-modulation sensitivity suggest that speech and STM processing may depend on the ability to use TFS.
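
    The STM stimuli referred to above are typically noise-like carriers whose spectral envelope drifts sinusoidally over time (a moving ripple). Below is a hedged sketch of such a generator; the function name ripple_noise and all parameter values (carrier band, ripple density in cycles/octave, ripple rate in Hz, modulation depth) are illustrative choices, not the study's exact stimuli.

```python
# Illustrative moving-ripple generator: a dense sum of random-phase tones
# whose amplitudes follow a sinusoidal spectro-temporal envelope.
import numpy as np

def ripple_noise(fs=44100, dur=1.0, f_lo=1000.0, octaves=1.0,
                 omega=2.0, rate=4.0, depth=0.9, n_tones=200, seed=0):
    """omega: spectral density (cycles/octave); rate: temporal rate (Hz)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    pos = np.linspace(0.0, octaves, n_tones)          # tone positions in octaves
    freqs = f_lo * 2.0 ** pos
    phases = rng.uniform(0, 2 * np.pi, n_tones)
    sig = np.zeros_like(t)
    for f, x, ph in zip(freqs, pos, phases):
        env = 1.0 + depth * np.sin(2 * np.pi * (rate * t + omega * x))
        sig += env * np.sin(2 * np.pi * f * t + ph)
    return sig / np.max(np.abs(sig))

stim = ripple_noise(omega=2.0, rate=4.0)   # 2 cyc/oct, 4 Hz, one-octave carrier
```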

    Auditory Streaming: Behavior, Physiology, and Modeling

    Auditory streaming is a fundamental aspect of auditory perception. It refers to the ability to parse mixed acoustic events into meaningful streams, where each stream is assumed to originate from a separate source. Despite wide interest and increasing scientific investigation over the last decade, the neural mechanisms underlying streaming remain largely unknown. A simple example of this mystery concerns the streaming of simple tone sequences, and the general assumption that separation along the tonotopic axis is sufficient for stream segregation. However, this dissertation research casts doubt on the validity of this assumption. First, behavioral measures of auditory streaming in ferrets demonstrate that they can be used as an animal model to study auditory streaming. Second, responses from neurons in the primary auditory cortex (A1) of ferrets show that spectral components that are well separated in frequency produce comparably segregated responses along the tonotopic axis, whether presented synchronously or consecutively, despite the substantial differences in their streaming percepts when measured psychoacoustically in humans. These results argue against the notion that tonotopic separation per se is a sufficient neural correlate of stream segregation. Third, comparing responses during behavior to those in the passive condition, the spiking activity of neurons belonging to the same stream becomes more correlated in time, while the activity of neurons belonging to different streams becomes less correlated. Rapid task-related plasticity of neural receptive fields shows a pattern consistent with these changes in correlation. Taken together, these results indicate that temporal coherence is a plausible neural correlate of auditory streaming. Finally, inspired by the above biological findings, we propose a computational model of auditory scene analysis that uses temporal coherence as the primary criterion for predicting stream formation. The promising results of this dissertation research significantly advance our understanding of auditory streaming and perception.
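
    To make the temporal-coherence criterion concrete, the toy sketch below groups frequency channels by the correlation of their temporal envelopes: channels that fluctuate together end up in one stream, while channels that alternate end up in different streams. This is a deliberate simplification for illustration, not the dissertation's model; the threshold and greedy clustering rule are arbitrary choices.

```python
# Toy temporal-coherence grouping: cluster channels whose envelopes are
# strongly correlated; anti-correlated (alternating) channels separate.
import numpy as np

def coherence_groups(envelopes, threshold=0.5):
    """envelopes: (n_channels, n_frames) array of channel envelopes."""
    corr = np.corrcoef(envelopes)
    labels = -np.ones(corr.shape[0], dtype=int)
    next_label = 0
    for i in range(len(labels)):
        if labels[i] < 0:
            labels[i] = next_label
            for j in range(i + 1, len(labels)):
                if labels[j] < 0 and corr[i, j] > threshold:
                    labels[j] = next_label
            next_label += 1
    return labels

# ABAB-style alternation: channels 0 and 2 pulse together, channel 1 in
# the gaps, so coherence assigns {0, 2} and {1} to different streams.
t = np.linspace(0, 2, 400)
gate = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)
print(coherence_groups(np.vstack([gate, 1 - gate, gate])))   # -> [0 1 0]
```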

    Sound Object Recognition

    Humans are constantly exposed to a variety of acoustic stimuli ranging from music and speech to more complex acoustic scenes like a noisy marketplace. The human auditory perception mechanism is able to analyze these different kinds of sounds and extract meaningful information, suggesting that the same processing mechanism is capable of representing different sound classes. In this thesis, we test this hypothesis by proposing a high-dimensional sound object representation framework that captures the various modulations of sound by performing a multi-resolution mapping. We then show that this model is able to capture a wide variety of sound classes (speech, music, soundscapes) by applying it to the tasks of speech recognition, speaker verification, musical instrument recognition and acoustic soundscape recognition. We propose a multi-resolution analysis approach that captures the detailed variations in the spectral characteristics as a basis for recognizing sound objects. We then show how such a system can be fine-tuned to capture both the message information (speech content) and the messenger information (speaker identity). This system is shown to outperform state-of-the-art systems for noise robustness at both automatic speech recognition and speaker verification tasks. The proposed analysis scheme, with its ability to analyze temporal modulations, was used to capture musical sound objects. We show that, using a model of cortical processing, we were able to accurately replicate human perceptual similarity judgments and to obtain good classification performance on a large set of musical instruments. We also show that neither the spectral features alone nor the marginals of the proposed model are sufficient to capture human perception. Moreover, we were able to extend this model to continuous musical recordings by proposing a new method to extract notes from the recordings. Complex acoustic scenes like a sports stadium have multiple sources producing sound at the same time. We show that the proposed representation scheme can not only capture these complex acoustic scenes but also provides a flexible mechanism to adapt to target sources of interest. The human auditory perception system is known to be a complex system with both bottom-up analysis pathways and top-down feedback mechanisms. The top-down feedback enhances the output of the bottom-up system to better realize the target sounds. In this thesis we propose an implementation of a top-down attention module that is complementary to the high-dimensional acoustic feature extraction mechanism. This attention module is a distributed system operating at multiple stages of representation, effectively acting as a retuning mechanism that adapts the same system to different tasks. We show that such an adaptation mechanism greatly improves the performance of the system at detecting the target source in the presence of various distracting background sources.
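
    The multi-resolution (rate-scale) mapping mentioned above can be approximated, very roughly, by filtering a log spectrogram with a bank of 2-D Gabor-like kernels tuned to different temporal rates and spectral densities. The sketch below is such an approximation on a linear frequency axis; the kernel shapes, the rate/scale grids and the name rate_scale_features are assumptions for illustration, not the thesis's cortical model.

```python
# Rough rate-scale sketch: 2-D modulation filtering of a log spectrogram.
import numpy as np
from scipy.signal import spectrogram, convolve2d

def rate_scale_features(x, fs, rates=(2, 4, 8, 16), scales=(0.25, 0.5, 1.0)):
    """rates in Hz (temporal modulation); scales in cycles/kHz (spectral)."""
    f, t, S = spectrogram(x, fs, nperseg=512, noverlap=384)
    logS = np.log1p(S)
    dt = t[1] - t[0]                      # seconds per frame
    df = (f[1] - f[0]) / 1000.0           # kHz per frequency bin
    tt = (np.arange(64) - 32) * dt        # temporal support of the kernels
    ff = (np.arange(16) - 8) * df         # spectral support of the kernels
    feats = []
    for r in rates:
        for s in scales:
            kernel = np.outer(np.cos(2 * np.pi * s * ff) * np.hanning(len(ff)),
                              np.cos(2 * np.pi * r * tt) * np.hanning(len(tt)))
            feats.append(convolve2d(logS, kernel, mode="same"))
    return np.stack(feats)                # (n_rates * n_scales, freq, time)

fs = 16000
x = np.random.default_rng(0).normal(size=fs)   # stand-in for one second of audio
print(rate_scale_features(x, fs).shape)        # (12, n_freq_bins, n_frames)
```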

    The application of auditory signal processing principles to the detection, tracking and association of tonal components in sonar.

    A steady signal exerts two complementary effects on a noisy acoustic environment: one is to add energy, the other is to create order. The ear has evolved mechanisms to detect both effects and encodes the fine temporal detail of a stimulus in sequences of auditory nerve discharges. Taking inspiration from these ideas, this thesis investigates the use of regular timing for sonar signal detection. Algorithms that operate on the temporal structure of a received signal are developed for the detection of merchant vessels. These ideas are explored by reappraising three areas traditionally associated with power-based detection. First of all, a time-frequency display based on timing instead of power is developed. Rather than inquiring of the display, "How much energy has been measured at this frequency?", one would ask, "How structured is the signal at this frequency? Is this consistent with a target?" The auditory-motivated zero crossings with peak amplitudes (ZCPA) algorithm forms the starting point for this study. Next, matters related to quantitative system performance analysis are addressed, such as how often a system will fail to detect a signal in particular conditions, or how much energy is required to guarantee a certain probability of detection. A suite of optimal temporal receivers is designed and is subsequently evaluated using the same kinds of synthetic signal used to assess power-based systems: Gaussian processes and sinusoids. The final area of work considers how discrete components on a sonar signal display, such as tonals and transients, can be identified and organised according to auditory scene analysis principles. Two algorithms are presented and evaluated using synthetic signals: one is designed to track a tonal through transient events, and the other attempts to identify groups of comodulated tonals against a noise background. A demonstration of each algorithm is provided for recorded sonar signals.
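
    A much-simplified, timing-based display in the spirit of the ZCPA idea mentioned above can be built by letting intervals between upward zero crossings vote for frequency bins, weighted by the peak amplitude found between the crossings. The sketch below does this for a single frame; the frame length, bin edges and log-compressive weighting are illustrative choices rather than the thesis's algorithm.

```python
# Simplified zero-crossings-with-peak-amplitudes histogram for one frame:
# each interval between upward zero crossings votes for the frequency bin
# at 1/interval, weighted by the log-compressed peak amplitude.
import numpy as np

def zcpa_frame(frame, fs, freq_edges):
    up = np.where((frame[:-1] <= 0) & (frame[1:] > 0))[0]   # upward crossings
    hist = np.zeros(len(freq_edges) - 1)
    for a, b in zip(up[:-1], up[1:]):
        f_est = fs / (b - a)                 # interval -> dominant frequency
        peak = np.max(np.abs(frame[a:b]))
        k = np.searchsorted(freq_edges, f_est) - 1
        if 0 <= k < len(hist):
            hist[k] += np.log1p(peak)
    return hist

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.default_rng(1).normal(size=fs)
edges = np.linspace(0, 2000, 65)
print(np.argmax(zcpa_frame(x[:512], fs, edges)))   # bin holding ~440 Hz
```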