18 research outputs found

    A novel neural feature for a text-dependent speaker identification system

    A novel feature based on the simulated neural response of the auditory periphery was proposed in this study for a speaker identification system. A well-known computational model of the auditory-nerve (AN) fiber by Zilany and colleagues, which incorporates most of the stages and the relevant nonlinearities observed in the peripheral auditory system, was employed to simulate neural responses to speech signals from different speakers. Neurograms were constructed from responses of inner-hair-cell (IHC)-AN synapses with characteristic frequencies spanning the dynamic range of hearing. The synapse responses were subjected to an analytical function to incorporate the effects of absolute and relative refractory periods. The proposed IHC-AN neurogram feature was then used to train and test the text-dependent speaker identification system using standard classifiers. The performance of the proposed method was compared to the results from existing baseline methods for both quiet and noisy conditions. While the performance using the proposed feature was comparable to that of existing methods in quiet environments, the neural feature exhibited substantially better classification accuracy in noisy conditions, especially with white Gaussian and street noise. The performance of the proposed system was also relatively independent of the type of distortion in the acoustic signal and of the classifier used. The proposed feature can be employed to design a robust speech recognition system.
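    The feature-to-classifier flow described above can be illustrated with a short, hedged Python sketch. The actual system drives the Zilany AN model and uses standard classifiers on IHC-AN neurograms; here surrogate_neurogram (a hypothetical band-pass/envelope stand-in, not the AN model), the synthetic two-speaker data, and the scikit-learn SVM are assumptions introduced only to show the shape of the pipeline.

    import numpy as np
    from scipy.signal import butter, hilbert, sosfilt
    from sklearn.svm import SVC

    def surrogate_neurogram(signal, fs, cfs, win=0.008, hop=0.004):
        # Band-pass each characteristic frequency (CF), take the Hilbert envelope,
        # and average it in short frames: a (CF x time) matrix, loosely neurogram-like.
        rows = []
        for cf in cfs:
            lo, hi = cf / 2 ** 0.25, cf * 2 ** 0.25              # ~half-octave band around the CF
            sos = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
            env = np.abs(hilbert(sosfilt(sos, signal)))
            n, h = int(win * fs), int(hop * fs)
            rows.append([env[i:i + n].mean() for i in range(0, env.size - n, h)])
        return np.array(rows)

    fs = 16000
    cfs = np.geomspace(250, 6000, 32)                            # CFs spanning much of the hearing range
    rng = np.random.default_rng(0)
    X, y = [], []
    for speaker, f0 in enumerate([120, 210]):                    # two toy "speakers" with different pitch
        for _ in range(10):
            t = np.arange(0, 0.5, 1 / fs)
            sig = np.sin(2 * np.pi * f0 * t) + 0.05 * rng.standard_normal(t.size)
            X.append(surrogate_neurogram(sig, fs, cfs).mean(axis=1))   # average CF profile as the feature
            y.append(speaker)
    clf = SVC(kernel="rbf").fit(X, y)                            # any standard classifier would do here
    print("training accuracy:", clf.score(X, y))

    In the study itself, the neurogram rows come from simulated IHC-AN synapse responses with refractoriness applied, rather than from a filterbank as in this toy example.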

    Reference-Free Assessment of Speech Intelligibility Using Bispectrum of an Auditory Neurogram.

    Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, and the frequency and temporal resolution of the auditory system, all of which are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as the bispectrum. The phase coupling in the neurogram bispectrum provides a unique insight into the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to behavioral scores for listeners with normal hearing and hearing loss, both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with small error, suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.
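    A minimal sketch of the kind of third-order statistic involved, assuming a direct, segment-averaged bispectrum estimate; the paper's exact estimator and parameters are not reproduced here, the 1-D input stands in for a single neurogram row, and the FFT length and test frequencies are illustrative.

    import numpy as np

    def bispectrum(x, nfft=100, hop=50):
        # Direct estimate: average X(f1) * X(f2) * conj(X(f1 + f2)) over Hann-windowed segments.
        segments = [x[i:i + nfft] for i in range(0, x.size - nfft + 1, hop)]
        half = nfft // 2
        f = np.arange(half)
        F1, F2 = np.meshgrid(f, f)
        B = np.zeros((half, half), dtype=complex)
        for seg in segments:
            X = np.fft.fft(seg * np.hanning(nfft))
            B += X[F1] * X[F2] * np.conj(X[F1 + F2])
        return B / len(segments)

    # Phase-coupled components (60 Hz + 90 Hz -> 150 Hz) produce a clear bispectral peak.
    fs = 1000
    t = np.arange(0, 2, 1 / fs)
    x = np.cos(2 * np.pi * 60 * t) + np.cos(2 * np.pi * 90 * t) + 0.5 * np.cos(2 * np.pi * 150 * t)
    peak = np.unravel_index(np.abs(bispectrum(x)).argmax(), (50, 50))
    print("bispectral peak at frequency bins:", peak)            # expected near bins (6, 9) or (9, 6)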

    Normalized scores as a function of hearing loss for all phoneme groups from the TIMIT database.

    Neurograms were constructed for the signals presented at 65 dB SPL, and the features (H1, H2, and H3) were computed from the ENV and TFS bispectrum responses. (A, C, E): Normalized score using ENV responses. (B, D, F): Normalized score using TFS responses.
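    For orientation, a hedged sketch of how scalar features might be pulled from a bispectrum over the non-redundant triangle. The three quantities below (mean log magnitude, normalized bispectral entropy, peak magnitude) are illustrative stand-ins, not the paper's actual H1, H2, and H3 definitions, which are not reproduced in this listing.

    import numpy as np

    def triangle_features(B, nfft):
        # Mean log magnitude, normalized entropy, and peak magnitude over the triangle Omega.
        f = np.arange(B.shape[0])
        F1, F2 = np.meshgrid(f, f)
        omega = (F2 <= F1) & (F1 + F2 <= nfft // 2)      # non-redundant region for real signals
        mag = np.abs(B[omega])
        p = mag / mag.sum()                              # normalized magnitudes for an entropy-style feature
        h1 = np.mean(np.log(mag + 1e-12))                # mean log magnitude
        h2 = -np.sum(p * np.log(p + 1e-12)) / np.log(p.size)   # normalized bispectral entropy
        h3 = mag.max()                                   # peak phase-coupling strength
        return h1, h2, h3

    rng = np.random.default_rng(2)
    B = rng.random((50, 50)) + 1j * rng.random((50, 50))  # stand-in bispectrum estimate
    print(triangle_features(B, nfft=100))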

    Illustration of the effect of noise on the bispectrum.

    The bispectrum contour maps were estimated from the ENV neurogram responses (for a listener with normal hearing) to a typical word /use/: (A) in quiet, and (B) under speech-shaped noise at an SNR of -5 dB. The presentation level of the signal was 65 dB SPL.

    Illustration of the effect of hearing loss on the bispectrum.

    The bispectrum contour map was derived from the ENV responses for listeners with normal hearing (A) and profound hearing loss (B). Both the range of frequencies over which phase coupling was observed and the maximum magnitude changed as a function of hearing loss.

    Non-redundant area of computation of the bispectrum for real signals.

    The features of the bispectrum are calculated within the triangular area Ω.
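    In standard bispectrum notation (assumed here; the paper's own symbols are not reproduced in this listing), the estimate and the non-redundant triangle for a real signal sampled at rate f_s are usually written as:

    B(f_1, f_2) = \mathbb{E}\!\left[ X(f_1)\, X(f_2)\, X^{*}(f_1 + f_2) \right],
    \qquad
    \Omega = \left\{ (f_1, f_2) \;:\; 0 \le f_2 \le f_1, \;\; f_1 + f_2 \le \tfrac{f_s}{2} \right\}.

    Symmetries such as B(f_1, f_2) = B(f_2, f_1) and, for real signals, B(f_1, f_2) = B^{*}(-f_1, -f_2) are what allow the computation to be restricted to Ω.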

    Block diagram of the proposed method.

    The speech signal was applied as an input to the model of the auditory-nerve (AN) fibers, and responses of the normal or impaired auditory system were simulated for a wide range of characteristic frequencies to construct neurograms. The neurogram responses were smoothed to reflect either the envelope information (ENV) or all information including the temporal fine structure (TFS), and the bispectrum was estimated from the ENV or TFS neurogram. Finally, the features were computed from the triangular area and normalized in order to estimate the speech intelligibility score.
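    A minimal sketch of the ENV/TFS smoothing step, assuming Hamming-window time averaging with coarse versus fine bins; the window and hop lengths, and the random stand-in for simulated AN discharge rates, are illustrative assumptions rather than the paper's settings.

    import numpy as np

    def smooth_rows(neurogram, win_len, hop):
        # Hamming-window time averaging applied independently to each CF row.
        w = np.hamming(win_len)
        out = []
        for row in neurogram:
            out.append([np.dot(row[i:i + win_len], w) / w.sum()
                        for i in range(0, row.size - win_len + 1, hop)])
        return np.array(out)

    rate = np.random.default_rng(1).random((32, 10000))      # stand-in AN discharge rates (CF x time)
    env_neurogram = smooth_rows(rate, win_len=128, hop=64)    # coarse bins keep only the envelope
    tfs_neurogram = smooth_rows(rate, win_len=16, hop=8)      # fine bins preserve fine timing
    print(env_neurogram.shape, tfs_neurogram.shape)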

    Comparison of subjective scores to the predicted scores for a listener with mild to moderate hearing loss.

    Responses were simulated in quiet and under noisy conditions using the proposed method (ENV responses using the H3 feature) and the full-reference NSIM metric. The results are shown for four sound presentation levels (69, 79, 89, and 99 dB SPL) and four conditions (SNRs of -1, 5, and 12 dB, plus quiet). Each point represents the mean score for NU-6 words at a particular sound presentation level. The linear regression coefficient between subjective and predicted scores was ~0.92 for both the proposed metric and NSIM.

    Distribution of speech intelligibility scores.

    Scores were estimated using the proposed method for affricate phonemes with five hearing loss profiles. (A, B): using ENV and TFS responses of all three types of AN fibers (high, medium, and low spontaneous rates); (C, D): using ENV and TFS responses of high-SR AN fibers only.