108 research outputs found

    A novel neural feature for a text-dependent speaker identification system

    A novel feature based on the simulated neural response of the auditory periphery was proposed in this study for a speaker identification system. A well-known computational model of the auditory-nerve (AN) fiber by Zilany and colleagues, which incorporates most of the stages and the relevant nonlinearities observed in the peripheral auditory system, was employed to simulate neural responses to speech signals from different speakers. Neurograms were constructed from responses of inner-hair-cell (IHC)-AN synapses with characteristic frequencies spanning the dynamic range of hearing. The synapse responses were subjected to an analytical function to incorporate the effects of absolute and relative refractory periods. The proposed IHC-AN neurogram feature was then used to train and test the text-dependent speaker identification system using standard classifiers. The performance of the proposed method was compared to the results from existing baseline methods for both quiet and noisy conditions. While the performance using the proposed feature was comparable to that of existing methods in quiet environments, the neural feature yielded substantially better classification accuracy in noisy conditions, especially with white Gaussian and street noise. Also, the performance of the proposed system was relatively insensitive to the type of distortion in the acoustic signals and to the choice of classifier. The proposed feature can be employed to design a robust speech recognition system.

    Phase effects on the masking of speech by harmonic complexes: Variations with level

    Speech reception thresholds were obtained in normal-hearing listeners for sentence targets masked by harmonic complexes constructed with different phase relationships. Maskers had either a constant fundamental frequency (F0) or an F0 that changed over time, following a pitch contour extracted from natural speech. The median F0 of the target speech was very similar to that of the maskers. In Experiment 1, differences in the masking produced by Schroeder-positive and Schroeder-negative phase complexes were small (around 1.5 dB) at moderate levels [60 dB sound pressure level (SPL)], but increased to around 6 dB for maskers at 80 dB SPL. Phase effects were typically around 1.5 dB larger for maskers with naturally varying F0 contours than for maskers with constant F0. Experiment 2 showed that shaping the long-term spectrum of the maskers to match the target speech had no effect. Experiment 3 included additional phase relationships at moderate levels and found no effect of phase. Therefore, the phase relationship within harmonic complexes appears to have only minor effects on masking effectiveness, at least at moderate levels and when targets and maskers are in the same F0 range.
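    As an illustration of the masker type, the sketch below generates constant-F0 harmonic complexes with Schroeder-positive and Schroeder-negative phases in Python with NumPy. The phase rule phi_n = ±π n(n+1)/N is one common convention (others use n(n−1)); the function name, parameters, and equal harmonic amplitudes are assumptions, not the study's stimulus code. Because the two phase signs produce time-reversed waveforms, their magnitude spectra are identical and only the temporal structure within each period differs.

```python
import numpy as np


def schroeder_complex(f0, n_harmonics, fs, duration, sign=+1):
    """Hypothetical helper: equal-amplitude cosine harmonics of f0 with Schroeder phases.

    Phase convention (an assumption): phi_n = sign * pi * n * (n + 1) / N,
    with sign=+1 for Schroeder positive and sign=-1 for Schroeder negative.
    The result is scaled to a peak absolute value of 1.
    """
    t = np.arange(int(round(fs * duration))) / fs
    n = np.arange(1, n_harmonics + 1)
    phases = sign * np.pi * n * (n + 1) / n_harmonics
    # Rows are harmonics, columns are time samples; summing over rows gives the complex.
    x = np.cos(2 * np.pi * f0 * n[:, None] * t[None, :] + phases[:, None]).sum(axis=0)
    return x / np.max(np.abs(x))


# 100-Hz F0, 40 harmonics (up to 4 kHz), 1 s at 16 kHz: an integer number of periods.
pos = schroeder_complex(100.0, 40, 16000, 1.0, sign=+1)
neg = schroeder_complex(100.0, 40, 16000, 1.0, sign=-1)
```

    Comparing np.abs(np.fft.rfft(pos)) with np.abs(np.fft.rfft(neg)) confirms that the two maskers share the same long-term spectrum, which is why any masking difference between them must come from their temporal structure.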

    Universality in Systems with Power-Law Memory and Fractional Dynamics

    There are a few different ways to extend regular nonlinear dynamical systems by introducing power-law memory or by considering fractional differential/difference equations instead of integer-order ones. This extension allows the introduction of families of nonlinear dynamical systems that converge to regular systems in the case of an integer power-law memory or an integer order of derivatives/differences. The examples considered in this review include the logistic family of maps (converging, in the case of the first-order difference, to the regular logistic map), the universal family of maps, and the standard family of maps (the latter two converging, in the case of the second difference, to the regular universal and standard maps). Correspondingly, the phenomenon of transition to chaos through a period-doubling cascade of bifurcations in regular nonlinear systems, known as "universality", can be extended to fractional maps, which are maps with power-law or asymptotically power-law memory. The new features of universality that appear in fractional (with memory) nonlinear dynamical systems, including cascades of bifurcations on single trajectories, are the main subject of this review. Comment: 23 pages, 7 figures, to appear Oct 28 201
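    The sketch below (Python) iterates one simple map with power-law memory to make the convergence statement concrete. It assumes the form x_n = x_0 + (1/Γ(α)) Σ_{k=0}^{n−1} (n−k)^(α−1) G(x_k) with G(x) = Kx(1−x) − x; this is an illustrative assumption, not necessarily the exact fractional or fractional-difference logistic map analyzed in the review, and the function names are hypothetical. For α = 1 every memory weight equals 1, consecutive terms differ by G(x_{n−1}), and the iteration reduces to the regular logistic map x_n = K x_{n−1}(1 − x_{n−1}); for 0 < α < 1 the weights decay as a power law, so the whole trajectory history influences each new point.

```python
import math


def power_law_memory_map(x0, K, alpha, n_steps):
    """Hypothetical logistic-type map with power-law memory (assumed form).

    x_n = x_0 + (1 / Gamma(alpha)) * sum_{k=0}^{n-1} (n - k)**(alpha - 1) * G(x_k),
    with G(x) = K * x * (1 - x) - x. Returns [x_0, x_1, ..., x_{n_steps}].
    """
    xs = [x0]
    g = []  # G(x_k) for every past point; the full history is reused at each step
    c = 1.0 / math.gamma(alpha)
    for n in range(1, n_steps + 1):
        x_prev = xs[-1]
        g.append(K * x_prev * (1 - x_prev) - x_prev)
        # Power-law weights: recent points weigh most, but none are forgotten.
        memory = sum((n - k) ** (alpha - 1) * g[k] for k in range(n))
        xs.append(x0 + c * memory)
    return xs


def logistic_map(x0, K, n_steps):
    """Regular logistic map x_n = K * x_{n-1} * (1 - x_{n-1})."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(K * xs[-1] * (1 - xs[-1]))
    return xs


# alpha = 1 should reproduce the regular logistic map (up to rounding).
with_memory = power_law_memory_map(0.3, 3.2, 1.0, 50)
regular = logistic_map(0.3, 3.2, 50)
```

    The memory sum costs O(n) per step, so a trajectory of N points costs O(N²) in this direct form. Rerunning power_law_memory_map with, say, alpha = 0.8 and the same K shows how memory alters the trajectory relative to the regular map.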

    The effect of hearing augmentation on cognitive assessment scores: a pilot crossover randomized controlled trial

    This randomized crossover pilot study aimed to evaluate the effect of hearing augmentation on cognitive assessment scores and on the time taken to complete cognitive assessment among elderly inpatients in a teaching hospital. A hearing amplifier was used for hearing augmentation, and the Montreal Cognitive Assessment (MoCA) test was used to assess cognition. Seventy-one patients were allocated to Group A (n=33) or Group B (n=38) using block randomization. There was no significant difference in total MoCA scores with and without hearing augmentation (p = 0.622). Total scores improved significantly on the second test, suggesting a learning effect (p < 0.05). There was also no significant difference in the time taken to complete cognitive assessment with and without hearing augmentation (p = 0.879). Similar statistical tests performed on a subgroup of patients with hearing impairment did not reveal significant results. The results of this study will now inform a larger randomized controlled study evaluating the use of hearing amplifiers as cost-effective solutions to hearing impairment in our older population.

    Large-scale analysis of frequency modulation in birdsong data bases

    DS & MP are supported by an EPSRC Leadership Fellowship EP/G007144/1. Our thanks to Alan McElligott for helpful advice while preparing the manuscript; Sašo Muševič for discussion and for making his DDM software available; and Rémi Gribonval and team at INRIA Rennes for discussion and software development during a research visit.

    Spike-Timing-Based Computation in Sound Localization

    Spike timing is precise in the auditory system, and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and a spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model that exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies, which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing for extracting spatial information about sources independently of the source signal.

    Predictions of Speech Intelligibility with a Model of the Normal and Impaired Auditory-periphery

    A fall-off in speech intelligibility at higher-than-normal presentation levels has been observed for listeners with and without hearing loss. Speech intelligibility predictors based on the acoustic signal properties, such as th

    NSQM: A non-intrusive assessment of speech quality using normalized energies of the neurogram

    This study proposes a new non-intrusive measure of speech quality, the neurogram speech quality measure (NSQM), based on the responses of a biologically inspired computational model of the auditory system for listeners with normal hearing. The model simulates the responses of an auditory-nerve fiber with a given characteristic frequency to a speech signal, and the population response of the model is represented by a neurogram (a 2D time-frequency representation). The responses at each characteristic frequency in the neurogram were decomposed into sub-bands using a 1D discrete wavelet transform. The normalized energy of each sub-band was used as an input to a support vector regression model to predict the quality score of the processed speech. The performance of the proposed non-intrusive measure was compared to that of a range of intrusive and non-intrusive measures using three standard databases: EXP1 and EXP3 of Supplement 23 to the P series (P.Supp23) of ITU-T Recommendations, and the NOIZEUS database. The proposed NSQM achieved overall better results than most existing metrics for the effects of compression codecs, additive noise, and channel noise. © 201
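    The feature-extraction step described above can be sketched as follows (Python with NumPy). The abstract does not name the wavelet, the number of decomposition levels, or the normalization, so this sketch assumes an orthonormal Haar DWT, three levels, and division of each characteristic-frequency row's sub-band energies by that row's total energy; the function names are hypothetical.

```python
import numpy as np


def haar_dwt_level(x):
    """One level of an orthonormal Haar DWT (odd-length input is truncated to even)."""
    x = x[: len(x) // 2 * 2]
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail


def normalized_subband_energies(neurogram, levels=3):
    """Hypothetical NSQM-style features from a neurogram of shape (n_cf, n_time).

    Returns an array of shape (n_cf, levels + 1): for each CF row, the energies of
    detail bands D1..D_levels and the final approximation band, divided by their sum.
    """
    feats = []
    for row in np.asarray(neurogram, dtype=float):
        energies = []
        a = row
        for _ in range(levels):
            a, d = haar_dwt_level(a)  # keep decomposing the approximation band
            energies.append(np.sum(d ** 2))
        energies.append(np.sum(a ** 2))
        e = np.array(energies)
        total = e.sum()
        feats.append(e / total if total > 0 else e)  # silent rows stay all-zero
    return np.vstack(feats)


rng = np.random.default_rng(0)
toy_neurogram = rng.random((5, 64))  # stand-in for model output: 5 CFs x 64 time bins
features = normalized_subband_energies(toy_neurogram)
```

    Each row of features could then be flattened into one vector per utterance and passed, together with subjective quality scores, to a support vector regressor such as scikit-learn's sklearn.svm.SVR; that library choice is also an assumption rather than something stated in the abstract.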