77 research outputs found

    Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement

    Deep learning algorithms are increasingly used for speech enhancement (SE). In supervised methods, global and local information is required for accurate spectral mapping. A key restriction is often poor capture of key contextual information. To leverage long-term information for target speakers and compensate for distortions of cleaned speech, this paper adopts a sequence-to-sequence (S2S) mapping structure and proposes a novel monaural speech enhancement system consisting of a Feature Extraction Block (FEB), a Compensation Enhancement Block (ComEB) and a Mask Block (MB). In the FEB, a U-net block is used to extract abstract features using complex-valued spectra, with one path to suppress the background noise in the magnitude domain using masking methods, and the MB takes magnitude features from the FEB and compensates the lost complex-domain features produced from ComEB to restore the final cleaned speech. Experiments are conducted on the Librispeech dataset and results show that the proposed model obtains better performance than recent models in terms of ESTOI and PESQ scores.

    Relation between acoustic measurements and the perceived diffuseness of a synthesised sound field

    No full text
    This paper describes an investigation of different objective metrics for predicting the perceived diffuseness of reproduced sound and why common metrics, such as Interaural Cross-Correlation Coefficient (IACC), sound pressure level uniformity, and the diffuseness calculation used in Directional Audio Coding (DirAC), may be less appropriate for analysing the perceived diffuseness of a reproduced field than they are for architectural acoustics applications. A listening test was conducted to elicit the perceived diffuseness of sound fields of uncorrelated pink noise signals replayed over 19 different loudspeaker arrangements. Listeners rated how diffuse they perceived each stimulus to be. A range of different measurements of the sound field were then compared to the subjective test results. The data show that objective metrics do not always correlate well with the perceived diffuseness, especially for specific loudspeaker arrangements. Possible explanations of these results are discussed.
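For readers unfamiliar with the IACC metric named in the abstract above, the following is a minimal sketch of the textbook definition: the maximum absolute value of the normalised cross-correlation between the left-ear and right-ear signals over lags of about ±1 ms. It is not the measurement procedure used in the paper, which the abstract does not describe. The function name `iacc`, the parameter `max_lag_ms`, and the synthetic test signals are assumptions made for illustration.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms.

    Illustrative only: real measurements use binaural impulse responses or
    recordings, usually band-filtered and time-windowed first.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    n = len(left)
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    # Normalise by the geometric mean of the two signal energies.
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Sum left[i] * right[i + lag] over the overlapping samples only.
        start = max(0, -lag)
        stop = min(n, n - lag)
        acc = sum(left[i] * right[i + lag] for i in range(start, stop))
        best = max(best, abs(acc) / norm)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # assumed sample rate for the synthetic example
    a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    b = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print("identical signals:", iacc(a, a, fs))
    print("independent noise:", iacc(a, b, fs))
```

Identical ear signals give a value close to 1 and independent noise gives a value close to 0, which is why a low IACC is often read as an indicator of a diffuse field.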

    Enhancement of forward suppression begins in the ventral cochlear nucleus.

    A neuron's response to a sound can be suppressed by the presentation of a preceding sound. It has been suggested that this suppression is a direct correlate of the psychophysical phenomenon of forward masking; however, forward suppression, as measured in the responses of the auditory nerve, was insufficient to account for behavioural performance. In contrast, the neural suppression seen in the inferior colliculus and auditory cortex was much closer to psychophysical performance. In anaesthetised guinea-pigs, using a physiological two-interval forced-choice threshold tracking algorithm to estimate suppressed (masked) thresholds, we examine whether the enhancement of suppression can occur at an earlier stage of the auditory pathway, the ventral cochlear nucleus (VCN). We also compare these responses with the responses from the central nucleus of the inferior colliculus (ICc) using the same preparation. In both nuclei, onset-type neurons showed the greatest amounts of suppression (16.9-33.5 dB) and, in the VCN, these recovered with the fastest time constants (14.1-19.9 ms). Neurons with sustained discharge demonstrated reduced masking (8.9-12.1 dB) and recovery time constants of 27.2-55.6 ms. In the VCN, the decrease in growth of suppression with increasing suppressor level was largest for chopper units and smallest for onset-type units. The threshold elevations recorded for most unit types are insufficient to account for the magnitude of forward masking as measured behaviourally; however, onset responders, in both the cochlear nucleus and inferior colliculus, demonstrate a wide dynamic range of suppression, similar to that observed in human psychophysics.
    This work was supported by Wellcome Trust and BBSRC Project Grants to IMW and first presented in preliminary form by Ingham et al. (2006b). We thank Elinor Gunning and Catherine Slattery for their help and input during pilot experiments and Mark Sayles for help in data collection in later experiments. This is the final version of the article. It first appeared from Elsevier via https://doi.org/10.1016/j.brainres.2016.02.04

    Holistische Signalverarbeitung in einem Modell latenzverknüpfter Neuronen [Holistic signal processing in a model of latency-linked neurons]

    No full text

    Speaker identification using auditory modelling and vector quantization

    No full text
    This paper presents an experimental evaluation of different features for use in speaker identification (SID). The features are tested using speech data provided by the EUROM1 database, in a text-independent closed-set speaker identification task. The main objective of the paper is to present a novel parameterization of speech that is based on an auditory model called the Auditory Image Model (AIM). This model provides features of the speech signal and their utility is assessed in the context of speaker identification. In order to explore the features that are more informative for predicting a speaker's identity, the auditory image is cut into rectangles. Then, a novel strategy is incorporated for the enrolment of speakers, which is used for specifying the regions of the image that contain features that make a speaker discriminative. Afterwards, the new speaker-specific feature representation is assessed in noisy conditions that simulate a real-world environment. Its performance is compared with the results obtained adopting MFCC features in the context of a Vector Quantization (VQ) classification system. The results for the identification accuracy suggest that the new parameterization provides better results compared to conventional MFCCs, especially for low SNRs.

    Sensitivity to envelope interaural time differences at high modulation rates

    No full text
    Sensitivity to interaural time differences (ITDs) conveyed in the temporal fine structure of low-frequency tones and the modulated envelopes of high-frequency sounds are considered comparable, particularly for envelopes shaped to transmit similar fidelity of temporal information normally present for low-frequency sounds. Nevertheless, discrimination performance for envelope modulation rates above a few hundred Hertz is reported to be poor, to the point of discrimination thresholds being unattainable, compared with the much higher (>1,000 Hz) limit for low-frequency ITD sensitivity, suggesting the presence of a low-pass filter in the envelope domain. Further, performance for identical modulation rates appears to decline with increasing carrier frequency, supporting the view that the low-pass characteristic observed for envelope ITD processing is carrier-frequency dependent. Here, we assessed listeners' sensitivity to ITDs conveyed in pure tones and in the modulated envelopes of high-frequency tones. ITD discrimination for the modulated high-frequency tones was measured as a function of both modulation rate and carrier frequency. Some well-trained listeners appear able to discriminate ITDs extremely well, even at modulation rates well beyond 500 Hz, for 4-kHz carriers. For one listener, thresholds were even obtained for a modulation rate of 800 Hz. The highest modulation rate for which thresholds could be obtained declined with increasing carrier frequency for all listeners. At 10 kHz, the highest modulation rate at which thresholds could be obtained was 600 Hz. The upper limit of sensitivity to ITDs conveyed in the envelope of high-frequency modulated sounds appears to be higher than previously considered.

    A study of the perception, level of satisfaction and control requirements of self-fitting hearing aid: (a qualitative study)

    No full text
    Background: Is a 'science knows best' approach the best option for hearing care, or do patients want more control; and if so, how much control do they want? The aim of this study is to assess what the thoughts and opinions of hearing aid users are towards a hearing aid they can programme themselves and investigate what control they require. Methods: Semi-structured interviews were conducted with 11 hearing aid users (6 females and 5 males). Each participant was interviewed using a self-written 24-item questionnaire, validated using the content validity ratio method. Specially designed user interfaces (UI) to demonstrate how a self-fitting hearing aid (SFHA) might be controlled were shown to participants. Two versions were designed, an A-B selection version and a fader controlled version. Results: 100% of participants exhibited a positive response to the SFHA concept. The fader software version was preferred by 100% of participants, with greater control ability being the primary reason. Using thematic analysis, four themes were identified: (1) perception and expectations of a SFHA; (2) using the software as a control mechanism; (3) this is how you can make the software better; and (4) the care of an audiologist vs. a SFHA. Conclusions: The want and need for control is apparent within the data, demonstrating that a 'science knows best' approach may not be working within audiology clinics. Hearing aid users want the additional control to give them a more natural sound to their hearing aid and greater ownership of their hearing. There is some fear of making mistakes and becoming obsessed with finding the correct setting. However, with training and repetition, perceived self-efficacy is high.
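The abstract above mentions the content validity ratio without defining it. As a minimal sketch, the ratio commonly attributed to Lawshe is CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists who rate an item "essential" and N is the panel size. The abstract does not say how the authors applied it, and the function name and example panel sizes below are assumptions made for illustration.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe-style content validity ratio for one questionnaire item.

    Returns a value in [-1, 1]: 1 when every panelist rates the item
    essential, 0 when exactly half do, and -1 when none do.
    """
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


if __name__ == "__main__":
    # Hypothetical panel of 10 raters, 8 of whom rate the item essential.
    print(content_validity_ratio(8, 10))
```

Items whose ratio falls below a critical value that depends on panel size are typically dropped or revised.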