142 research outputs found
Model-based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids
Speech intelligibility is often severely degraded among hearing-impaired
individuals in situations such as the cocktail party scenario. The performance
of the current hearing aid technology has been observed to be limited in these
scenarios. In this paper, we propose a binaural speech enhancement framework
that takes into consideration the speech production model. The enhancement
framework proposed here is based on the Kalman filter that allows us to take
the speech production dynamics into account during the enhancement process. The
usage of a Kalman filter requires the estimation of clean speech and noise
short term predictor (STP) parameters, and the clean speech pitch parameters.
In this work, a binaural codebook-based method is proposed for estimating the
STP parameters, and a directional pitch estimator based on the harmonic model
and maximum likelihood principle is used to estimate the pitch parameters. The
proposed method for estimating the STP and pitch parameters jointly uses the
information from left and right ears, leading to a more robust estimation of
the filter parameters. Objective measures such as PESQ and STOI have been used
to evaluate the enhancement framework in different acoustic scenarios
representative of the cocktail party scenario. We have also conducted
subjective listening tests on a set of nine normal-hearing subjects, to
evaluate the performance in terms of intelligibility and quality improvement.
The listening tests show that the proposed algorithm, even with access to only
a single channel noisy observation, significantly improves the overall speech
quality, and the speech intelligibility by up to 15%.
Comment: after revisio
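The Kalman-filter idea in the abstract above can be illustrated with a minimal single-channel sketch. It is not the paper's binaural method: it assumes the short term predictor (autoregressive) coefficients and both variances are already known, it treats the noise as white rather than modelling it with its own STP parameters, and it omits the pitch model entirely. The function name `kalman_enhance` and all parameter values are hypothetical.

```python
import numpy as np

def kalman_enhance(noisy, ar_coeffs, excitation_var, noise_var):
    """Kalman-filter a noisy signal under an AR(p) speech model.

    Assumed model (illustrative only):
      s[t] = sum_k a[k] * s[t-k] + e[t],  e ~ N(0, excitation_var)
      y[t] = s[t] + v[t],                 v ~ N(0, noise_var)
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    # Companion-form state transition matrix for the AR(p) model.
    F = np.zeros((p, p))
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    x = np.zeros(p)                  # state: [s[t], s[t-1], ..., s[t-p+1]]
    P = np.eye(p) * excitation_var   # initial state covariance (assumption)
    out = np.empty(len(noisy))
    for t, y in enumerate(noisy):
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T
        P[0, 0] += excitation_var
        # Update step; the observation picks out the first state element.
        innovation_var = P[0, 0] + noise_var
        gain = P[:, 0] / innovation_var
        x = x + gain * (y - x[0])
        P = P - np.outer(gain, P[0, :])
        out[t] = x[0]
    return out

# Synthetic check: a stable AR(2) signal observed in white noise.
rng = np.random.default_rng(0)
a_true = [1.5, -0.8]
n = 2000
clean = np.zeros(n)
for t in range(2, n):
    clean[t] = a_true[0] * clean[t - 1] + a_true[1] * clean[t - 2] + rng.normal()
noisy = clean + rng.normal(scale=2.0, size=n)
enhanced = kalman_enhance(noisy, a_true, excitation_var=1.0, noise_var=4.0)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_enhanced = np.mean((enhanced - clean) ** 2)
```

In the paper's setting the parameters passed in here would come from the codebook-based STP estimator rather than being known in advance.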
Deep Denoising for Hearing Aid Applications
Reduction of unwanted environmental noises is an important feature of today's
hearing aids (HA), which is why noise reduction is nowadays included in almost
every commercially available device. The majority of these algorithms, however,
are restricted to the reduction of stationary noises. In this work, we propose a
denoising approach based on a fully connected deep learning network with three
hidden layers that aims to predict a Wiener filtering gain with an asymmetric input
context, enabling real-time applications with high constraints on signal delay.
The approach employs a hearing instrument-grade filter bank and complies
with typical hearing aid demands, such as low latency and on-line processing.
It can further be well integrated with other algorithms in an existing HA
signal processing chain. We show on a database of real-world noise signals
that our algorithm outperforms a state-of-the-art baseline approach,
in both objective metrics and subject tests.
Comment: submitted to IWAENC 201
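For reference, the Wiener filtering gain that the network in the abstract above is trained to predict can be computed directly when the speech and noise power spectra are known. The sketch below shows that oracle computation only; it is not the paper's network. The function name `wiener_gain`, the gain floor, and the toy values are assumptions for illustration.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-band Wiener gain G = xi / (1 + xi), where xi is the a priori SNR.

    `floor` limits the maximum attenuation (an illustrative choice).
    """
    speech_psd = np.asarray(speech_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    xi = speech_psd / np.maximum(noise_psd, 1e-12)  # guard against divide-by-zero
    gain = xi / (1.0 + xi)
    return np.maximum(gain, floor)

# Toy example: three frequency bands with unit noise power.
speech_psd = np.array([4.0, 1.0, 0.0])
noise_psd = np.array([1.0, 1.0, 1.0])
gains = wiener_gain(speech_psd, noise_psd)
```

Each band of the noisy spectrum would then be multiplied by its gain, so bands dominated by noise are attenuated and bands dominated by speech pass nearly unchanged.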
Improving the Speech Intelligibility By Cochlear Implant Users
In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes further raise the difficulty of hearing in noisy conditions for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately and even more quickly. This effect, known as the “familiar talker advantage”, motivated us to examine its effect on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec
We introduce M3-AUDIODEC, an innovative neural spatial audio codec designed
for efficient compression of multi-channel (binaural) speech in both single and
multi-speaker scenarios, while retaining the spatial location information of
each speaker. This model boasts versatility, allowing configuration and
training tailored to a predetermined set of multi-channel, multi-speaker, and
multi-spatial overlapping speech conditions. Key contributions are as follows:
1) Previous neural codecs are extended from single-channel to multi-channel audio. 2)
The ability of our proposed model to compress and decode overlapping
speech. 3) A groundbreaking architecture that compresses speech content and
spatial cues separately, ensuring the preservation of each speaker's spatial
context after decoding. 4) M3-AUDIODEC's proficiency in reducing the bandwidth
for compressing two-channel speech by 48% when compared to individual binaural
channel compression. Impressively, at a 12.6 kbps operation, it outperforms
Opus at 24 kbps and AUDIODEC at 24 kbps by 37% and 52%, respectively. In our
assessment, we employed speech enhancement and room acoustic metrics to
ascertain the accuracy of clean speech and spatial cue estimates from
M3-AUDIODEC. Audio demonstrations and source code are available online at
https://github.com/anton-jeran/MULTI-AUDIODEC.
Comment: More results and source code are available at
https://anton-jeran.github.io/MAD